parse_module_definitions
author: Haris Zafeiropoulos
package: microbetag
description: This script builds all the unique sets of KO terms that can make up each KEGG module (https://www.genome.jp/brite/ko00002).
output: A two-level .json file. The first level denotes the steps of a module, and the second level lists the alternative combinations of terms. All the terms of a combination are necessary for the module to be complete.
notes: The pathway module is defined by the logical expression of K numbers, and the signature module is defined by the logical expression of K numbers and M numbers. A SPACE ( ) or a PLUS (+) sign, representing a connection in the pathway or the molecular complex, is treated as an AND operator, and a COMMA (,), used for alternatives, is treated as an OR operator. A MINUS (-) sign designates an optional item in the complex. This script was inspired by two functions from the following MicrobeAnnotator script: https://github.com/cruizperez/MicrobeAnnotator/tree/master/microbeannotator/data/01.KEGG_DB/00.KEGG_Data_Scrapper.py
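The operator rules in the notes can be illustrated on a step without parentheses. The following is a minimal sketch, not the script's own code: the function name `expand_flat_step` is hypothetical, the example string is illustrative, and the sketch assumes that commas bind most loosely and that optional (minus) items can simply be dropped from the required set.

```python
import re


def expand_flat_step(step: str) -> list[list[str]]:
    """Expand a parenthesis-free step into its alternative KO combinations.

    Hypothetical helper for illustration only; nested definitions are
    not handled here.
    """
    alternatives = []
    # A comma is an OR: each comma-separated piece is one alternative.
    for alternative in step.split(","):
        # A minus sign marks an optional item, so it is dropped
        # (assumption: only required terms are kept).
        required = re.sub(r"-K\d{5}", "", alternative)
        # A space or a plus sign is an AND: all remaining terms are needed.
        terms = [term for term in re.split(r"[ +]", required) if term]
        alternatives.append(terms)
    return alternatives


print(expand_flat_step("K00174+K00175-K00177-K00176,K03737"))
# [['K00174', 'K00175'], ['K03737']]
```

Each inner list is one combination whose terms are all required, which matches the "all the terms of a combination are necessary" rule stated for the output.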
Attributes
| structurals | |
Functions
| flatten(lis) | Takes a nested list and returns its contents as a single flat list |
| parse_commas_on_pre_and_post_character(string) | Takes a string and returns the independent scenarios separated by commas (,) |
| check_if_all_in_one_par(string) | Tells whether a string is enclosed in a single pair of parentheses |
| get_independent_step_alternatives(step_as_a_list) | Takes a complete step and recursively returns its unique independent parts |
| split_to_independent_chunks(string) | Takes a string and returns the indices at which it can be split into parts that can be combined |
| parse(my_string) | Parses a module's definition into its main steps |
| parse_regular_module_dictionary(module_components_raw, structural_list) | Breaks a module down into its steps using the parse() function |
| | Returns all the possible combinations of KOs that make a KEGG module complete |
Module Contents
- parse_module_definitions.structurals = ['M00144', 'M00149', 'M00151', 'M00152', 'M00154', 'M00155', 'M00153', 'M00156', 'M00158', 'M00160']
- parse_module_definitions.flatten(lis)
Takes a nested list and returns its contents as a single flat list, e.g. [[a, b, c], [d, e]] -> [a, b, c, d, e]
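A flattening step like this can be written recursively. The sketch below is an assumption about the behaviour, not the module's source, and `flatten_sketch` is a hypothetical name chosen to avoid confusion with the real `flatten`.

```python
def flatten_sketch(nested):
    """Return the items of an arbitrarily nested list as one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            # Recurse into sub-lists and keep their items in order.
            flat.extend(flatten_sketch(item))
        else:
            flat.append(item)
    return flat


print(flatten_sketch([["a", "b", "c"], ["d", "e"]]))
# ['a', 'b', 'c', 'd', 'e']
```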
- parse_module_definitions.parse_commas_on_pre_and_post_character(string)
Takes a string and returns the independent scenarios separated by commas (,), e.g. K02304,(K24866+K03794) -> ['K02304', '(K24866+K03794)']
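One way to obtain this result is to split only on commas that sit outside any parentheses. The sketch below tracks parenthesis depth, which may differ from how the real function inspects the characters around each comma; `split_top_level_commas` is a hypothetical name.

```python
def split_top_level_commas(text):
    """Split text on commas that are not enclosed in parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # A comma at depth 0 separates two independent scenarios.
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


print(split_top_level_commas("K02304,(K24866+K03794)"))
# ['K02304', '(K24866+K03794)']
```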
- parse_module_definitions.check_if_all_in_one_par(string)
Tells whether a whole string is enclosed in a single pair of parentheses, e.g. ((K00705,K22451)_(K02438,K01200)), or not, e.g. K00975_(K00703,K13679,K20812)
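The check can be sketched by verifying that the first opening parenthesis is closed only by the very last character. This is an illustrative version under that assumption; `is_wrapped_in_one_pair` is a hypothetical name, not part of the module.

```python
def is_wrapped_in_one_pair(text):
    """Return True if one outer pair of parentheses encloses all of text."""
    if len(text) < 2 or text[0] != "(" or text[-1] != ")":
        return False
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            # The outer parenthesis closed before the end of the string.
            if depth == 0 and i < len(text) - 1:
                return False
    return depth == 0


print(is_wrapped_in_one_pair("((K00705,K22451)_(K02438,K01200))"))  # True
print(is_wrapped_in_one_pair("K00975_(K00703,K13679,K20812)"))      # False
```

Checking only the first and last characters would not be enough: a string such as `(K00705)_(K01200)` starts and ends with parentheses but consists of two separate groups.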
- parse_module_definitions.get_independent_step_alternatives(step_as_a_list)
Takes a complete step and recursively returns its unique independent parts, e.g. for ((K13939,(K13940,K01633 K00950) K00796),(K01633 K13941)), the first round yields (K01633 K13941) as an independent alternative, while the second round yields K13939
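To show what a recursive treatment of such a step can produce, the sketch below expands a nested step into every KO combination that satisfies it. It is not the module's algorithm: `split_top_level` and `expand` are hypothetical helpers, commas are assumed to bind more loosely than spaces and plus signs, minus (optional) items are not handled, and the input is assumed to be well formed.

```python
from itertools import product


def split_top_level(text, separators):
    """Split text on any separator character found outside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch in separators and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def expand(step):
    """Return every KO combination that satisfies the step."""
    alternatives = split_top_level(step, ",")
    if len(alternatives) > 1:
        # OR: gather the combinations of each alternative.
        return [combo for alt in alternatives for combo in expand(alt)]
    parts = split_top_level(step, " +")
    if len(parts) > 1:
        # AND: take one combination from every part and join them.
        options = [expand(part) for part in parts]
        return [sum(choice, []) for choice in product(*options)]
    if step.startswith("(") and step.endswith(")"):
        # A single parenthesised group: expand its contents.
        return expand(step[1:-1])
    return [[step]]


step = "((K13939,(K13940,K01633 K00950) K00796),(K01633 K13941))"
for combination in expand(step):
    print(combination)
# ['K13939']
# ['K13940', 'K00796']
# ['K01633', 'K00950', 'K00796']
# ['K01633', 'K13941']
```

The last printed combination corresponds to the (K01633 K13941) alternative and the first to K13939, the two independent parts named in the example above.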
- parse_module_definitions.split_to_independent_chunks(string)
Takes a string and returns the indices at which it can be split into parts that can be combined independently to form the corresponding part of the KEGG module definition, e.g. "K00941_(K00788,K21220)" -> [0, 7, 22] or "((K03831,K03638)_K03750)" -> [0, 24]
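The two examples are consistent with recording the start of the string, the position right after every underscore that lies outside parentheses, and the length of the string. The sketch below follows that reading, which is an inference from the examples rather than the module's code; `top_level_split_indices` is a hypothetical name.

```python
def top_level_split_indices(text, separator="_"):
    """Return the boundaries of the chunks separated at parenthesis depth 0."""
    indices, depth = [0], 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == separator and depth == 0:
            # The next chunk starts right after the separator.
            indices.append(i + 1)
    indices.append(len(text))
    return indices


text = "K00941_(K00788,K21220)"
indices = top_level_split_indices(text)
print(indices)  # [0, 7, 22]

# Slice consecutive index pairs and drop the trailing separator.
chunks = [text[a:b].rstrip("_") for a, b in zip(indices, indices[1:])]
print(chunks)  # ['K00941', '(K00788,K21220)']

print(top_level_split_indices("((K03831,K03638)_K03750)"))  # [0, 24]
```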
- parse_module_definitions.parse(my_string)
Parses a module's definition into its main steps, e.g. the module definition (K02303,K13542) (K03394,K13540) K02229 (K05934,K13540,K13541) K05936 K02228 K05895 K00595 K06042 K02224 K02230+K09882+K09883 -> ['(K02303,K13542)', '(K03394,K13540)', 'K02229', '(K05934,K13540,K13541)', 'K05936', 'K02228', 'K05895', 'K00595', 'K06042', 'K02224', 'K02230+K09882+K09883']
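Splitting a definition into its main steps amounts to cutting at the spaces that are not inside parentheses. The following sketch reproduces the example above under that assumption; `split_into_steps` is a hypothetical name and is not the module's `parse()` implementation.

```python
def split_into_steps(definition):
    """Split a module definition at spaces that lie outside parentheses."""
    steps, current, depth = [], [], 0
    for ch in definition:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == " " and depth == 0:
            # A top-level space ends the current step.
            if current:
                steps.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        steps.append("".join(current))
    return steps


definition = (
    "(K02303,K13542) (K03394,K13540) K02229 (K05934,K13540,K13541) "
    "K05936 K02228 K05895 K00595 K06042 K02224 K02230+K09882+K09883"
)
steps = split_into_steps(definition)
print(len(steps))  # 11
print(steps[0], steps[-1])  # (K02303,K13542) K02230+K09882+K09883
```

A space inside parentheses, as in `(K01633 K13941) K00796`, stays within its step, so that input yields two steps rather than three.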
- parse_module_definitions.parse_regular_module_dictionary(module_components_raw, structural_list)
Breaks a module down into its steps using the parse() function