microbetag.db

Aim: Establish a connection to microbetagDB and query it. Useful for the API and the on-the-fly version of microbetag, more specifically for the phenotrex-based phenotypic traits and the pathway complementarity steps.

Notes

Needs a hidden file called .env_dev.json with the database credentials.
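A minimal sketch of how such a credentials file could be read, assuming it is plain JSON. The helper name `load_db_credentials` and the key names (`host`, `user`, `password`, `database`) are hypothetical; this page only states that `.env_dev.json` holds the database credentials.

```python
import json
from pathlib import Path


def load_db_credentials(path=".env_dev.json"):
    """Read database credentials from a JSON file (hypothetical helper)."""
    with Path(path).open(encoding="utf-8") as handle:
        credentials = json.load(handle)
    # Assumed key names; the real file may use different ones.
    required = {"host", "user", "password", "database"}
    missing = required - credentials.keys()
    if missing:
        raise KeyError(f"Missing credential keys: {sorted(missing)}")
    return credentials
```

Failing early on a missing key keeps a misconfigured file from surfacing later as an opaque connection error.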

Attributes

DB_CREDENTIALS

Classes

GetPhenotrexTraits

Functions

`execute`(→ list[tuple])	Establish a database connection and perform an action
`execute_in_a_pool`(cursor, query)	Executes a query using a connection from the connection pool.
`init_connection_pool`(...)	Initiates a connection pool.
`gc_unify`(gc_list)	Removes duplicates of a genome that has entries both as GCA and GCF in the db.
`alt_genome_prefix`(gc)	Switches the GCA prefix of a genome accession id to GCF and vice versa.
`get_genomes_for_ncbi_tax_id`(ncbi_tax_id)	Get the genome IDs corresponding to a given NCBI Taxonomy ID from the microbetagDB.
`get_ncbi_tax_id_for_genome`(gc_id)
`patric_from_gc_list`(gc_accession_list)	Gets a list of GC accession ids and returns a dictionary where the GC ids are the keys and their corresponding PATRIC ids are the values.
`update_for_patric`(module_nonseeds_pkl, gc_to_patric)
`get_path_compls_otf`(config)	On the fly way to get pathway complementarities. Builds the pathCompls.json and the pathway_complements_extended.json files.
`get_path_compls_for_ncbi_ids`(→ dict)
`query_for_getting_compl_ids`([beneficiary, donor])	Gets 2 gc accession ids and returns a query for their pathway complementarities
`build_complement_queries`(relative_genomes, ...)
`get_complement_ids`(unique_queries)
`map_queries_to_pairs`(complements_ids_queries, ...)
`get_coloured_complements`(pairs_to_compl_ids)	Gets the actual complement using its unique complementId and builds its KEGG URL.
`build_pairs_complements`(pairs_to_compl_ids, ...)
`build_kegg_urls`(genome_pair_compls)	Takes as input the complements list between two genomes and builds URLs that colour the KEGG map related to the module.

Module Contents

microbetag.db.DB_CREDENTIALS

microbetag.db.execute(phrase: str) → list[tuple]

Establish a database connection and perform an action.

microbetag.db.execute_in_a_pool(cursor: mysql.connector, query: str)

Executes a query using a connection from the connection pool.

microbetag.db.init_connection_pool() → mysql.connector.pooling.MySQLConnectionPool

Initiates a connection pool. A pool opens a number of connections and handles thread safety when providing connections to requesters. For more see: https://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html
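The pooling pattern described above can be sketched as follows. This is an illustration under assumptions, not the module's actual code: the helper names (`build_pool_config`, `make_pool`, `run_query`), the pool name and the pool size are hypothetical, and `make_pool` needs the `mysql-connector-python` package plus a reachable database.

```python
def build_pool_config(credentials, pool_name="microbetag_pool", pool_size=5):
    """Merge credentials with pool settings (name and size are assumptions)."""
    return {"pool_name": pool_name, "pool_size": pool_size, **credentials}


def make_pool(credentials):
    """Create a MySQL connection pool from a credentials dictionary."""
    # Imported lazily so the rest of the sketch loads without the package.
    from mysql.connector import pooling

    return pooling.MySQLConnectionPool(**build_pool_config(credentials))


def run_query(pool, query, params=None):
    """Borrow a connection, run one query, and give the connection back."""
    connection = pool.get_connection()
    try:
        cursor = connection.cursor()
        try:
            cursor.execute(query, params or ())
            return cursor.fetchall()
        finally:
            cursor.close()
    finally:
        # For a pooled connection, close() returns it to the pool.
        connection.close()
```

Returning the connection in a `finally` block matters with pools: a connection that is never handed back eventually exhausts the pool.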

microbetag.db.gc_unify(gc_list)

Removes duplicates of a genome that has entries both as GCA and GCF in the db.

microbetag.db.alt_genome_prefix(gc)

Switches the GCA prefix of a genome accession id to GCF and vice versa.
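The two helpers above can be sketched in a few lines. The names `switch_prefix` and `unify` are hypothetical stand-ins, and keeping the first-seen form of a genome is an assumption; the actual functions may prefer one prefix over the other.

```python
def switch_prefix(accession):
    """Return the GCF form of a GCA accession and vice versa."""
    if accession.startswith("GCA_"):
        return "GCF_" + accession[4:]
    if accession.startswith("GCF_"):
        return "GCA_" + accession[4:]
    raise ValueError(f"Not a GCA/GCF accession: {accession}")


def unify(accessions):
    """Keep one entry per genome when both its GCA and GCF forms are listed."""
    kept = []
    for accession in accessions:
        # Skip exact repeats and the alternative-prefix twin of a kept entry.
        if accession in kept or switch_prefix(accession) in kept:
            continue
        kept.append(accession)
    return kept
```

For instance, `unify(["GCA_003184265.1", "GCF_003184265.1"])` keeps only the first of the two entries.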

microbetag.db.get_genomes_for_ncbi_tax_id(ncbi_tax_id: int)

Get the genome IDs corresponding to a given NCBI Taxonomy ID from the microbetagDB.

Example: 1281578

microbetag.db.get_ncbi_tax_id_for_genome(gc_id: str)

microbetag.db.patric_from_gc_list(gc_accession_list: list)

Gets a list of GC accession ids and returns a dictionary where the GC ids are the keys and their corresponding PATRIC ids are the values.

Example: [“GCA_003184265.1”]

microbetag.db.update_for_patric(module_nonseeds_pkl, gc_to_patric)

Parameters:

module_nonseeds_pkl – path to nonseeds KEGG MODULE related pickle file
gc_to_patric – a dictionary with GTDB representative genomes as key and their corresponding PATRIC id as value.

class microbetag.db.GetPhenotrexTraits(config=None)

get_phen_traits()

Returns predictions for a list of genomes

repr_genomes_present, config.predictions_path

get_phendb_traits(gtdb_genome_id: str)

Gets phenotypic traits, based on phenDB classes, for a GTDB representative genome, e.g. “GCA_018819265.1”.

static phen_query(gtdb_id)

Builds a query for microbetagDB.

static to_csv(df, output_dir=None)

Saves the phenotypic trait predictions of a trait for a set of genomes in a file, in a 3-column format: Identifier, Trait present, Confidence. The ‘Trait present’ column may take the values ‘YES’, ‘NO’ and ‘N/A’.
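The 3-column layout can be illustrated with the standard library. This sketch deliberately swaps the DataFrame input for plain tuples to stay dependency-free; the function name `write_trait_predictions` and the tab delimiter are assumptions, since the page does not state which separator `to_csv` uses.

```python
import csv

ALLOWED_VALUES = {"YES", "NO", "N/A"}


def write_trait_predictions(rows, handle):
    """Write (identifier, trait_present, confidence) rows in a 3-column layout."""
    writer = csv.writer(handle, delimiter="\t")  # tab delimiter is an assumption
    writer.writerow(["Identifier", "Trait present", "Confidence"])
    for identifier, trait_present, confidence in rows:
        # Reject anything outside the three documented values.
        if trait_present not in ALLOWED_VALUES:
            raise ValueError(f"Unexpected 'Trait present' value: {trait_present}")
        writer.writerow([identifier, trait_present, confidence])
```

Any text-mode file handle works as `handle`, for example one opened with `open(path, "w", newline="")`.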

microbetag.db.get_path_compls_otf(config)

On the fly way to get pathway complementarities. Builds the pathCompls.json and the pathway_complements_extended.json files.

The first file, in the stand-alone version, looks like this:

    {"bin_101": {
        "bin_101": [],
        "bin_151": [["md:M00019", ["K00826"], ["K01652", "K01653", "K00053", "K01687", "K00826"],
                     "https://www.kegg.jp/kegg-bin/show_pathway?map00290/K00826%09%23EAD1DC/K01687%09%2300A898/"], ..]
        },..
    }

where the elements regarding the complement and the alternative are actually lists, and not strings, while the latter looks like this:

    {"bin_101": {
        "bin_101": [],
        "bin_151": {"0": ["M00019", "Valine/isoleucine biosynthesis", "Branched-chain amino acid metabolism",
                          "K00826", "K01652;K01653;K00053;K01687;K00826", "https://…]}
        }
    }
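As a reading aid for the first structure, here is a short sketch that walks it and unpacks each entry. It assumes every entry has exactly four elements (module, complement, alternative, URL), as in the stand-alone example above; the function name `iter_complements` is hypothetical, and the sample data is that example with the `..` elisions dropped.

```python
import json

# Trimmed copy of the stand-alone example, with the ".." elisions dropped.
PATH_COMPLS = json.loads("""
{"bin_101": {
    "bin_101": [],
    "bin_151": [["md:M00019", ["K00826"],
                 ["K01652", "K01653", "K00053", "K01687", "K00826"],
                 "https://www.kegg.jp/kegg-bin/show_pathway?map00290/K00826%09%23EAD1DC/K01687%09%2300A898/"]]
}}
""")


def iter_complements(path_compls):
    """Yield (outer_key, inner_key, module, complement, alternative, url)."""
    for outer, partners in path_compls.items():
        for inner, entries in partners.items():
            # Each entry is assumed to hold exactly four elements.
            for module, complement, alternative, url in entries:
                yield outer, inner, module, complement, alternative, url


for outer, inner, module, complement, alternative, _url in iter_complements(PATH_COMPLS):
    print(outer, inner, module, complement, alternative)
```

The sketch makes no claim about which of the two keys is the beneficiary and which the donor, since the example alone does not say.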

To build those files for the on-the-fly (otf) version, this function uses the config.otf_seq_tax_df attribute, as returned by the taxonomy.py script.

It creates a list of tuples with the NCBI Taxonomy ids of the related taxa found in the edge list and, using their corresponding GTDB representative genomes, gets their complements.

Parameters:

config – Instance of microbetag's config class, as edited in the app.py script, where the otf_seq_tax_df attribute is added

Returns:

A pd.DataFrame with node names, NCBI Taxonomy ids and GTDB representative genomes for those edges of the network whose nodes were both mapped to at least one GTDB genome.

Return type:

mspecies_map_df

microbetag.db.get_path_compls_for_ncbi_ids(relative_genomes: Dict[str, Set[str]], pairs_of_interest: Set[Tuple[str, str]]) → dict

Parameters:

relative_genomes – A dictionary mapping NCBI Taxonomy ids to sets of genomes, e.g. {"1260918": {"GCF_002102185.1"}, "1819566": {"GCF_009711525.1"}}
pairs_of_interest – A set of NCBI Taxonomy id pairs, e.g. {("1260918", "1819566")}
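How the two parameters combine is not spelled out on this page. One plausible reading, sketched below, is that every genome of the first taxon is paired with every genome of the second for each pair of interest. The function name `genome_pairs` is hypothetical and the pairing rule is an assumption.

```python
from itertools import product

relative_genomes = {"1260918": {"GCF_002102185.1"}, "1819566": {"GCF_009711525.1"}}
pairs_of_interest = {("1260918", "1819566")}


def genome_pairs(relative_genomes, pairs_of_interest):
    """Map each NCBI id pair to every combination of the two taxa's genomes."""
    pairs = {}
    for first, second in pairs_of_interest:
        # A taxon without genomes yields no combinations for its pairs.
        genomes_a = sorted(relative_genomes.get(first, set()))
        genomes_b = sorted(relative_genomes.get(second, set()))
        pairs[(first, second)] = list(product(genomes_a, genomes_b))
    return pairs


print(genome_pairs(relative_genomes, pairs_of_interest))
```

Sorting the genome sets first keeps the output order stable across runs, which helps when the pairs are later turned into database queries.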

microbetag.db.query_for_getting_compl_ids(beneficiary='GCA_003184265.1', donor='GCA_000015645.1')

Gets two GC accession ids and returns a query for their pathway complementarities.

microbetag.db.build_complement_queries(relative_genomes, pairs_of_interest)

microbetag.db.get_complement_ids(unique_queries)

microbetag.db.map_queries_to_pairs(complements_ids_queries, unique_queries2comples)

microbetag.db.get_coloured_complements(pairs_to_compl_ids: Dict[Tuple[str, str], dict[Tuple[str, str], list[str]]])

Gets the actual complement using its unique complementId and builds its KEGG URL.

Parameters:

pairs_to_compl_ids – e.g. {('1260918', '1819566'): {('GCA_002102185.1', 'GCA_009711525.1'): ['180131', ..]}}

Returns:

[]

Return type:

pairs_complements

microbetag.db.build_pairs_complements(pairs_to_compl_ids, all_compl_ids2coloured_compls)

microbetag.db.build_kegg_urls(genome_pair_compls)

Takes as input the complements list between two genomes and builds URLs that colour the KEGG map related to the module, based on the KO terms of the beneficiary (pink) and those it gets from the donor (green).

Notes

Some modules do not belong to any map, e.g. https://www.kegg.jp/module/M00705. In these cases, the URL holds an N/A value.
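The URL layout can be inferred from the example URL in the pathCompls.json description: the map id, then one `KO%09%23COLOUR` segment per KO term, separated by slashes. The sketch below reproduces that layout. The function name `colour_url` is hypothetical, the two colour codes are copied from that example URL, and returning a bare "N/A" for a module without a map is an assumption about how the N/A case is handled.

```python
from urllib.parse import quote

KEGG_BASE = "https://www.kegg.jp/kegg-bin/show_pathway?"
PINK = "EAD1DC"   # colour code copied from the example URL on this page
GREEN = "00A898"  # colour code copied from the example URL on this page


def colour_url(kegg_map, pink_kos, green_kos):
    """Build a KEGG map URL that colours KO terms, or "N/A" without a map."""
    if not kegg_map:
        return "N/A"  # assumption: modules without a map get no URL at all
    parts = [kegg_map]
    for colour, kos in ((PINK, pink_kos), (GREEN, green_kos)):
        for ko in kos:
            # A tab followed by "#" percent-encodes to "%09%23".
            parts.append(ko + quote("\t#" + colour, safe=""))
    return KEGG_BASE + "/".join(parts) + "/"


print(colour_url("map00290", ["K00826"], ["K01687"]))
```

The KO-to-colour assignment in the call is chosen only to reproduce the example URL; it is not a statement about which KO term belongs to the beneficiary or to the donor.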