Building GENREs on microbetag
microbetag supports 2 ways to reconstruct GEMs based on the user’s genomes/bins:
- using the modelseedpy Python library
- using the CarveMe tool
In the first case, modelseedpy requires RAST-annotated genomes.
microbetag can do that on its own starting from your genome sequences;
alternatively, if you already have RAST-annotated genomes, you may provide them directly for the GEM reconstruction
(either from previous microbetag runs or from other software).
Note
modelseedpy needs to establish a connection to the RAST server (RastClient()).
Depending on the status of the RAST server, we have observed that timeout errors may occur.
In this case, microbetag exits and restarts the run on its own.
Still, it is good practice to also check its status while the modelseedpy reconstruction step is running.
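The restart behaviour described in the note can be pictured as a retry loop. The following is only a minimal sketch of that idea, not microbetag's actual code: the function name `call_with_retries`, the exception types caught and the back-off delays are all assumptions made for illustration, since the text does not say which errors the RAST client raises.

```python
import time


def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on a timeout or connection error, wait and try again.

    Hypothetical helper: the exception types are an assumption, not a
    statement about what the RAST client actually raises.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except (TimeoutError, ConnectionError) as err:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the original error.
                raise
            # Exponential back-off: base_delay, 2 * base_delay, 4 * base_delay, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({err}); retrying in {delay:.0f}s")
            sleep(delay)
```

Any callable that talks to the server could be passed as `func`; the `sleep` argument is only there so the waiting can be replaced when trying the sketch out.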
In the following paragraphs, we show how to handle different GEM reconstruction scenarios, starting from different file types.
One needs to combine 2 parameters of the config.yml file to specify a scenario: sc_input_type, which specifies the file type, and sequence_files_for_reconstructions, which points to the directory where the files to be used are located.
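To make the parameter combinations easier to see at a glance, here is a small sketch that maps the settings covered in this section to the scenario they select. It is not part of microbetag: the `describe_scenario` helper and the plain dictionary standing in for config.yml are hypothetical, and requiring a folder path for coding_regions is an assumption (the text only states it for proteins_faa).

```python
# Combinations covered in this section, keyed by
# (genre_reconstruction_with, sc_input_type).
SCENARIOS = {
    ("modelseedpy", "bins_fasta"): "RAST-annotate the bins, then reconstruct with modelseedpy",
    ("modelseedpy", "proteins_faa"): "reconstruct with modelseedpy from existing .faa files",
    ("modelseedpy", "coding_regions"): "reconstruct with modelseedpy from existing .fna files",
    ("carveme", "bins_fasta"): "reconstruct with CarveMe from the bins",
}

# Input types that start from files the user already has.
# proteins_faa is documented as needing a folder path; treating
# coding_regions the same way is an assumption.
NEEDS_FILES_DIR = {"proteins_faa", "coding_regions"}


def describe_scenario(config):
    """Return a short description of the scenario selected by config."""
    tool = config.get("genre_reconstruction_with")
    input_type = config.get("sc_input_type")
    files_dir = config.get("sequence_files_for_reconstructions")

    key = (tool, input_type)
    if key not in SCENARIOS:
        raise ValueError(f"combination not covered in this section: {key}")
    if input_type in NEEDS_FILES_DIR and not files_dir:
        raise ValueError(
            f"sc_input_type={input_type!r} needs sequence_files_for_reconstructions"
        )
    return SCENARIOS[key]
```

For example, `describe_scenario({"genre_reconstruction_with": "carveme", "sc_input_type": "bins_fasta"})` returns the CarveMe description, while leaving the folder empty for proteins_faa raises a `ValueError`.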
using modelseedpy and your bins
In this case, you have set
- sc_input_type as bins_fasta, and sequence_files_for_reconstructions is blank
- genre_reconstruction_with as modelseedpy
Then, microbetag will use RASTtk programs to RAST annotate the original genomes/bins.
In the output_directory, a folder called reconstructions has been built and in this case, 3 files for each genome/bin are now available:
- .gto and .gto_2: these are genome typed objects, i.e. JSON files that are compatible with KBase. The .gto_2 is a second genome typed object with all the RAST annotation data.
- .faa includes the same information as the .gto_2 file, but we export the protein translations in .fasta format
Note
For our 7 genomes/bins this step may take about 1 hour depending on your computing system
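Once the annotation step has finished, one way to confirm that the three files per genome/bin listed above were written is a quick check like the sketch below. It assumes that each file is named after its bin plus the extension (for example `bin_151.gto`); that naming scheme and the helper `missing_reconstruction_files` are assumptions for illustration, not something stated by microbetag.

```python
from pathlib import Path

# Extensions described in this section for the RAST annotation step.
EXPECTED_SUFFIXES = (".gto", ".gto_2", ".faa")


def missing_reconstruction_files(output_directory, bin_names):
    """Map each bin name to the expected extensions that are absent.

    Assumes files are named <bin name><extension> inside the
    reconstructions folder; adjust if your run uses other names.
    """
    recon_dir = Path(output_directory) / "reconstructions"
    missing = {}
    for name in bin_names:
        absent = [
            suffix
            for suffix in EXPECTED_SUFFIXES
            if not (recon_dir / f"{name}{suffix}").is_file()
        ]
        if absent:
            missing[name] = absent
    return missing
```

An empty dictionary means every listed bin has all three files; otherwise the result shows which extensions are still missing for which bin.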
using modelseedpy and your already RAST annotated genomes
Assuming you already have the .faa files coming from the rast-tk package, you may use them directly by setting
- sc_input_type as proteins_faa, and sequence_files_for_reconstructions as the path to the folder with your .faa files
- genre_reconstruction_with as modelseedpy
In this case, microbetag will have to establish connections with the RAST client like before.
Note
If your annotated genomes include the DNA sequences instead of the protein ones (.fna files) you may use them by setting the
sc_input_type as coding_regions.
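The choice between proteins_faa and coding_regions follows from the file extension, so it can be checked before editing config.yml. The sketch below is a hypothetical helper (`suggest_sc_input_type` is not a microbetag function) that looks at a folder of already annotated files and suggests the matching value; it only knows the two extensions mentioned in this section.

```python
from pathlib import Path

# Extension-to-setting mapping taken from this section.
SUFFIX_TO_INPUT_TYPE = {".faa": "proteins_faa", ".fna": "coding_regions"}


def suggest_sc_input_type(folder):
    """Suggest an sc_input_type value from the extensions found in folder."""
    suffixes = {p.suffix for p in Path(folder).iterdir() if p.is_file()}
    matches = {
        SUFFIX_TO_INPUT_TYPE[s] for s in suffixes if s in SUFFIX_TO_INPUT_TYPE
    }
    if len(matches) != 1:
        # Either no .faa/.fna files at all, or a mix of both.
        raise ValueError(
            f"cannot pick a single input type from extensions: {sorted(suffixes)}"
        )
    return matches.pop()
```

A folder that mixes .faa and .fna files raises an error rather than guessing, since the section describes one input type per run.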
using carveme
- sc_input_type as bins_fasta
- sequence_files_for_reconstructions is blank
- genre_reconstruction_with as carveme
In this case, under the reconstructions folder, there is a .tsv file for each genome/bin with the DIAMOND hits against the internal CarveMe database of BiGG reactions.
bin_151.peg.3 | iLJ478.TM0057 | 57.9 | 309 | 125 | 3 | 6 | 310 | 2 | 309 | 2.72e-128 | 369
bin_151.peg.3 | iLJ478.TM1063 | 55.9 | 311 | 130 | 3 | 7 | 310 | 3 | 313 | 5.36e-124 | 358
For a thorough description of each column, you may check this here.
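As a sketch of how such a .tsv file could be read, the snippet below parses the two example rows and keeps the highest-scoring hit per query. It assumes the file is tab-separated, has no header line, and uses the 12 default columns of DIAMOND's BLAST-style tabular output; the column names in `DIAMOND_COLUMNS` are that assumption (they match the 12 values per row shown above), and the helper names are hypothetical.

```python
import csv
import io

# Assumed column order: DIAMOND's default BLAST-style tabular output.
DIAMOND_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# The two example rows from this section, written as tab-separated text.
EXAMPLE_TSV = (
    "bin_151.peg.3\tiLJ478.TM0057\t57.9\t309\t125\t3\t6\t310\t2\t309\t2.72e-128\t369\n"
    "bin_151.peg.3\tiLJ478.TM1063\t55.9\t311\t130\t3\t7\t310\t3\t313\t5.36e-124\t358\n"
)


def read_hits(handle):
    """Read tab-separated hits into dictionaries keyed by column name."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(DIAMOND_COLUMNS, row))
        # Convert the numeric fields used for ranking.
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        hits.append(hit)
    return hits


def best_hit_per_query(hits):
    """Keep, for each query sequence, the hit with the highest bit score."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


hits = read_hits(io.StringIO(EXAMPLE_TSV))
best = best_hit_per_query(hits)
```

To read a real file instead of the example string, an open file handle can be passed to `read_hits` in place of the `io.StringIO` object.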