---
title: Building GENREs
layout: default
parent: Additional tutorials
nav_order: 3
description: "how to build GENREs on microbetag"
---

# Building GENREs on `microbetag`

`microbetag` supports two ways to reconstruct GEMs based on the user's genomes/bins:

1. using the [`modelseedpy`](https://github.com/ModelSEED/ModelSEEDpy) Python library
2. using the [`CarveMe`](https://carveme.readthedocs.io/en/latest/) tool

In the first case, `modelseedpy` requires RAST-annotated genomes.
`microbetag` can produce these on its own, starting from your genome sequences; alternatively, if you already have them (either from previous `microbetag` runs or from other software), you may provide them to be used directly for the GEM reconstruction.

```{note}
`modelseedpy` needs to establish a connection to the RAST server (`RastClient()`).
Depending on the status of the RAST server, we have observed that timeout errors may occur.
In that case, `microbetag` will exit and restart on its own.
Still, it is good practice to check on its status while the `modelseed` reconstruction step is running.
```

In the following paragraphs, we show how to handle different GEM reconstruction scenarios, using different file types as starting points.
You need to combine two parameters of the `config.yml` file to specify a scenario:
the `sc_input_type`, where you specify the file type, and
the `sequence_files_for_reconstructions`, which points to the directory where the files to be used are located.
A third parameter, `genre_reconstruction_with`, selects the reconstruction tool.

### using `modelseedpy` and your bins

In this case, you have set

- `sc_input_type` as `bins_fasta`, and
- `sequence_files_for_reconstructions` is blank
- `genre_reconstruction_with` as `modelseedpy`

Then, `microbetag` will use [`RASTtk` programs](https://www.bv-brc.org/docs///cli_tutorial/rasttk_getting_started.html) to RAST-annotate the original genomes/bins.
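
The three settings for this scenario can be checked before launching a run. The sketch below is only an illustration: the helper `check_modelseedpy_bins_settings` is hypothetical (it is not part of `microbetag`), and it assumes the parameters have already been read from `config.yml` into a flat Python dictionary, which may not match how the real file is organised.

```python
# Hypothetical helper, not part of microbetag.
# Assumption: the parameters were loaded from `config.yml` into a flat dict;
# the real file may nest them differently.
def check_modelseedpy_bins_settings(config):
    """Return a list of problems; an empty list means the settings match the scenario."""
    problems = []
    if config.get("sc_input_type") != "bins_fasta":
        problems.append("sc_input_type should be 'bins_fasta'")
    if config.get("sequence_files_for_reconstructions"):
        problems.append("sequence_files_for_reconstructions should be left blank")
    if config.get("genre_reconstruction_with") != "modelseedpy":
        problems.append("genre_reconstruction_with should be 'modelseedpy'")
    return problems


config = {
    "sc_input_type": "bins_fasta",
    "sequence_files_for_reconstructions": None,  # left blank in config.yml
    "genre_reconstruction_with": "modelseedpy",
}
print(check_modelseedpy_bins_settings(config))  # []
```

An empty list means the three values agree with the scenario described in this section; any other combination is reported as a short message per offending parameter.
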

In the `output_directory`, a folder called `reconstructions` is created and, in this case, three files are available for each genome/bin:

- `.gto` and `.gto_2`: these are genome typed objects, i.e. JSON files that are compatible with KBase. The `.gto_2` is a second genome typed object with all the RAST annotation data.
- `.faa`: includes the same information as the `.gto_2` file, but with the protein translations exported in `.fasta` format

```{note}
For our 7 genomes/bins, this step may take about 1 hour, depending on your computing system.
```

### using `modelseedpy` and your already RAST-annotated genomes

Assuming you already have the `.faa` files coming from the `rast-tk` package, you may use them directly by setting

- `sc_input_type` as `proteins_faa`, and
- `sequence_files_for_reconstructions` as the path to the folder with your `.faa` files
- `genre_reconstruction_with` as `modelseedpy`

In this case, `microbetag` still has to establish a connection to the RAST server, as before.

```{note}
If your annotated genomes include the DNA sequences instead of the protein ones (`.fna` files), you may use them by setting the `sc_input_type` as `coding_regions`.
```

### using `carveme`

In this case, you have set

- `sc_input_type` as `bins_fasta`
- `sequence_files_for_reconstructions` is blank
- `genre_reconstruction_with` as `carveme`

Then, under the `reconstructions` folder, there is a `.tsv` file for each genome/bin with the `diamond` hits against the internal database of `carveme` with the BiGG reactions.
The column names shown here follow `diamond`'s default tabular output and are added for readability.

| qseqid | sseqid | pident | length | mismatch | gapopen | qstart | qend | sstart | send | evalue | bitscore |
|:------------:|----------------:|------:|----:|-----:|----:|----:|-----:|---:|-----:|-----------:|-------:|
| bin_151.peg.3 | iLJ478.TM0057 | 57.9 | 309 | 125 | 3 | 6 | 310 | 2 | 309 | 2.72e-128 | 369 |
| bin_151.peg.3 | iLJ478.TM1063 | 55.9 | 311 | 130 | 3 | 7 | 310 | 3 | 313 | 5.36e-124 | 358 |

For a thorough description of each column, see the [`diamond` tutorial](https://github.com/bbuchfink/diamond_docs/blob/master/1%20Tutorial.MD).
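
To work with these hits programmatically, the `.tsv` rows can be loaded with the Python standard library. The following is a minimal sketch, assuming the files use `diamond`'s default 12-column tabular layout (BLAST format 6) and have no header row; the function names `read_diamond_hits` and `best_hit_per_query` are hypothetical helpers, not part of `microbetag` or `carveme`. The two example rows are the ones shown in the table above.

```python
import csv
import io

# Column names of diamond's default tabular output (BLAST format 6).
# Assumption: the .tsv files under `reconstructions` use this default layout
# and contain no header row.
DIAMOND_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def read_diamond_hits(handle):
    """Parse tab-separated diamond hits into a list of dicts with typed values."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip empty lines
        hit = dict(zip(DIAMOND_COLUMNS, row))
        for name in FLOAT_COLUMNS:
            hit[name] = float(hit[name])
        for name in INT_COLUMNS:
            hit[name] = int(hit[name])
        hits.append(hit)
    return hits


def best_hit_per_query(hits):
    """Keep, for each query protein, the hit with the highest bit score."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# The two example rows from the table, as they would appear in the .tsv file.
example = (
    "bin_151.peg.3\tiLJ478.TM0057\t57.9\t309\t125\t3\t6\t310\t2\t309\t2.72e-128\t369\n"
    "bin_151.peg.3\tiLJ478.TM1063\t55.9\t311\t130\t3\t7\t310\t3\t313\t5.36e-124\t358\n"
)
hits = read_diamond_hits(io.StringIO(example))
best = best_hit_per_query(hits)
print(best["bin_151.peg.3"]["sseqid"])  # iLJ478.TM0057
```

For a real file, replace the `io.StringIO(example)` handle with `open(path, newline="")`, where `path` points to one of the `.tsv` files in the `reconstructions` folder.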