Note that if you choose to output more than one solution, you will later be asked how many output models you wish Architect to produce; more details are given [below](#enumber-of-output-models). |
| 2 | This option applies only if you want to output multiple gap-filling solutions. With the default setting, any solution whose value is more than 10% worse than the optimum is not returned; note that the value of a gap-filling solution is the sum of the penalties assigned to the gap-filling reactions it contains.<br><br>Lowering this number restricts the output to solutions closer to the optimum; raising it loosens that restriction. |
| 3 | When performing gap-filling, an integer variable is assigned to each candidate gap-filling reaction such that, in theory, the variable for the i-th candidate reaction takes the value 1 if that reaction is part of the gap-filling solution being returned (and 0 otherwise). In practice, however, the solver will not necessarily return exactly integral values. The integrality tolerance indicates how far from 0 or 1 a value can be and still be considered "integral".<br><br>If you find that gap-filling is taking longer than desired (in my experience, at least on SciNet, gap-filling that takes longer than an hour and still finishes is rare), you may set this tolerance higher, for example starting with E-7, then E-6, and so on. |
| 4 | By default, candidate gap-filling reactions associated with low-confidence EC predictions have a penalty of addition of less than 1. The remaining candidate gap-filling reactions in the reaction database have a penalty of addition of 1. Transport reactions for deadend metabolites in the high-confidence network also have a default penalty of 1.<br><br>If you wish to discourage the addition of such reactions, you can experiment with increasing this penalty. This is particularly relevant if you wish to enforce the use of specific media that you have defined for a model. See [here](#fDefinition-of-a-media-to-be-used-by-a-model-BiGG-reconstructions) for more details with respect to BiGG-based reconstructions. |

## e. Number of output models

If you ask Architect to output more than one gap-filling solution (say n solutions), you will have the option to output fewer final Excel/SBML models (between 1 and n). More specifically, two sets of output can be obtained from Architect after running its model reconstruction module.

1. Gap-filling solutions are, here, lists of reactions that can be used to obtain a model able to produce a minimum amount of the user-specified objective.
   - This output is used in part to create the final SBML and Excel outputs.
   - These results can be found in $OUTPUT_DIR/$PROJECT/Model_reconstruction/temp.
   - The file SIMULATION_high_confidence_reactions.out contains the high-confidence reactions (i.e., the network prior to any gap-filling).
   - The file ESSENTIAL_reactions.out contains the set of reactions that are necessary for the user-defined objective to carry non-zero flux, and may include gap-filling reactions.
   - The file model_gapfilled_multi_x.lst, on the other hand, contains on each line an alternate set of reactions that can be used for gap-filling, where x is the number of gap-filling solutions you asked for.
   - You may consult the file model_gapfilled_multi_x.lst_check_nec_and_suff.out to get a sense of the quality of the solutions being output. The column "Is_functional" indicates whether the model is functional with the given set of gap-filling reactions, and the column "All_reactions_essential" indicates whether all of these reactions are indispensable for non-zero flux through the objective function.
     Ideally, both conditions are met; unfortunately, due to the complexity of the gap-filling problem, they may not be, and we currently leave it up to the user to look through these alternate solutions.
   - Please note that these results are overwritten whenever Architect's model reconstruction module is re-run in the same output folder.
2. Gap-filled models are lists of reactions, complete with metadata such as reaction names and metabolite names, that are ready to be used for constraint-based modeling.
   - They are found in the Final_* folder under $OUTPUT_DIR/$PROJECT/Model_reconstruction and are available as Excel and SBML models.

I separate these two kinds of output because, in my experience, the SBML model output can be very large; we therefore leave it to the user's discretion to decide how many output models it is sensible to produce.

## f. Definition of a media to be used by a model (BiGG reconstructions)

In the case of BiGG reconstructions, the user can specify growth in a specific medium (such as minimal media). Currently, the code does not support specifying multiple media. These media definitions are borrowed from CarveMe (more information is given [here](https://github.com/ParkinsonLab/Architect/blob/master/dependency/CarveMe/carveme/data/input/media_db.tsv)). We suggest that, when a specific medium is defined, a high penalty (for example, 10) be specified for the addition of exchange reactions for deadend metabolites.

# 6. References

Buchfink, B., et al., Fast and sensitive protein alignment using DIAMOND. Nature Methods, 2015. 12(1): p. 59-60.

Claudel-Renard, C., et al., Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res, 2003. 31(22): p. 6633-9.

CPLEX: available at https://www.ibm.com/analytics/cplex-optimizer

Hung, S.S., et al., DETECT--a density estimation tool for enzyme classification and its application to Plasmodium falciparum. Bioinformatics, 2010. 26(14): p. 1690-8.

Keating, S.M., et al., SBML Level 3: an extensible format for the exchange and reuse of biological models. Mol Syst Biol, 2020. 16(8): e9110.

Kumar, N. and J. Skolnick, EFICAz2.5: application of a high-precision enzyme function predictor to 396 proteomes. Bioinformatics, 2012. 28(20): p. 2687-8.

Machado, D., et al., Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Res, 2018. 46(15): p. 7542-7553.

Nguyen, N.N., et al., ENZDP: Improved enzyme annotation for metabolic network reconstruction based on domain composition profiles. Journal of Bioinformatics and Computational Biology, 2015. 13(5).

Nursimulu, N., et al., Improved enzyme annotation with EC-specific cutoffs using DETECT v2. Bioinformatics, 2018. 34(19): p. 3393-3395.

Nursimulu, N., Moses, A.M. and Parkinson, J., Architect: a tool for producing high-quality metabolic models through improved enzyme annotation. BioRxiv doi: 10.1101/2021.10.12.464133. https://www.biorxiv.org/content/10.1101/2021.10.12.464133v1.full

Ponce, M., et al., Deploying a Top-100 Supercomputer for Large Parallel Workloads: the Niagara Supercomputer. PEARC'19 Proceedings, 2019.

Ryu, J.Y., et al., Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers. PNAS, 2019. 116(28): p. 13996–14001.

Yu, C., et al., Genome-wide enzyme annotation with precision control: catalytic families (CatFam) databases. Proteins, 2009. 74(2): p. 449-60.
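
Supplementary sketch for the solution-quality cutoff described in the advanced parameters (option 2): the snippet below illustrates one way to read "the value of a gap-filling solution is the sum of its penalties" together with a percentage cutoff relative to the optimum. It is not Architect's own code. The function names, reaction identifiers, and penalty values are hypothetical, and it assumes that a lower total penalty is better and that "10% worse" means a value above 1.1 times the best value.

```python
def solution_value(reactions, penalties):
    """Value of a gap-filling solution: the sum of its reactions' penalties."""
    return sum(penalties[r] for r in reactions)


def filter_solutions(solutions, penalties, cutoff_pct=10.0):
    """Keep solutions whose value is within cutoff_pct percent of the best one.

    Assumes at least one solution is given and that lower values are better.
    """
    values = [solution_value(s, penalties) for s in solutions]
    best = min(values)
    limit = best * (1.0 + cutoff_pct / 100.0)
    # Small slack guards against floating-point round-off at the boundary.
    return [s for s, v in zip(solutions, values) if v <= limit + 1e-9]


# Hypothetical penalties: R3 stands for a reaction with a low-confidence EC
# prediction (penalty below 1); T1 stands for a transport reaction.
penalties = {"R1": 1.0, "R2": 1.0, "R3": 0.5, "T1": 1.0}
solutions = [["R1", "R3"], ["R3", "T1"], ["R1", "R2"], ["R1", "R2", "T1"]]

kept_default = filter_solutions(solutions, penalties)
kept_loose = filter_solutions(solutions, penalties, cutoff_pct=40.0)
print(kept_default)
print(kept_loose)
```

In this toy example, raising the penalty of T1 would push any solution containing it further from the optimum, which is the effect the transport-reaction penalty setting is meant to have.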