MinPath version 1.2 (Oct 20, 2010)
(older versions: MinPath1.1 released on July 31, 2010; MinPath1.0 released on July 3, 2009)

Developer: Yuzhen Ye
Affiliation: School of Informatics, Indiana University, Bloomington

0. What's new
   MinPath1.2: works with any pathway system (see 3.1.3 and the usage in section 4)
   MinPath1.1: outputs detailed pathway assignments

1. Introduction
   MinPath (Minimal set of Pathways) is a parsimony approach for biological
   pathway reconstruction using protein family predictions. It achieves a more
   conservative, yet more faithful, estimation of the biological pathways for a
   query dataset.

   MinPath is free software under the terms of the GNU General Public License
   as published by the Free Software Foundation.

   If you use MinPath, please cite:
   Yuzhen Ye and Thomas G. Doak. A parsimony approach to biological pathway
   reconstruction/inference for genomes and metagenomes. PLoS Computational
   Biology, 5(8):e1000465, 2009.

2. Installation
   Download the minpath.tar.gz package, then run

      tar zxvf minpath.tar.gz

   This creates a MinPath directory with the source code and the required files
   and package (see 3. Requirements).

   You need to set up the MinPath environment variable before you use MinPath.
   If you use Bash, you can add a line to your .bashrc file as follows:

      PATH=$PATH:"/whateverfold/MinPath"

   (note: replace /whateverfold with the folder where you installed MinPath)

3. Requirements
3.1 Files of reference pathways with function-to-pathway (subsystem) mapping
3.1.1 Files required for KEGG pathway reconstruction by MinPath (placed under data/)
   * map_title.tab
   * ko
   (note: the two files included in this package are from the KEGG database as
   of Dec 2008. If you need more recent files, you may download tar files from
   the KEGG ftp website ftp://ftp.genome.jp/pub/kegg/.
   After you extract the files, copy the corresponding files to data/)

3.1.2 File required for SEED subsystem reconstruction by MinPath (placed under data/)
   * figfam_subsystem.dat
   (note: the current file is copied from figfam release6:
   Release6/FigfamsData.Release.6/release_history/figfam_subsystem.dat)

3.1.3 MinPath1.2 now supports any pathway system
   * what you need to prepare is a pathway-function mapping file
     (see the example file ec2path under data/)

3.2 LP package for integer programming of the minimal pathway reconstruction
   The package glpk-4.6 is included under the glpk-4.6/ directory.
   (you may need to re-install this package to use MinPath correctly if you are
   NOT using it on a Red Hat Linux machine: go to the glpk-4.6 directory, first
   call ./configure, then call make. After you re-install it, make sure that
   you can call the command glpsol under glpk-4.6/examples on your computer)

4. Running MinPath
4.1 Simply call MinPath1.2.py to check its usage. Also check out the included
    examples as follows.

4.2 Examples (under examples/ directory):
   Under examples/, run

      python ../MinPath1.2.py -ko demo.ko -report demo.ko.report -details demo.ko.details
      python ../MinPath1.2.py -fig demo.fig -report demo.fig.report -details demo.fig.details
      python ../MinPath1.2.py -any demo.ec -map ec2path -report demo.ec.report -details demo.ec.details

   Input of MinPath: ko (or fig, or any) annotations (check demo.ko, demo.fig
   and demo.ec under the examples/ directory)
   Outputs of MinPath: demo.ko.report and demo.ko.details (and so on)

4.3 How to read the MinPath report file?
   e.g., demo.ko.report

      ...
      path 00030 kegg n/a naive 1 minpath 1 fam0 42 fam-found 18 name Pentose phosphate pathway
      path 00031 kegg n/a naive 1 minpath 0 fam0 12 fam-found 2 name Inositol metabolism
      ...

   1) path 00030 and path 00031 are KEGG pathway IDs (if your input file has
      fig families, the pathways will be SEED subsystems instead)
   2) kegg n/a indicates that the KEGG pathway reconstruction for your input
      dataset is not available (note: this information is only available for
      genomes annotated in the KEGG database); for input with fig families,
      KEGG is replaced by SEED
   3) naive 1 or 0: the pathway is reconstructed, or not, by the naive mapping
      approach
   4) minpath 1 or 0: the pathway is kept, or removed, by MinPath
   5) fam0: the total number of families involved in the corresponding pathway
   6) fam-found: the total number of involved families that are annotated in
      your input dataset
   7) name: the description of the corresponding pathway (subsystem)

4.4 How to read the MinPath detailed report file?
   This report file lists all the pathways found by MinPath, and the families
   each pathway includes.
   e.g., demo.ko.details

      ...
      path 00010 fam0 56 fam-found 27 # Glycolysis / Gluconeogenesis
         K00001 hits 6 # E1.1.1.1, adh
         K00002 hits 1 # E1.1.1.2, adh
      ...

   1) path 00010 is the KEGG pathway ID; fam0 and fam-found have the same
      meaning as in 4.3
   2) this pathway includes families K00001, K00002, and so on (K numbers are
      used for KEGG families, and FIG ids for FIG families)
   3) family K00001 has 6 hits (i.e., 6 proteins/reads are annotated as this
      family)

5. MinPath on the web: http://omics.informatics.indiana.edu/MinPath
   (please check the project home page for updates and newer versions of MinPath)

6. Acknowledgments
   Development was supported by NIH grant 1R01HG004908 to YY
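
Supplement to 4.3: reading a report file from a script

The report lines described in 4.3 can also be read programmatically. The sketch below is not part of MinPath. It assumes that every report line is a whitespace-separated sequence of key/value pairs exactly as in the excerpt in 4.3, with "name" as the last key and the rest of the line as its value. The function names parse_report_line and kept_pathways are hypothetical helpers introduced here for illustration only.

```python
def parse_report_line(line):
    """Parse one MinPath report line into a dict of strings.

    Assumption: the line alternates keys and values
    (path, kegg/seed, naive, minpath, fam0, fam-found) and ends with
    'name' followed by a free-text description, as shown in 4.3.
    """
    tokens = line.split()
    record = {}
    i = 0
    while i < len(tokens):
        key = tokens[i]
        if key == "name":
            # The description may contain spaces; keep the rest of the line.
            record["name"] = " ".join(tokens[i + 1:])
            break
        if i + 1 >= len(tokens):
            break  # dangling key without a value; ignore it
        record[key] = tokens[i + 1]
        i += 2
    return record


def kept_pathways(lines):
    """Return the IDs of pathways that MinPath kept (minpath 1)."""
    kept = []
    for line in lines:
        if not line.strip().startswith("path"):
            continue  # skip blank or unrelated lines
        record = parse_report_line(line)
        if record.get("minpath") == "1":
            kept.append(record["path"])
    return kept


if __name__ == "__main__":
    # The two lines quoted in 4.3, used as sample input.
    sample = [
        "path 00030 kegg n/a naive 1 minpath 1 fam0 42 fam-found 18 name Pentose phosphate pathway",
        "path 00031 kegg n/a naive 1 minpath 0 fam0 12 fam-found 2 name Inositol metabolism",
    ]
    for pathway_id in kept_pathways(sample):
        print(pathway_id)
```

All values are kept as strings, so a pathway ID such as 00030 keeps its leading zeros. To use the helper on a real report, pass the open file object in place of the sample list, e.g. kept_pathways(open("demo.ko.report")).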