Program Name: GeneStitch
Version: 1.1
Developer: Yu-Wei Wu, Mina Rho, Thomas Doak, Yuzhen Ye
Affiliation: School of Informatics and Computing, Indiana University

>> INTRODUCTION

GeneStitch is a tool that assembles genes using a network matching algorithm. Given an already-assembled dataset, it joins contigs together to form more complete genes with the help of a reference gene set.

Currently the only assembly software that GeneStitch supports is SOAPdenovo. Users must first assemble the input dataset using SOAPdenovo and then feed the assembled data into GeneStitch.

GeneStitch's home on the web:
http://omics.informatics.indiana.edu/hmp/GeneStitch/

>> INSTALLATION

To build the executable file, simply type "make". The executable file "GeneStitch" will be created in the same directory.

>> PROGRAM REQUIREMENTS (added for v1.1)

- RAPSearch2
  GeneStitch needs RAPSearch2 in order to run. RAPSearch2 can be downloaded at http://omics.informatics.indiana.edu/RAPSearch2. Users must add the RAPSearch2 program path to the $PATH environment variable.

  Example command to add the RAPSearch2 executable path to the system environment:
  > export PATH=/usr/local/bin/RAPSearch2/bin:$PATH

- FragGeneScan
  FragGeneScan is not mandatory for GeneStitch, but we recommend that users predict genes with FragGeneScan beforehand so that GeneStitch can focus on the fragmented genes. FragGeneScan can be downloaded at http://omics.informatics.indiana.edu/FragGeneScan. We recommend adding the FragGeneScan program path to the $PATH environment variable. Alternatively, users can specify the FragGeneScan program location with the parameter -FragGeneScan_path.

>> RUNNING GeneStitch

Usage: GeneStitch -input [seq filename] -db [database fasta filename] -kmer_len [kmer length]
                  [-db_protein (optional protein db filename)]
                  [-use_FragGeneScan]
                  [-FragGeneScan_path (full path of FragGeneScan directory. Set when the path is not in system env.)]

Parameters:
  -input: the SOAPdenovo output prefix
  -db: the reference gene set (in nucleotide fasta format)
  -kmer_len: the kmer setting used for the SOAPdenovo assembly

Optional parameters:
  -db_protein: the reference gene set (in amino acid fasta format). This option will force the BLAST search to run BLASTX instead of BLASTN. Note that the fasta headers of the protein gene set must be EXACTLY THE SAME as those of the nucleotide gene set.
  -use_FragGeneScan: specify this parameter if you want GeneStitch to predict genes using FragGeneScan.
  -FragGeneScan_path: the path of FragGeneScan, so that GeneStitch can first predict genes using FragGeneScan and then do further analysis.

Examples (assume the SOAPdenovo output prefix is sra_data and the kmer length is 31):

  (use only the nucleotide database)
  GeneStitch -input sra_data -db db_nuc.fa -kmer_len 31

  (use both the nucleotide and protein databases)
  GeneStitch -input sra_data -db db_nuc.fa -db_protein db_pro.fa -kmer_len 31

  (use only the nucleotide database, and use FragGeneScan)
  GeneStitch -input sra_data -db db_nuc.fa -kmer_len 31 -use_FragGeneScan -FragGeneScan_path /bin/FragGeneScan/run_FragGeneScan.pl

>> ACKNOWLEDGEMENTS

Development was supported by NIH 1R01HG004908 and NSF DBI-0845685 to YY.
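
>> EXAMPLE WRAPPER SCRIPT (sketch)

The $PATH requirement and the command-line usage described above can be combined in a small shell wrapper. The following is a minimal sketch, not part of the GeneStitch distribution. It assumes that the RAPSearch2 executable is named "rapsearch" (override it with the hypothetical RAPSEARCH_BIN variable if your installation differs) and that file names contain no spaces. The helper names have_tool and build_cmd are made up for illustration. The script only prints the command line; it does not run GeneStitch.

```shell
#!/bin/sh
# Sketch only: helper functions for checking prerequisites and composing
# a GeneStitch command line from the documented parameters.

# ASSUMPTION: RAPSearch2 installs an executable named "rapsearch".
RAPSEARCH_BIN=${RAPSEARCH_BIN:-rapsearch}

# have_tool NAME: succeed if NAME is found on $PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# build_cmd PREFIX DB KMER [DB_PROTEIN]: print the GeneStitch command line.
# File names containing spaces are not handled in this sketch.
build_cmd() {
    cmd="GeneStitch -input $1 -db $2 -kmer_len $3"
    if [ -n "${4:-}" ]; then
        cmd="$cmd -db_protein $4"
    fi
    printf '%s\n' "$cmd"
}

# Warn (without aborting) if a prerequisite is missing from $PATH.
for tool in "$RAPSEARCH_BIN" GeneStitch; do
    have_tool "$tool" || echo "warning: $tool not found on \$PATH" >&2
done

# Print the command for the example inputs used in this README.
build_cmd sra_data db_nuc.fa 31 db_pro.fa
```

Printing the command instead of executing it makes it easy to inspect the parameters first; once the line looks right, it can be pasted into the shell and run directly.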