Program Name: RAPSearch (rapsearch)
Version: 1.01
Released: May 17, 2011
Developers: Yuzhen Ye, Justin Choi (jeochoi@cgb.indiana.edu), and Haixu Tang
Affiliation: School of Informatics and Computing, Indiana University, Bloomington

The development of rapsearch was supported by NIH grant 1R01HG004908 to YY.

rapsearch is free software under the terms of the GNU General Public License as published by the Free Software Foundation.

>> What's new as compared to previous release(s)

  --fixed the -e option

>> Before you start

RAPSearch means Reduced Alphabet based Protein similarity Search (note RAPSearch was formerly named SWIFT).

A linux/unix computer that has 8 GB of memory should be sufficient for similarity search against a database of any size.

>> Introduction

rapsearch is a tool for fast protein similarity search for short reads.

rapsearch on the web: http://omics.informatics.indiana.edu/mg/RAPSearch
(please check the project home page for updates and newer versions of rapsearch)

>> Installation

simply call:

  install

The executable files "rapsearch" and "prerapsearch", and a python wrapper, will be created and placed under bin/

  rapsearch:        the similarity search tool
  prerapsearch:     the program that generates suffix array files to be used by rapsearch
  run_rapsearch.py: python wrapper to use multiple processors

>> Using prerapsearch & rapsearch

1. Before using rapsearch for similarity search against a database, run prerapsearch to prepare a suffix array file for the database.

  Usage:
    prerapsearch -d database -n suffix-array-file

  Example 1:
    prerapsearch -d nogCOGdomN95.seq -n nogCOGdomN95
    Input:   nogCOGdomN95.seq (a fasta file)
    Outputs: nogCOGdomN95.swt (a binary file with suffix array information) & a couple of other files

  Example 2:
    prerapsearch -d nr -n nr
    Input:   nr (NCBI nr file)
    Outputs: nr.swt.0, nr.swt.1, etc

2. Using rapsearch for similarity search

  Usage:
    type rapsearch to see the usage

  Example 1:
    rapsearch -q 4440037.3.dna.fa -d nogCOGdomN95 -o 4440037.3.dna-vs-nogCOGdomN95
    Input:  4440037.3.dna.fa  #query file, note if it is a file of short nucleotide sequences,
            nogCOGdomN95      #the base name of the similarity search database
    Output: 4440037.3.dna-vs-nogCOGdomN95.m8   #the similarity search result,
                                               #one hit per line, like the -m 8 output from blast
                                               #note the only difference is that log10(E-value) is listed in the file
                                               #maximum 500 hits per query
    Output: 4440037.3.dna-vs-nogCOGdomN95.aln  #detailed alignments
                                               #maximum 100 alignments per query

  By default the program outputs hits with log_10(E-value) < 1 (i.e., E-value below 10); you may change this by setting the log_10(E-value) cutoff using the -e option.

>> Using run_rapsearch.py to utilize multiple processors

  Usage:
    run_rapsearch.py -q query -d database -o outputfile -p number-of-processors

  Example:
    run_rapsearch.py -q 4440037.3.dna.fa -d nogCOGdomN95 -p 3 -o 4440037.3.dna-vs-nogCOGdomN95
    (in this example, the similarity search of the query dataset is split into 3 jobs, and the similarity search results will be merged at the end)

>> More notes

1. The sample files (e.g., 4440037.3.dna.fa & nogCOGdomN95.seq) are available at the rapsearch website; nr can be downloaded from the NCBI ftp site.

2. For a big similarity search database (like nr), prerapsearch automatically splits the database into several files to reduce the memory requirement. If you need to further reduce the memory usage, you can choose to use the -s option when running prerapsearch.
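
>> Example: post-filtering the .m8 output (sketch)

Because the .m8 file lists log10(E-value) rather than the E-value itself, scripts written for blast -m 8 output need a small adjustment. The Python sketch below is not part of rapsearch. It assumes the file is tab-separated with the usual 12 blast -m 8 columns (query, subject, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score), with log10(E-value) in the 11th column. The function name filter_hits, the field names, and the sample lines are all made up for illustration.

```python
def filter_hits(lines, max_log10_evalue=1.0):
    """Return hits whose log10(E-value) is below max_log10_evalue.

    Assumes tab-separated lines in blast -m 8 column order, with
    log10(E-value) in column 11 (an assumption, see the text above).
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip lines that do not have all 12 columns
        log10_e = float(fields[10])
        if log10_e < max_log10_evalue:
            hits.append({
                "query": fields[0],
                "subject": fields[1],
                "identity": float(fields[2]),
                "log10_evalue": log10_e,
                "evalue": 10.0 ** log10_e,  # convert back to a plain E-value
                "bitscore": float(fields[11]),
            })
    return hits


# Made-up sample lines, for illustration only (not real rapsearch output).
SAMPLE = [
    "read1\tprotA\t85.0\t30\t4\t0\t1\t90\t10\t39\t-5.2\t60.1",
    "read1\tprotB\t40.0\t25\t15\t0\t1\t75\t5\t29\t0.5\t25.3",
    "read2\tprotC\t95.0\t33\t1\t0\t2\t100\t1\t33\t-12.0\t75.8",
]

# Keep only hits with E-value below 1e-3, i.e. log10(E-value) < -3.
strict_hits = filter_hits(SAMPLE, max_log10_evalue=-3.0)

# To filter a real result file, pass an open file handle instead:
#   with open("4440037.3.dna-vs-nogCOGdomN95.m8") as fh:
#       strict_hits = filter_hits(fh, max_log10_evalue=-3.0)
```

The same cutoff can be applied at search time with the -e option; filtering afterwards is only useful when one search result is to be examined at several thresholds.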