Program Name: RAPSearch (rapsearch)
Version: 1.01
Released: May 17, 2011
Developers: Yuzhen Ye <yuwwu@indiana.edu>, Justin Choi <jeochoi@cgb.indiana.edu>, and Haixu Tang <hatang@indiana.edu>
Affiliation: School of Informatics and Computing, Indiana University, Bloomington

The development of rapsearch was supported by NIH grant 1R01HG004908 to YY.

rapsearch is free software under the terms of the GNU General Public License as published by 
the Free Software Foundation.

>> What's new as compared to previous release(s)
   --fixed the -e option

>> Before you start
   RAPSearch stands for Reduced Alphabet based Protein similarity Search
   (note: RAPSearch was previously named SWIFT).

   A Linux/Unix computer with 8 GB of memory should be sufficient
   for similarity searches against a database of any size.

>> Introduction

rapsearch is a tool for fast protein similarity search for short reads. 

rapsearch on the web:
  http://omics.informatics.indiana.edu/mg/RAPSearch
  (please check the project home page for updates and newer versions of rapsearch)

>> Installation

simply call: install

The executable files "rapsearch" and "prerapsearch", and a Python wrapper, will be created and placed under bin/:
rapsearch:        the similarity search tool
prerapsearch:     the program that generates the suffix array files used by rapsearch
run_rapsearch.py: Python wrapper for using multiple processors

>> Using prerapsearch & rapsearch

1. Before using rapsearch for similarity search against a database, run prerapsearch to prepare a suffix array 
file for the database. 

Usage: prerapsearch -d database -n suffix-array-file

Example 1: prerapsearch -d nogCOGdomN95.seq -n nogCOGdomN95
Input: nogCOGdomN95.seq (a fasta file)
Outputs: nogCOGdomN95.swt (a binary file with suffix array information) & a couple of other files

Example 2: prerapsearch -d nr -n nr
Input: nr (NCBI nr file)
Outputs: nr.swt.0, nr.swt.1, etc
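
Before launching a search, a script can check whether the suffix array files for a database are already in place. The following is a minimal Python sketch, assuming only the output naming shown in the two examples above (a single NAME.swt file, or numbered pieces NAME.swt.0, NAME.swt.1, etc). The function name suffix_array_files is hypothetical and is not part of rapsearch.

```python
import glob
import os
import re


def suffix_array_files(base):
    """Return the suffix array files found for a database base name.

    Assumes the naming shown in the examples above: either a single
    BASE.swt file, or numbered pieces BASE.swt.0, BASE.swt.1, ...
    """
    found = []
    if os.path.isfile(base + ".swt"):
        found.append(base + ".swt")
    # Keep only numbered pieces and sort them numerically (.10 after .9).
    numbered = [p for p in glob.glob(glob.escape(base) + ".swt.*")
                if re.search(r"\.swt\.\d+$", p)]
    numbered.sort(key=lambda p: int(p.rsplit(".", 1)[1]))
    return found + numbered


if __name__ == "__main__":
    if not suffix_array_files("nr"):
        print("no suffix array found; run: prerapsearch -d nr -n nr")
```

Other files written by prerapsearch are deliberately ignored here, since their names are not documented in this README.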

2. Using rapsearch for similarity search

Usage: type rapsearch to see its usage

Example 1: rapsearch -q 4440037.3.dna.fa -d nogCOGdomN95 -o 4440037.3.dna-vs-nogCOGdomN95
Input:  4440037.3.dna.fa #query file, note if it is a file of short nucleotide sequences, 
	nogCOGdomN95     #the base name of the similarity search database 

Output: 4440037.3.dna-vs-nogCOGdomN95.m8  #the similarity search result,
                                          #one hit per line, like the -m 8 output from blast
                                          #note the only difference is that log10(E-value) is listed in place of the E-value
                                          #maximum 500 hits per query
Output: 4440037.3.dna-vs-nogCOGdomN95.aln #detailed alignments
                                          #maximum 100 alignments per query

The program outputs hits with log_10(E-value) < 1 (i.e., E-value < 10);
you may change this threshold by setting log_10(E-value) with the -e option.
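
Because the .m8 file stores log10(E-value) rather than the E-value itself, downstream scripts written for blast output need a small adjustment. The Python sketch below parses hit lines and filters them by E-value. It assumes the file uses the 12 tab-separated columns of blast -m 8 output, with log10(E-value) in the 11th column, and that lines starting with '#' are comments; both are assumptions, not statements about the exact rapsearch output. The names Hit, parse_m8_line and filter_hits are hypothetical.

```python
import math
from collections import namedtuple

# Assumed column order: the 12 fields of blast -m 8 output, with
# log10(E-value) stored where blast would list the E-value.
Hit = namedtuple("Hit", [
    "query", "subject", "identity", "aln_len", "mismatches", "gap_opens",
    "q_start", "q_end", "s_start", "s_end", "log10_evalue", "bit_score",
])


def parse_m8_line(line):
    """Parse one hit line; return None for blank or '#' comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    f = line.split("\t")
    if len(f) != 12:
        raise ValueError("expected 12 tab-separated fields, got %d" % len(f))
    return Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
               int(f[6]), int(f[7]), int(f[8]), int(f[9]),
               float(f[10]), float(f[11]))


def filter_hits(lines, max_evalue=1e-3):
    """Yield hits with E-value <= max_evalue, comparing in log10 space."""
    threshold = math.log10(max_evalue)
    for line in lines:
        hit = parse_m8_line(line)
        if hit is not None and hit.log10_evalue <= threshold:
            yield hit
```

Comparing in log10 space avoids converting very small E-values back to floating point, where they would underflow to zero.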

>> Using run_rapsearch.py to utilize multiple processors

Usage: run_rapsearch.py -q query -d database -o outputfile -p number-of-processors 
Example: run_rapsearch.py -q 4440037.3.dna.fa -d nogCOGdomN95 -p 3 -o 4440037.3.dna-vs-nogCOGdomN95
         (in this example, the similarity search of the query dataset is split into 3 jobs, and
          the similarity search results will be merged in the end)
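
The split-and-merge idea described above can be illustrated in a few lines of Python. This is a sketch only and is not the code of run_rapsearch.py: the function names, the round-robin assignment of reads to jobs, and the simple concatenation of results are all assumptions made for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def split_round_robin(records, n_jobs):
    """Distribute records over n_jobs lists, one record at a time."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    chunks = [[] for _ in range(n_jobs)]
    for i, record in enumerate(records):
        chunks[i % n_jobs].append(record)
    return chunks


def merge_results(per_job_lines):
    """Concatenate the hit lines of each job into one list, in job order."""
    merged = []
    for lines in per_job_lines:
        merged.extend(lines)
    return merged
```

With -p 3, each of the three chunks would be written to its own query file and searched by a separate rapsearch process before the per-job results are combined.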

>> More notes 
1. The sample files (e.g., 4440037.3.dna.fa & nogCOGdomN95.seq) are available at the rapsearch website; nr can
   be downloaded from the NCBI ftp site.
2. For a large similarity search database (like nr), prerapsearch automatically splits the database into
   several files to reduce the memory requirement. If you need to further reduce the memory usage,
   you can run prerapsearch with the -s option.