Users can upload a sequence file or paste the sequence(s) on the submission page. All sequences must be in FASTA format. To ensure that every sequence has a unique ID, we check the input and assign an ID number to each sequence. Special characters (including white space, dots and commas) are removed from the sequences. However, if a sequence contains illegal characters (e.g., numbers), the job is terminated.
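The input checks described above can be sketched as follows. This is a minimal illustration in Python, not CRISPRone's actual code: the function names, the seqN ID format, and the rule that any non-alphabetic character left after cleaning counts as illegal are assumptions made for the example.

```python
import string

# Characters that are silently removed. The text names white space, dot and
# comma; the real service may strip more (assumption).
STRIPPED = set(string.whitespace) | {".", ","}


def parse_fasta(text):
    """Yield (header, raw_sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_sequences(text):
    """Return a list of (assigned_id, header, sequence).

    Raises ValueError if a sequence still contains a non-letter character
    (for example a digit) after the special characters are removed.
    """
    cleaned = []
    for number, (header, raw) in enumerate(parse_fasta(text), start=1):
        seq = "".join(ch for ch in raw if ch not in STRIPPED)
        illegal = sorted({ch for ch in seq if not ch.isalpha()})
        if illegal:
            raise ValueError(
                f"sequence {number} ({header!r}) contains illegal characters: {illegal}"
            )
        # Hypothetical ID scheme: a running number guarantees uniqueness.
        cleaned.append((f"seq{number}", header, seq))
    return cleaned
```

Calling clean_sequences on pasted FASTA text returns the cleaned records, and a ValueError plays the role of the terminated job.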
Input GFF file
Optionally, users can also provide gene information in GFF (General Feature Format). Otherwise, CRISPRone calls FragGeneScan to predict protein-coding genes.
In the GFF file, the contig/genome ID, the gene location and the gene translation direction must be provided.
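A GFF file carries these three pieces of information in its standard tab-separated columns: column 1 (sequence ID), columns 4 and 5 (start and end) and column 7 (strand). The sketch below reads them in Python. The Gene tuple, the read_genes function and the choice to keep only CDS features are assumptions for illustration; CRISPRone's own parser may behave differently.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    seqid: str   # contig/genome ID (GFF column 1)
    start: int   # 1-based start position (GFF column 4)
    end: int     # 1-based inclusive end position (GFF column 5)
    strand: str  # translation direction, '+' or '-' (GFF column 7)


def read_genes(gff_text, feature_types=("CDS",)):
    """Extract the required fields from GFF text.

    Which feature types the server actually reads is not stated in this
    document; keeping only 'CDS' rows is an assumption.
    """
    genes = []
    for line in gff_text.splitlines():
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 7:
            raise ValueError(f"expected at least 7 tab-separated columns: {line!r}")
        seqid, _source, ftype, start, end, _score, strand = cols[:7]
        if ftype not in feature_types:
            continue
        if strand not in ("+", "-"):
            raise ValueError(f"missing translation direction: {line!r}")
        genes.append(Gene(seqid, int(start), int(end), strand))
    return genes
```

A row that lacks a strand is rejected here because the translation direction is one of the three required fields.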
CRISPRone reports the CRISPR-Cas system prediction results for each sequence. A sequence may contain both CRISPR array(s) and cas genes, CRISPR array(s) only, cas gene(s) only, or none of them. The type or subtype of each identified CRISPR–Cas system is reported based on type/subtype signature genes. In addition to providing results in text, CRISPRone provides visualization of the predicted systems.
Visualization of predicted CRISPR-Cas systems
CRISPRone provides both global and local views of the predicted CRISPR-Cas system(s). The type/subtype signature cas genes are colored according to type/subtype. Putative cas genes belonging to our newly defined Cas families are shown in dark red. Other cas genes are labeled "other" and shown in purple. CRISPRone predicts putative tracrRNA genes for predicted type II CRISPR–Cas systems by looking for anti-repeat regions. Each putative tracrRNA is represented as a red triangle in the visualization. Detailed information, such as the length of the anti-repeat region, the number of mismatches in the alignment between the anti-repeat and the repeat sequence, and the direction of the tracrRNA gene, is also available on the website.
Elements of interest
Users can hover the mouse over an element to see its detailed information:
Cas genes: Locations and type/subtype information;
CRISPR array: Location and array information (how many repeats in the array);
tracrRNA: Location and direction, anti-repeat region length & mismatch.
Cas families/cas genes
Genes encoding CRISPR-associated (Cas) proteins are usually found next to the CRISPR arrays. Cas proteins play important roles in the three distinct stages (adaptation, expression and interference) of the CRISPR defense mechanism. Based on the participating Cas proteins, CRISPR–Cas immune systems are now divided into two classes (class 1 and class 2). Each class has several types: class 1 has types I, III and IV, and class 2 has types II, V and VI.
The cas1 and cas2 genes are found to be universally present in types I, II and III. Besides the cas1 and cas2 genes, typical type I loci usually contain the cas3 gene, which encodes helicase and DNase activities. Type II systems (also known as Nmeni) contain the cas9 gene, which is their signature gene. Type III systems can be further divided into subtypes III‑A (also known as Mtube, including csm genes) and III‑B (also known as the polymerase–RAMP module, including cmr genes). More information about the classification of cas genes can be found in: Evolution and classification of the CRISPR-Cas systems. Nat. Rev. Microbiol. 2011, 9:467–477 [Link]
More recently discovered Cas proteins include the Cas12 and Cas13 proteins. Like Cas9, Cas12a (Cpf1) is a large protein, and it is the sole protein in the crRNA-effector complex. [Ref]
Types/subtypes of predicted CRISPR–Cas system
There are two classes and six main types of CRISPR–Cas systems, based on the defense mechanism and the cas genes involved.
Class 1 systems carry out their function with a multisubunit Cas protein complex, whereas class 2 systems require only a single Cas protein (Cas9, Cas12 or Cas13) in the crRNA-effector complex.
Class 1 includes types I, III and IV; class 2 includes types II, V and VI.
Differences between Cas12a (also called Cpf1, the type V signature gene) and Cas9 (the type II signature gene), two class 2 effectors (Ref):
Cas9 requires two RNA molecules (crRNA and tracrRNA) to cut DNA; Cas12a (Cpf1) needs only one (just crRNA).
Cas12a (Cpf1) also cuts DNA in a different way. Cas9 cuts both strands of a DNA molecule at the same position, leaving behind what molecular biologists call 'blunt' ends. Cas12a, in contrast, leaves one strand longer than the other, creating a 'sticky' end.
Extracompact (400-700 aa) Cas effector proteins (Cas14) were recently discovered in archaeal genomes. Despite their small size, Cas14 proteins are capable of targeted single-stranded DNA (ssDNA) cleavage without restrictive sequence requirements. (Ref)
Some types include subtypes:
Type I includes I-A, I-B, I-C, I-D, I-E, I-F, and I-U subtypes;
Type II includes II-A, II-B, and II-C subtypes;
Type III includes III-A, III-B, and III-U subtypes;
Type IV includes two distinct variants, A and B: one contains a DinG family helicase, and the other lacks DinG but typically contains a gene encoding a small α-helical protein, which is a putative small subunit;
Type V includes subtypes A (signature gene cas12a), B (signature gene cas12b), C (signature gene cas12c), D (signature gene cas12d, also called CasY) and E (signature gene cas12e, also called CasX), as well as subtypes with extracompact Cas14 proteins (which we named subtype V-compact);
Type VI includes A (signature gene cas13a1), C (signature gene cas13a2) and B (signature gene cas13b) subtypes.
For CRISPRone visualization, each type/subtype CRISPR system gets a unique color.
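The class, type and subtype hierarchy listed above can be written down as a small lookup table. The following Python sketch is only an illustration of that hierarchy. The names CLASS_OF_TYPE, SUBTYPES and classify are hypothetical, and the labels IV-A and IV-B for the two type IV variants are an assumed naming.

```python
# Type -> class, following the two-class, six-type scheme described above.
CLASS_OF_TYPE = {"I": 1, "III": 1, "IV": 1, "II": 2, "V": 2, "VI": 2}

# Subtypes as listed in this section. "IV-A"/"IV-B" stand for the two type IV
# variants (assumed labels); "V-compact" is the name used for Cas14 systems.
SUBTYPES = {
    "I": ["I-A", "I-B", "I-C", "I-D", "I-E", "I-F", "I-U"],
    "II": ["II-A", "II-B", "II-C"],
    "III": ["III-A", "III-B", "III-U"],
    "IV": ["IV-A", "IV-B"],
    "V": ["V-A", "V-B", "V-C", "V-D", "V-E", "V-compact"],
    "VI": ["VI-A", "VI-B", "VI-C"],
}


def classify(label):
    """Return (class_number, type_name) for a label such as 'II' or 'I-E'."""
    type_name = label.split("-", 1)[0]
    if type_name not in CLASS_OF_TYPE:
        raise ValueError(f"unknown type: {label!r}")
    if "-" in label and label not in SUBTYPES[type_name]:
        raise ValueError(f"unknown subtype: {label!r}")
    return CLASS_OF_TYPE[type_name], type_name
```

A table like this is also a natural place to attach the per-subtype display color, although the actual colors used by CRISPRone are not given here.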
Anti-repeat (associated with tracrRNA)
For type II CRISPR–Cas systems, CRISPRone identifies anti-repeat regions. Using the consensus sequence of the repeats derived from the CRISPR array as the query, CRISPRone searches for putative anti-repeats in the cas locus and in its neighborhood (1,000 bp upstream and downstream). Because the anti-repeat and the repeat are known to be only partially complementary, the search uses loose criteria: the pairing must be at least 15 bp long with at most 2 mismatches.
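The search criteria can be illustrated with a simple ungapped scan. The Python sketch below is not CRISPRone's implementation, which this document does not describe in detail. It assumes that pairing is evaluated without gaps, that a pairing must start and end on a matching base, and that the region passed in excludes the CRISPR array itself (otherwise the repeats would match trivially). All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_ungapped_hit(query, target, max_mismatches=2):
    """Longest ungapped stretch where query and target agree.

    The stretch may contain at most max_mismatches mismatches and must start
    and end on a matching base.
    Returns (length, mismatches, query_start, target_start).
    """
    best = (0, 0, 0, 0)
    # Each diagonal fixes the offset between target and query coordinates.
    for offset in range(-(len(query) - 1), len(target)):
        q0, t0 = max(0, -offset), max(0, offset)
        n = min(len(query) - q0, len(target) - t0)
        left = mism = 0
        for right in range(n):
            if query[q0 + right] != target[t0 + right]:
                mism += 1
            # Shrink from the left while there are too many mismatches or
            # the window would start on a mismatch.
            while left <= right and (
                mism > max_mismatches or query[q0 + left] != target[t0 + left]
            ):
                if query[q0 + left] != target[t0 + left]:
                    mism -= 1
                left += 1
            if left <= right and query[q0 + right] == target[t0 + right]:
                length = right - left + 1
                if length > best[0]:
                    best = (length, mism, q0 + left, t0 + left)
    return best


def find_anti_repeats(repeat, region, min_length=15, max_mismatches=2):
    """Report the best candidate anti-repeat on each strand of region.

    A '+' hit means the given strand of region contains a stretch matching
    the reverse complement of the repeat; a '-' hit means the opposite strand
    does. Coordinates are 0-based, end-exclusive, on the given strand.
    """
    repeat, region = repeat.upper(), region.upper()
    hits = []
    for strand, query in (("+", reverse_complement(repeat)), ("-", repeat)):
        length, mism, _q_start, t_start = best_ungapped_hit(
            query, region, max_mismatches
        )
        if length >= min_length:
            hits.append({"strand": strand, "start": t_start,
                         "end": t_start + length, "length": length,
                         "mismatches": mism})
    return hits
```

Here region would be the cas locus plus 1,000 bp on each side, and repeat the consensus repeat of the array. A real pipeline would more likely use an alignment tool, so this scan only shows what the length and mismatch thresholds mean.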
We tested our procedure for anti-repeat detection using the examples reported in Chylinski et al. 2013 (RNA Biology). The results (reported in the table below) show that our pipeline can detect most of the tracrRNAs, except for some weak ones (with very short anti-repeat regions).
tracrRNA reported in Chylinski et al 2013
tracrRNA predicted by CRISPRone
Match length (of the anti-repeat with repeat), mismatch
Neisseria meningitidis serogroup A strain Z2491 (NC_003116)
Type II-C tracrRNA: 23-1
Pasteurella multocida str. Pm70 (NC_002663)
Type II-C tracrRNA: 24-1
Listeria innocua Clip 11262 (NC_003212)
Type II tracrRNA: (1)25-1 (2)12-0
Suspicious cas genes
Some genes may encode proteins that are involved in, but not dedicated to, the CRISPR–Cas immunity system. We call these genes suspicious, especially when they are not found isolated along the genome. CRISPRone reports these genes in case users are interested in them as well.
Some elements superficially resemble CRISPRs but are not actual CRISPRs. Mock CRISPRs may be tandem repeats, STAR-like elements, and others. Read more about this topic.
A sequence is considered to contain a questionable CRISPR–Cas system if CRISPR array(s) are predicted, but no cas genes are found in the sequence.
No gene prediction/annotation is perfect. FragGeneScan may miss a gene, and genes may also be missing from the NCBI gene annotation. For example, in Streptococcus thermophilus LMD-9, FragGeneScan recovered two csm1 genes that were missed in the NCBI annotation.