How to avoid inputting large data

Running clean_reads.pl reduces the raw data file to a size the web server can accept, because low-quality reads and 3'/5' adaptor sequences are filtered out and removed.

1. Low-quality reads (default thresholds):
For Solexa/Illumina 1.0 format, the quality value is calculated as Q = (ASCII character code) - 64. Reads with Q < 9 are defined as low-quality reads.
For Sanger format, the quality value is calculated as Q = (ASCII character code) - 33. Reads with Q < 15 are defined as low-quality reads.
For Illumina 1.3+ format, the quality value is calculated as Q = (ASCII character code) - 64. Reads with Q < 10 are defined as low-quality reads.

2. Adaptor sequences:
Solexa uses standard 3' and 5' adaptor sequences in its small RNA libraries, so it is not necessary to analyze the adaptor sequences. If the adaptor sequence is at the 5' end of the read, the read is removed. If the adaptor sequence is at the 3' end of the read, the adaptor sequence is trimmed from the read, and reads shorter than 16 nt after removal of the 3' adaptor are discarded.
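The two filtering rules above can be sketched in Python as follows. This is only an illustration, not the code of clean_reads.pl. The function names are hypothetical, and three points the text does not specify are assumptions: a read is treated as low quality if any single base falls below the threshold, adaptors are matched exactly, and the minimal length is checked for every read that survives trimming.

```python
# Offset and default threshold for each -T file type, as listed above.
QUALITY_RULES = {
    1: (64, 9),   # Solexa/Illumina 1.0
    2: (33, 15),  # Sanger
    3: (64, 10),  # Illumina 1.3+
}

FIVE_PRIME_ADAPTOR = "GTTCAGAGTTCTACAGTCCGACGATC"  # default of -f
THREE_PRIME_ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"      # default of -r
MIN_LENGTH = 16                                    # default of -l


def quality_values(quality_string, file_type=1):
    """Convert a FASTQ quality string into a list of Q values."""
    offset, _ = QUALITY_RULES[file_type]
    return [ord(ch) - offset for ch in quality_string]


def is_low_quality(quality_string, file_type=1):
    """Assumption: one base below the threshold marks the whole read."""
    _, threshold = QUALITY_RULES[file_type]
    return any(q < threshold for q in quality_values(quality_string, file_type))


def trim_adaptors(read, five=FIVE_PRIME_ADAPTOR, three=THREE_PRIME_ADAPTOR,
                  min_length=MIN_LENGTH):
    """Return the cleaned read, or None if the read is discarded."""
    # Assumption: "at the 5' end" means the read starts with the adaptor.
    if read.startswith(five):
        return None
    # Assumption: exact match; cut at the first occurrence of the 3' adaptor.
    position = read.find(three)
    if position != -1:
        read = read[:position]
    # Discard reads that are too short after trimming.
    if len(read) < min_length:
        return None
    return read
```

For example, trim_adaptors("TGAGGTAGTAGATTGTATAGTT" + THREE_PRIME_ADAPTOR) returns the 22 nt insert, while a read that begins with the 5' adaptor returns None.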

The cleaned data are converted from FASTQ format into FASTA format. An example sequence in FASTA format is as follows:

>tagid1_785344 ("tagid1" = unique ID, "785344" = reads count)
TGAGGTAGTAGATTGTATAGTT
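The FASTA layout above, one record per unique sequence with the read count in the header, can be sketched in Python as follows. The function name collapse_to_fasta is hypothetical, and numbering the tags in order of decreasing read count is an assumption; the order used by clean_reads.pl may differ.

```python
from collections import Counter


def collapse_to_fasta(reads, sample_name="tagid"):
    """Collapse identical reads into FASTA records named <sample><n>_<count>."""
    counts = Counter(reads)
    lines = []
    # Assumption: tags are numbered from 1, most abundant sequence first.
    for index, (sequence, count) in enumerate(counts.most_common(), start=1):
        lines.append(f">{sample_name}{index}_{count}")
        lines.append(sequence)
    return "\n".join(lines)
```

The sample_name argument plays the role of the -t option, so the default produces headers such as ">tagid1_785344".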

Usage: perl clean_reads.pl -i <short reads file> [options] >output
Options:
-i Short reads file in FASTQ format
-T File type. default=1
  1 = Solexa/Illumina 1.0 format: encode a Solexa/Illumina quality score from -5 to 62 using ASCII 59 to 126
  2 = Sanger format: encode a Phred quality score from 0 to 93 using ASCII 33 to 126
  3 = Illumina 1.3+ format: encode a Phred quality score from 0 to 62 using ASCII 64 to 126
-t Sample name. default=tagid
-f 5' adaptor. default="GTTCAGAGTTCTACAGTCCGACGATC"
-r 3' adaptor. default="TCGTATGCCGTCTTCTGCTTG"
-l The minimal length of the reads. default=16
-h Help


Examples:

perl clean_reads.pl -i xxx.fq -T 1 -t "tagid" -f "GTTCAGAGTTCTACAGTCCGACGATC" -r "TCGTATGCCGTCTTCTGCTTG" -l 16 >output

or:

perl clean_reads.pl -i xxx.fq >output
 

if the default parameters are used.
clean_reads.pl is freely available to all users.

RPG