Read files in FASTA format are required as input (see the figure below for an example). Users can also use the FASTX toolkit to convert their data from FASTQ to FASTA format.
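As an illustration of what the FASTQ-to-FASTA conversion involves, here is a minimal Python sketch. It is not the FASTX toolkit or the server's own converter. It assumes the common four-line FASTQ layout (header, sequence, "+" line, quality) with no wrapped sequences and no blank lines; the function names fastq_to_fasta and convert_file are hypothetical.

```python
import itertools


def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of FASTQ lines.

    Assumption: every record occupies exactly four lines
    (header, sequence, '+' separator, quality string).
    """
    it = iter(fastq_lines)
    while True:
        # Take the next four lines as one record.
        record = [line.rstrip("\r\n") for line in itertools.islice(it, 4)]
        if not record:
            break  # end of input
        if (len(record) != 4
                or not record[0].startswith("@")
                or not record[2].startswith("+")):
            raise ValueError("malformed FASTQ record: %r" % (record,))
        yield ">" + record[0][1:]  # '@' header becomes '>' header
        yield record[1]            # keep the sequence, drop the quality line


def convert_file(fastq_path, fasta_path):
    """Convert one FASTQ file on disk into a FASTA file."""
    with open(fastq_path) as src, open(fasta_path, "w") as dst:
        for line in fastq_to_fasta(src):
            dst.write(line + "\n")
```

A call such as convert_file("sample.fastq", "sample.fasta") would write the converted reads; both file names are placeholders.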
To minimize the size of uploaded files, we recommend uploading tag files, in which identical reads are combined into unique tags (see the figure below; e.g. "t0000001" = unique tag ID, "2397630" = read count).
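The idea of combining identical reads into tags can be sketched in Python as follows. This is only an illustration under stated assumptions: the exact header layout expected by the server is the one shown in the figure, so the separator between tag ID and read count (a space here), the ordering by abundance, and the function name collapse_reads are all assumptions. Each sequence is assumed to sit on a single line.

```python
from collections import Counter


def collapse_reads(fasta_lines, separator=" "):
    """Yield tag-file lines from an iterable of FASTA lines.

    Assumptions (not confirmed by the server documentation):
    - the tag header is '>' + tag ID + separator + read count;
    - tags are numbered from the most to the least abundant sequence.
    """
    counts = Counter()
    for line in fasta_lines:
        line = line.strip()
        if line and not line.startswith(">"):
            counts[line] += 1  # count each distinct sequence

    # Most abundant first; ties are ordered by sequence for reproducibility.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for index, (sequence, count) in enumerate(ranked, start=1):
        yield ">t%07d%s%d" % (index, separator, count)
        yield sequence
```

If the figure shows a different delimiter between the tag ID and the read count, pass it as the separator argument.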
To further reduce the upload size, users can compress the reads or tag file into a *.tar.gz file with 7-Zip (on Windows) or with the command tar -zcvf compressedFile.tar.gz uncompressedFile*.fasta (on Linux or Mac OS). Each tar.gz file must contain exactly one FASTA file and no subdirectories, so make sure the file pattern matches a single file.
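For users who prefer a script, the same compression step can be sketched with Python's standard tarfile module. This is a minimal example, not the server's own tool; the function name compress_fasta and the default archive name are assumptions. The arcname argument stores the file at the top level of the archive, so no subdirectory is created.

```python
import os
import tarfile


def compress_fasta(fasta_path, archive_path=None):
    """Pack a single FASTA file into a .tar.gz archive with no subdirectory."""
    if archive_path is None:
        # Hypothetical default: reads.fasta -> reads.tar.gz
        archive_path = os.path.splitext(fasta_path)[0] + ".tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname drops the directory part so the file sits at the archive root.
        archive.add(fasta_path, arcname=os.path.basename(fasta_path))
    return archive_path
```

Run such a script from a directory that does not contain a file named tarfile.py, because a local file with that name would shadow the standard module.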
Notice: Both the reads file and the tag file must be in FASTA format and must not contain any adapter sequences. Users can remove adapters with cutadapt.
Users can also use the tool provided on our server to convert FASTQ files into FASTA format and compress them into *.tar.gz files. Details are given below.
Pre-Installation
PreprocessFiles is a tool that converts *.fastq files into *.fasta files and compresses *.fasta files into a *.tar.gz file. It works on Unix-based and Windows systems. A Python environment is required; Python 2.7.x is recommended.
A Python environment is usually pre-installed on Unix-based systems. Windows users, however, need the following steps:
1. Open the Windows Command Prompt and enter "python -V".
2. If a Python version is displayed, skip to the "Usage" section. Otherwise, install Python first: go to the official Python website (https://www.python.org/) and download the "msi" installer (Python 2.7.x).
Usage
1. Download and decompress PreprocessFiles.tar.gz
tar -zxvf PreprocessFiles.tar.gz
cd PreprocessFiles
The "PreprocessFiles" folder contains two scripts, "main.py" and "tarfile.py".
2. Type "main.py -h" to view the help information.
"-h": Show help information.
3. The main.py script has three operating modes: -v, -t and -a. (Users can specify a name for the results file; otherwise the input file name is used by default.)
"-v": Convert *.fastq files into *.fasta files.
"-t": Compress files into *.tar.gz files. Users can use this mode to generate the files used for TARGET ANALYSIS (http://mcg.ustc.edu.cn/bsc/deanniso/enrichment.html).
"-a": Convert *.fastq files into *.fasta files and compress the *.fasta files into a *.tar.gz file.