biotype
This module generates an RNA biotype analysis. Its aim is to detect outlier samples: samples whose proportions of uniquely mapped, multimapped or unmapped reads, or whose amounts of miRNA, misc_RNA and other RNA types, differ markedly from the rest. Such differences may be due to sample manipulation, extraction, library preparation, sequencing, etc. Those samples should be excluded.
The module is divided into two steps:
Mapping the reads: performed with STAR.
Feature counts: performed with featureCounts.
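As a rough illustration of these two steps, the sketch below prints one STAR call and one featureCounts call for a single paired-end sample. It is a dry run under stated assumptions, not the exact commands XICRA issues: the sample name, the paths and the `gene_biotype` attribute (the gene-level biotype attribute in Ensembl-style GTF files) are placeholders chosen for the example.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the two steps: the commands are printed, not executed.
# Sample name, paths and the gene_biotype attribute are illustrative assumptions.

SAMPLE="sampleA"
THREADS=2
GENOME_DIR="star_index"
GTF="annotation.gtf"
# STAR's default file name for coordinate-sorted BAM output, after the prefix.
BAM="map/${SAMPLE}_Aligned.sortedByCoord.out.bam"

# Step 1: map paired-end, gzipped reads with STAR.
star_cmd() {
    echo STAR --runThreadN "$THREADS" --genomeDir "$GENOME_DIR" \
        --readFilesIn "${SAMPLE}_R1.fastq.gz" "${SAMPLE}_R2.fastq.gz" \
        --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix "map/${SAMPLE}_"
}

# Step 2: count reads per biotype with featureCounts
# (-p paired-end input, -s 0 non-stranded, -M count multimapping reads).
featurecounts_cmd() {
    echo featureCounts -T "$THREADS" -p -s 0 -M -t exon -g gene_biotype \
        -a "$GTF" -o "${SAMPLE}_biotype.tsv" "$BAM"
}

star_cmd
featurecounts_cmd
```

Removing the `echo` in each function would run the tools directly, provided STAR and featureCounts are installed and the placeholder files exist.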
The output of this module is descriptive: it is not needed to continue with the analysis. It is an informative step that shows which RNA types are present in each sample.
Note that the mapping process performed with STAR requires a large amount of RAM.
We are working on implementing alternatives so that the biotype
module can be run on computers with less RAM.
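Before launching the module, it can help to check whether the machine has enough memory. The sketch below is one possible check, assuming a Linux system that exposes `/proc/meminfo`; the `REQUIRED_GB` variable and both helper functions are hypothetical names, with the threshold set to the documented `--limitRAM` default of 20 Gbytes.

```shell
#!/usr/bin/env bash
# Sketch: compare the machine's total RAM with the memory STAR may need.
# REQUIRED_GB is a hypothetical variable mirroring the documented
# --limitRAM default of 20 Gbytes.

REQUIRED_GB="${REQUIRED_GB:-20}"

# Total physical memory in whole gigabytes, read from /proc/meminfo (Linux only).
# Note: this is installed memory, not the memory currently free.
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Succeeds (exit status 0) when the first argument is at least the second.
has_enough_ram() {
    [ "$1" -ge "$2" ]
}

if [ -r /proc/meminfo ]; then
    if has_enough_ram "$(total_ram_gb)" "$REQUIRED_GB"; then
        echo "RAM check passed: total RAM is at least ${REQUIRED_GB} GB"
    else
        echo "RAM check failed: total RAM is below ${REQUIRED_GB} GB" >&2
    fi
fi
```

The check only reports; it does not stop the run, so it can be placed in front of any launch script without changing its behaviour.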
How to run the biotype module
Execute the following:
XICRA biotype -h
The options and parameters of this module should then appear in the terminal:
- Module XICRA biotype help
- Parameters
--help (-h) – Show this help message and exit.
- Module XICRA QC Input/Output
- Parameters
--input – Folder containing the files with reads. Files can be .fastq/.fq or .fastq.gz/.fq.gz. See --help_format for additional details. REQUIRED.
--output_folder – Output folder. Name for the project folder.
--single_end – Single end files. Default OFF. Default mode is paired-end.
--batch – Provide this option if input is a file containing multiple paths instead of a single path.
--in_sample – File containing a list of samples to include (one per line) from input folder(s). Default OFF.
--ex_sample – File containing a list of samples to exclude (one per line) from input folder(s). Default OFF.
--detached – Isolated mode. No project folder initiated for further steps. Default OFF.
--include_lane – Include the lane tag (L00X) in the sample identification. See --help_format for additional details. Default OFF.
--include_all – Include all file name characters in the sample identification. See --help_format for additional details. Default OFF.
- Module XICRA biotype options
- Parameters
--threads – Number of CPUs to use. Default: 2.
--annotation – Reference genome annotation in GTF format.
--limitRAM – limitRAM parameter for STAR mapping. Default 20 Gbytes.
--noTrim – Use non-trimmed reads [or not containing ‘_trim’ in the name].
--skip_report – Do not report statistics using the MultiQC report module. Default OFF. See details in --help_multiqc.
- Module XICRA biotype parameters
- Parameters
--no_multiMapping – Do not count multimapping reads in the feature count. By default, multimapping reads are allowed. Default: False.
--stranded – Select if reads are stranded [1], reverse stranded [2] or non-stranded [0]. Default: 0.
- Module XICRA biotype reference genome
- Parameters
--fasta – Reference genome to map reads.
--genomeDir – STAR genomeDir for reference genome.
- Module XICRA biotype additional information
- Parameters
--help_format – Show additional help on name format for files.
--help_project – Show additional help on the project scheme.
--help_RNAbiotype – Show additional help on the RNAbiotype paired-end reads process.
--debug – Show additional messages for debugging purposes.
For further information on the module functionality, check this page.
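The options listed above can be combined into a single call. The sketch below only assembles and prints such a call, so it is safe to run without XICRA installed. The flags are taken from the help text; the folder and file names, the choice of `--fasta` rather than `--genomeDir`, and the `build_biotype_cmd` helper are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch: assemble an XICRA biotype call from documented flags.
# All paths are placeholders; the command is printed, not executed.

# $1 = folder with reads, $2 = project folder, $3 = GTF annotation, $4 = genome FASTA
build_biotype_cmd() {
    echo XICRA biotype --input "$1" --output_folder "$2" \
        --annotation "$3" --fasta "$4" --threads 4 --stranded 0
}

build_biotype_cmd raw_reads my_project annotation.gtf genome.fa
```

For single-end data, `--single_end` would be appended, since the documented default mode is paired-end.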
Output of biotype
Inside the data folder of each sample, a ‘map’ directory will be generated containing a MultiQC report of the mapping. After that, a final report with the featureCounts information of all samples will also be created in the ‘report’ folder.
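To compare samples and spot outliers, per-biotype counts can be converted to percentages of each sample's counted reads. The sketch below shows one way to do this with awk. The three-column, tab-separated input (sample, biotype, read count) is a hypothetical layout chosen for the example, not the format of the XICRA report, and `biotype_percentages` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Sketch only: the input layout (sample, biotype, read count, tab-separated)
# is hypothetical and is not the format of the XICRA report.

# Reads the table on stdin and prints sample, biotype and the percentage of
# that sample's counted reads, one line per sample/biotype pair.
biotype_percentages() {
    awk -F'\t' '
        { count[$1 FS $2] += $3; total[$1] += $3 }
        END {
            for (key in count) {
                split(key, part, FS)
                if (total[part[1]] > 0)
                    printf "%s\t%s\t%.1f\n", part[1], part[2], 100 * count[key] / total[part[1]]
            }
        }' | LC_ALL=C sort
}

# Toy data: sampleB carries far less miRNA than sampleA.
printf 'sampleA\tmiRNA\t80\nsampleA\tmisc_RNA\t20\nsampleB\tmiRNA\t10\nsampleB\tmisc_RNA\t90\n' \
    | biotype_percentages
```

A sample whose percentages sit far from those of the other samples, as sampleB does in the toy data, is a candidate for closer inspection.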