miRNA¶

The XICRA pipeline provides this module, miRNA, to generate the miRNA analysis. MicroRNAs (miRNAs), a class of small non-coding RNAs, have an average length of 21–23 nucleotides (nt) and modulate gene expression post-transcriptionally. Most miRNA expression studies based on next generation sequencing (NGS) data, summarize all the reads mapping to a specific miRNA locus or miRNA sequence with or without mismatches and assign it to a single miRNA entity, this is, a unique miRBase reference database entry (miRBase database is a searchable database of published miRNA sequences and annotation).

However, this type of analysis neglects the fact that not all reads are identical to the reference sequence in miRBase, which is called the canonical sequence. Small RNA sequencing NGS methodology has revealed that miRNAs can frequently appear in the form of multiple sequence variants or isoforms (termed isomiRs). Each isomiR can modulate gene expression post-transcriptionally.

With the miRNA module we capture all the available information at different levels, the analysis can be done at miRNA or isomiR level.

Its functionalyty is divided in three steps:

miRNA analysis
Standarization of the results
Expression count matrix generation with miRTop (Command line tool to annotate with a standard naming miRNAs e isomiRs).

The analysis can be performed with three different softwares:

Miraligner: maps small RNA data to miRBase repository.
Optimir: algorithm for integrating available genome-wide genotype data into miRNA sequence alignment analysis.
sRNAbench: application for processing small-RNA data obtained from NGS platforms. Unfortunately, the downloading of this tool is no longer available, thus, only users with sRNAbench already installed will be able to run the XICRA analysis with it.

According to our tests, published in this article, miraligner is the software with the best performance for the miRNA analysis.

How to run the miRNA module¶

Executing the following:

XICRA miRNA -h

The different options and parameters for this module should appear in the command line prompt:

Module XICRA miRNA help

Parameters: --help (-h) – Show this help message and exit.

Module XICRA miRNA Input/Output

Parameters

--input – Folder containing a project or reads, according to the mode selected. Files could be .fastq/.fq/ or fastq.gz/.fq.gz. See –help_format for additional details. REQUIRED.
--output_folder – Output folder.
--single_end – Single end files. Default mode is paired-end. Default OFF.
--batch – Provide this option if input is a file containing multiple paths instead a path.
--in_sample – File containing a list of samples to include (one per line) from input folder(s). Default OFF.
--ex_sample – File containing a list of samples to exclude (one per line) from input folder(s). Default OFF.
--detached – Isolated mode. –input is a folder containing fastq reads. Provide a unique path o several using –batch option.
--include_lane – Include the lane tag (L00X) in the sample name. See –help_format for additional details. Default OFF.
--include_all – Include all characters as tag name before read pair, if any. See –help_format for additional details. Default OFF.
--noTrimm – Use non-trimmed reads (or not containing ‘_trim’ in the name).

Module XICRA miRNA options

Parameters

--threads – Number of CPUs to use. Default: 2.
--species – Species tag ID. Default: hsa (Homo sapiens).
--database – Path to store miRNA annotation files downloaded: miRBase, miRCarta, etc.
--miRNA_gff – miRBase hsa GFF file containing miRNA information.
--hairpinFasta – miRNA hairpin fasta file.
--matureFasta – miRNA mature fasta file.
--miRBase_str – miRBase str information.

Module XICRA miRNA software

Parameters: --software – Software to analyze miRNAs, sRNAbench, optimir, miraligner. Provide several input if desired separated by a space. REQUIRED.

Module XICRA miRNA additional information

Parameters

--help_format – Show additional help on name format for files.
--help_project – Show additional help on the project scheme.
--help_miRNA – Show additional help on the miRNA paired-end reads process.
--debug – Show additional message for debugging purposes.

For further information of the module functionallity, check this page.

Output of miRNA for each sample¶

As the rest of the modules, the miRNA module will generate a folder in each of the sample directories called “miRNA”. Inside this folder another two will be created for each of the softwares selected. For example, if we have executed --software optimir miraligner we will obtain four output folders:

data/sampleName/miRNA/optimiR
data/sampleName/miRNA/miraligner
data/sampleName/miRNA/optimiR_miRTop
data/sampleName/miRNA/miraligner_miRTop

The first two folders will store the outputs of the corresponding softwares in their particular format.

The folders ended in “_miRTop” will contain the results in the miRTop standarized format.

Finally, the expression count matrix will be stored in .tsv format. Following the previous example, these files would be located in:

data/sampleName/miRNA/optimiR_miRTop/counts/mirtop.tsv
data/sampleName/miRNA/miraligner_miRTop/counts/mirtop.tsv

Expression count matrix for each sample¶

As a result for each sample (and software used) we will end up with a table like this, mirtop.tsv, called the expression count matrix. Here we can see an example of this matrix:

UID	Read	miRNA	Variant	iso_5p	iso_3p	iso_add3p	iso_snp	sampleName
iso-22-B175JXN0Q	AAACCGTTACCATTACTGAGTT	hsa-miR-451a	NA	0	0	0	0	69047
iso-23-B175JXN00O	AAACCGTTACCATTACTGAGTTA	hsa-miR-451a	iso_add3p:1	0	0	1	0	169
iso-24-B175JXN0KF	AAACCGTTACCATTACTGAGTTAA	hsa-miR-451a	iso_add3p:2	0	0	2	0	1
iso-23-B175JXN005	AAACCGTTACCATTACTGAGTTC	hsa-miR-451a	iso_add3p:1	0	0	1	0	3
iso-23-B175JXN00P	AAACCGTTACCATTACTGAGTTG	hsa-miR-451a	iso_add3p:1	0	0	1	0	108
iso-23-B175JXN00Q	AAACCGTTACCATTACTGAGTTT	hsa-miR-451a	iso_3p:+1	0	1	0	0	35289
iso-24-B175JXN0KO	AAACCGTTACCATTACTGAGTTTA	hsa-miR-451a	iso_3p:+2	0	2	0	0	675

UID: unique identifier (UID) for each sequence defined by miRTop.
Read: DNA sequence.
miRNA: miRNA precursor, identifier defined by miRBase for each miRNA canonical sequence.
Variant: variant type of each isomiR, ‘NA’ for the canonical sequence (checkout the miRTop variant nomenclature).
The following four columns indicate the amount of base pairs added or substracted, compared to the canonical sequence.
SampleName: raw read count expression for this sample.

Output of miRNA, comparing samples¶

On the other hand, as other modules, miRNA also builds an output to compare samples. In the folder report/miRNA, three different files will be created for each software executed. For example, if we have run --software miraligner, we will obtain the following files:

report/miRNA/miRNA_expression-miraligner_dup.csv: Matrix with the number of reads of each UID of each sample that are duplicated. Normally, they occur when some bases are added at the beginning and the end, so it cannot be differentiated if they are 3p:+1;5p:+2 or 3p:+2;5p:+1. In those cases, they will both have the same UID. They are removed (they typically have very few counts).
report/miRNA/miRNA_expression-miraligner.csv: Final matrix (without the duplicated UIDs). Number of counts of each UID of each sample, to be further analyzed with R.
report/miRNA/miRNA_expression-miraligner_seq.csv: table with the DNA sequence corresponding to each UID.

The analysis of the matrix stored in miRNA_expression-miraligner.csv can be done at the isomiR level, differenciating by UID, variant type or miRNA (just considering the miRNA identifier). It can be done with the package XICRA.stats.

miRNA¶

How to run the miRNA module¶

Output of miRNA for each sample¶

Expression count matrix for each sample¶

Output of miRNA, comparing samples¶

Table of Contents

Previous topic

Next topic

This Page