miRNA

The XICRA pipeline provides this module, miRNA, to generate the miRNA analysis. MicroRNAs (miRNAs), a class of small non-coding RNAs, have an average length of 21–23 nucleotides (nt) and modulate gene expression post-transcriptionally. Most miRNA expression studies based on next generation sequencing (NGS) data summarize all the reads mapping to a specific miRNA locus or miRNA sequence, with or without mismatches, and assign them to a single miRNA entity, that is, a unique entry in the miRBase reference database (miRBase is a searchable database of published miRNA sequences and annotation).

However, this type of analysis neglects the fact that not all reads are identical to the reference sequence in miRBase, which is called the canonical sequence. Small RNA sequencing NGS methodology has revealed that miRNAs can frequently appear in the form of multiple sequence variants or isoforms (termed isomiRs). Each isomiR can modulate gene expression post-transcriptionally.

The miRNA module captures all the available information at different levels: the analysis can be done at the miRNA level or at the isomiR level.

Its functionality is divided into three steps:
  1. miRNA analysis

  2. Standardization of the results

  3. Expression count matrix generation with miRTop (a command line tool to annotate miRNAs and isomiRs with a standard naming).

The analysis can be performed with three different software tools:

  • Miraligner: maps small RNA data to miRBase repository.

  • Optimir: algorithm for integrating available genome-wide genotype data into miRNA sequence alignment analysis.

  • sRNAbench: application for processing small-RNA data obtained from NGS platforms. Unfortunately, this tool is no longer available for download, so only users who already have sRNAbench installed will be able to run the XICRA analysis with it.

According to our tests, published in this article, miraligner is the software with the best performance for the miRNA analysis.

How to run the miRNA module

Run the following command:

XICRA miRNA -h

The options and parameters for this module should then appear in the terminal:

Module XICRA miRNA help
Parameters

  • --help (-h) – Show this help message and exit.

Module XICRA miRNA Input/Output
Parameters
  • --input – Folder containing a project or reads, according to the mode selected. Files can be .fastq/.fq or .fastq.gz/.fq.gz. See --help_format for additional details. REQUIRED.

  • --output_folder – Output folder.

  • --single_end – Single end files. Default mode is paired-end. Default OFF.

  • --batch – Provide this option if the input is a file containing multiple paths instead of a single path.

  • --in_sample – File containing a list of samples to include (one per line) from input folder(s). Default OFF.

  • --ex_sample – File containing a list of samples to exclude (one per line) from input folder(s). Default OFF.

  • --detached – Isolated mode. --input is a folder containing fastq reads. Provide a single path, or several using the --batch option.

  • --include_lane – Include the lane tag (L00X) in the sample name. See --help_format for additional details. Default OFF.

  • --include_all – Include all characters as tag name before read pair, if any. See --help_format for additional details. Default OFF.

  • --noTrimm – Use non-trimmed reads (or not containing ‘_trim’ in the name).

Module XICRA miRNA options
Parameters
  • --threads – Number of CPUs to use. Default: 2.

  • --species – Species tag ID. Default: hsa (Homo sapiens).

  • --database – Path to store miRNA annotation files downloaded: miRBase, miRCarta, etc.

  • --miRNA_gff – miRBase hsa GFF file containing miRNA information.

  • --hairpinFasta – miRNA hairpin fasta file.

  • --matureFasta – miRNA mature fasta file.

  • --miRBase_str – miRBase str information.

Module XICRA miRNA software
Parameters

  • --software – Software to analyze miRNAs: sRNAbench, optimir, miraligner. Provide several values separated by a space if desired. REQUIRED.

Module XICRA miRNA additional information
Parameters
  • --help_format – Show additional help on name format for files.

  • --help_project – Show additional help on the project scheme.

  • --help_miRNA – Show additional help on the miRNA paired-end reads process.

  • --debug – Show additional message for debugging purposes.

  • For further information on the module functionality, check this page.
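As an illustration of how the options above combine, the following sketch assembles an XICRA miRNA call as an argument list in Python and prints it. The helper build_mirna_command and the project folder name my_project are hypothetical and not part of XICRA; only the flags themselves are taken from the help listed above.

```python
import shlex


def build_mirna_command(input_path, software, threads=2, species="hsa", single_end=False):
    """Assemble an `XICRA miRNA` argument list from the documented options.

    Hypothetical helper for illustration only; it is not part of XICRA.
    """
    cmd = ["XICRA", "miRNA", "--input", input_path]
    # --software accepts several values separated by a space
    cmd += ["--software", *software]
    cmd += ["--threads", str(threads), "--species", species]
    if single_end:
        # Paired-end is the default mode, so the flag is only added on request
        cmd.append("--single_end")
    return cmd


# "my_project" is a placeholder for a real project folder
command = build_mirna_command("my_project", ["miraligner", "optimir"], threads=4)
print(shlex.join(command))
```

With XICRA installed, a list built this way could be passed to subprocess.run(command, check=True); typing the printed command directly in a terminal is equivalent.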

Output of miRNA for each sample

Like the rest of the modules, the miRNA module generates a folder called “miRNA” in each sample directory. Inside this folder, two more folders are created for each software selected. For example, if we have executed --software optimir miraligner, we will obtain four output folders:

  • data/sampleName/miRNA/optimiR

  • data/sampleName/miRNA/miraligner

  • data/sampleName/miRNA/optimiR_miRTop

  • data/sampleName/miRNA/miraligner_miRTop

The first two folders store the outputs of the corresponding software in their native format.

The folders ending in “_miRTop” contain the results in the standardized miRTop format.

Finally, the expression count matrix will be stored in .tsv format. Following the previous example, these files would be located in:

  • data/sampleName/miRNA/optimiR_miRTop/counts/mirtop.tsv

  • data/sampleName/miRNA/miraligner_miRTop/counts/mirtop.tsv
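The layout described above can be sketched in a few lines of Python, which may help when collecting the count matrices of many samples. The function expected_mirna_outputs is a hypothetical helper, and the folder names are passed in exactly as they appear on disk (for example optimiR rather than the optimir value given to --software), since the mapping between the two is assumed here rather than documented.

```python
from pathlib import PurePosixPath


def expected_mirna_outputs(sample, software_folders, data_dir="data"):
    """Return the folders and count matrix expected for one sample.

    Mirrors the layout described in this section; hypothetical helper,
    not part of XICRA.
    """
    base = PurePosixPath(data_dir) / sample / "miRNA"
    outputs = {}
    for name in software_folders:
        mirtop_dir = base / f"{name}_miRTop"
        outputs[name] = {
            "native": base / name,  # output in the software's own format
            "mirtop": mirtop_dir,  # output in the miRTop format
            "counts": mirtop_dir / "counts" / "mirtop.tsv",
        }
    return outputs


paths = expected_mirna_outputs("sampleName", ["optimiR", "miraligner"])
for name, entry in paths.items():
    print(name, entry["counts"])
```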

Expression count matrix for each sample

As a result, for each sample (and software used) we end up with a table, mirtop.tsv, called the expression count matrix. Here is an example of this matrix:

UID                Read                      miRNA         Variant      iso_5p  iso_3p  iso_add3p  iso_snp  sampleName
iso-22-B175JXN0Q   AAACCGTTACCATTACTGAGTT    hsa-miR-451a  NA           0       0       0          0        69047
iso-23-B175JXN00O  AAACCGTTACCATTACTGAGTTA   hsa-miR-451a  iso_add3p:1  0       0       1          0        169
iso-24-B175JXN0KF  AAACCGTTACCATTACTGAGTTAA  hsa-miR-451a  iso_add3p:2  0       0       2          0        1
iso-23-B175JXN005  AAACCGTTACCATTACTGAGTTC   hsa-miR-451a  iso_add3p:1  0       0       1          0        3
iso-23-B175JXN00P  AAACCGTTACCATTACTGAGTTG   hsa-miR-451a  iso_add3p:1  0       0       1          0        108
iso-23-B175JXN00Q  AAACCGTTACCATTACTGAGTTT   hsa-miR-451a  iso_3p:+1    0       1       0          0        35289
iso-24-B175JXN0KO  AAACCGTTACCATTACTGAGTTTA  hsa-miR-451a  iso_3p:+2    0       2       0          0        675

  • UID: unique identifier (UID) for each sequence defined by miRTop.

  • Read: DNA sequence.

  • miRNA: miRNA precursor, identifier defined by miRBase for each miRNA canonical sequence.

  • Variant: variant type of each isomiR, ‘NA’ for the canonical sequence (check out the miRTop variant nomenclature).

  • The following four columns (iso_5p, iso_3p, iso_add3p and iso_snp) indicate the number of bases added or subtracted, compared to the canonical sequence.

  • sampleName: raw read count for this sample.
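To make the difference between the isomiR level and the miRNA level concrete, the sketch below parses the rows of the example table above with the Python standard library and sums the counts per miRNA and per variant type. It assumes the file is tab-separated with the column names shown in the example; the function summarize is a hypothetical helper, and rows carrying several variant types in a single Variant field would need extra handling.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the example table; a real mirtop.tsv would be read from disk
ROWS = [
    ("UID", "Read", "miRNA", "Variant", "iso_5p", "iso_3p", "iso_add3p", "iso_snp", "sampleName"),
    ("iso-22-B175JXN0Q", "AAACCGTTACCATTACTGAGTT", "hsa-miR-451a", "NA", "0", "0", "0", "0", "69047"),
    ("iso-23-B175JXN00O", "AAACCGTTACCATTACTGAGTTA", "hsa-miR-451a", "iso_add3p:1", "0", "0", "1", "0", "169"),
    ("iso-24-B175JXN0KF", "AAACCGTTACCATTACTGAGTTAA", "hsa-miR-451a", "iso_add3p:2", "0", "0", "2", "0", "1"),
    ("iso-23-B175JXN005", "AAACCGTTACCATTACTGAGTTC", "hsa-miR-451a", "iso_add3p:1", "0", "0", "1", "0", "3"),
    ("iso-23-B175JXN00P", "AAACCGTTACCATTACTGAGTTG", "hsa-miR-451a", "iso_add3p:1", "0", "0", "1", "0", "108"),
    ("iso-23-B175JXN00Q", "AAACCGTTACCATTACTGAGTTT", "hsa-miR-451a", "iso_3p:+1", "0", "1", "0", "0", "35289"),
    ("iso-24-B175JXN0KO", "AAACCGTTACCATTACTGAGTTTA", "hsa-miR-451a", "iso_3p:+2", "0", "2", "0", "0", "675"),
]
EXAMPLE_TSV = "\n".join("\t".join(row) for row in ROWS)


def summarize(handle, sample_column):
    """Sum raw counts per miRNA and per (miRNA, variant type)."""
    per_mirna = defaultdict(int)
    per_variant = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        count = int(row[sample_column])
        per_mirna[row["miRNA"]] += count
        # "iso_3p:+1" -> "iso_3p"; the canonical sequence keeps the label "NA"
        variant_type = row["Variant"].split(":")[0]
        per_variant[(row["miRNA"], variant_type)] += count
    return dict(per_mirna), dict(per_variant)


mirna_counts, variant_counts = summarize(io.StringIO(EXAMPLE_TSV), "sampleName")
print(mirna_counts)
print(variant_counts)
```

In this example the canonical sequence accounts for 69047 of the reads assigned to hsa-miR-451a, while the remaining reads are spread over six isomiRs; an analysis at the miRNA level only would merge all of them into a single count.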

Output of miRNA, comparing samples

Like the other modules, miRNA also builds an output for comparing samples. In the folder report/miRNA, three different files are created for each software executed. For example, if we have run --software miraligner, we will obtain the following files:

  • report/miRNA/miRNA_expression-miraligner_dup.csv: Matrix with the read counts, per sample, of the UIDs that are duplicated. Duplicates normally occur when bases are added at both the beginning and the end of the sequence, so that it cannot be determined whether the variant is 3p:+1;5p:+2 or 3p:+2;5p:+1. In those cases, both variants receive the same UID. These entries are removed from the final matrix (they typically have very few counts).

  • report/miRNA/miRNA_expression-miraligner.csv: Final matrix (without the duplicated UIDs). Number of counts of each UID of each sample, to be further analyzed with R.

  • report/miRNA/miRNA_expression-miraligner_seq.csv: table with the DNA sequence corresponding to each UID.
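The separation between the final matrix and the duplicated UIDs can be pictured with the following sketch. It is not XICRA's implementation: the function split_duplicated_uids, the UID labels and the sample names are hypothetical, and it assumes that a UID counts as duplicated when it appears more than once within a single sample and that the counts of its duplicated entries are summed.

```python
from collections import Counter


def split_duplicated_uids(per_sample_rows):
    """Build UID x sample count tables, setting duplicated UIDs aside.

    per_sample_rows maps a sample name to a list of (UID, count) pairs.
    Assumption for this sketch: a UID is duplicated when it occurs more
    than once within a single sample.
    """
    duplicated = set()
    for rows in per_sample_rows.values():
        seen = Counter(uid for uid, _ in rows)
        duplicated.update(uid for uid, n in seen.items() if n > 1)

    final, dup = {}, {}
    for sample, rows in per_sample_rows.items():
        for uid, count in rows:
            target = dup if uid in duplicated else final
            per_uid = target.setdefault(uid, {})
            per_uid[sample] = per_uid.get(sample, 0) + count
    return final, dup


# Placeholder UIDs and counts, for illustration only
samples = {
    "sample1": [("uid-A", 10), ("uid-B", 2), ("uid-B", 1)],
    "sample2": [("uid-A", 7), ("uid-B", 4)],
}
final_matrix, dup_matrix = split_duplicated_uids(samples)
print(final_matrix)
print(dup_matrix)
```

Here uid-B is set aside for every sample because it is ambiguous in sample1, which keeps the rows of the final matrix comparable across samples.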

The analysis of the matrix stored in miRNA_expression-miraligner.csv can be done at the isomiR level, differentiating by UID, by variant type or by miRNA (considering only the miRNA identifier). This can be done with the XICRA.stats package.