Meta-analysis of small RNA-sequencing errors reveals ubiquitous post-transcriptional RNA modifications.
Recent advances in DNA-sequencing technology have made it possible to
obtain large datasets of small RNA sequences.
Here we demonstrate that not all imperfectly matched small RNA
sequences are simple technical sequencing errors; many hold
valuable biological information. Analysis of three small RNA datasets
originating from Oryza sativa and Arabidopsis thaliana small
RNA-sequencing projects demonstrates that many single nucleotide
substitution errors overlap when aligning homologous non-identical
small RNA sequences. Investigating the sites and identities of
substitution errors reveals that many may originate from
post-transcriptional modifications or RNA editing.
Modifications include N1-methyl modified purine nucleotides in tRNA,
potential deamination or base substitutions in micro RNAs, 3' micro
RNA uridine extensions and 5' micro RNA deletions. Further
analysis of large sequencing datasets reveals that the combined
effects of 5' deletions and 3' uridine extensions can alter the
specificity by which micro RNAs associate with different Argonaute
proteins. Hence, not all sequencing errors in small RNA datasets
are technical artifacts; they often provide valuable biological
insight into the sites of post-transcriptional RNA modifications.
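
The idea of overlapping substitution errors can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the toy sequences, and the thresholds (`max_mismatches`, `min_reads`) are all assumptions chosen for illustration. The sketch tallies single-nucleotide substitutions in reads that are the same length as a reference small RNA, then keeps the sites where the same substitution recurs, since recurrent substitutions are less likely to be random sequencing noise.

```python
from collections import Counter


def substitution_sites(reference, reads, max_mismatches=1):
    """Tally (position, reference_base, read_base) over near-identical reads.

    Positions are 0-based. Reads of a different length are skipped, so
    this toy version ignores 5' deletions and 3' extensions entirely.
    """
    tally = Counter()
    for read in reads:
        if len(read) != len(reference):
            continue
        diffs = [
            (i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(reference, read))
            if ref_base != read_base
        ]
        # Keep only imperfect reads with few substitutions (hypothetical cutoff).
        if 0 < len(diffs) <= max_mismatches:
            tally.update(diffs)
    return tally


def recurrent_sites(tally, min_reads=2):
    """Return substitutions seen in at least min_reads reads (hypothetical cutoff)."""
    return {site: count for site, count in tally.items() if count >= min_reads}


# Toy data, invented for illustration only.
reference = "ACGUACGU"
reads = ["ACGUACGU", "ACGGACGU", "ACGGACGU", "ACGUACGA", "ACGUACG"]

tally = substitution_sites(reference, reads)
recurrent = recurrent_sites(tally)
```

In this toy input the U-to-G change at position 3 appears in two reads and is retained, while the single U-to-A change at position 7 is dropped. A real analysis would also need to account for read abundance, sequencing quality, and homologous loci that differ in the genome.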