from the miRBase Blog by Sam
miRBase 21 is now available on the website, and all data are available for download from the FTP site. As usual, the release notes describe the major changes.
Of particular note this time, the Genome Reference Consortium have released a new human genome assembly, GRCh38. We have therefore remapped the human microRNA dataset to this assembly, which includes removing a handful of duplicate entries that now map to a single locus. For example, GRCh37 had 6 loci representing miR-3118, whereas GRCh38 has only 4. Overall, the number of annotated human microRNA loci has increased slightly, to 1881.
Elsewhere in the database, the increases have been larger: we have hundreds of new sequences in each of bat, horse, goat, cobra and salmon, amongst others. In total, 4196 new hairpin sequences and 5441 new mature products have been added. The work to clean up dubious and misannotated sequences also continues, with another 72 entries removed in this release.
Unfortunately, at the last moment, we’ve found an issue with the update of the “high confidence” microRNA dataset. Rather than delay the release further, we’ve decided to go ahead without the “high confidence” set for now. The set will follow in the next few days, with an announcement here.