ENABLING EFFICIENT AND STREAMLINED ACCESS TO LARGE SCALE GENOMIC EXPRESSION AND SPLICING DATA

Embargo until
Date
2020-10-14
Publisher
Johns Hopkins University
Abstract
As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. We focus primarily on nearly 20,000 RNA-sequencing studies in human and mouse, consisting of more than 750,000 sequencing runs, and on the coverage summaries derived from their alignment to their respective genomes. In addition to the summarized RNA-seq-derived data itself, we present tools (Snaptron, Monorail, Megadepth, and recount3) that downstream researchers can use both to process their own data into comparable summaries and to access and query our processed, publicly available data. Additionally, we present LongTron, a related study of errors in the splicing of long-read transcriptomic alignments, including a comparison to the short-read splicing summaries described above.
Keywords
big data, RNA-seq, genomics, cloud