STATISTICAL AND COMPUTATIONAL METHODS FOR ANALYZING SINGLE-CELL RNA-SEQ AND IMMUNE PROFILING DATA

Embargo until
2026-08-01
Date
2022-07-21
Publisher
Johns Hopkins University
Abstract
With the advancement of single-cell technologies, single-cell RNA-seq experiments increasingly generate data from multiple biological or patient samples. Beyond single-modality assays, single-cell multimodal omics, such as paired single-cell RNA-seq (scRNA-seq) and single-cell TCR-seq (scTCR-seq), enables one to profile multiple data types in the same cell simultaneously and thus provides unprecedented opportunities to study the complex interactions among features from multiple molecular layers. However, analyzing and visualizing the complex cell type-phenotype associations in such multi-sample single-cell data remains challenging. First, we develop TreeCorTreat, an open-source computational tool that uses a tree-based correlation screen to analyze and visualize the association of a phenotype with transcriptomic features and cell types at multiple cell type resolution levels. We also introduce a new TreeCorTreat plot to summarize and visualize the results. With TreeCorTreat, one can conveniently explore, visualize and compare results from different cell types, resolutions, feature types, traits, datasets, analysis protocols and covariate adjustments. These functionalities are demonstrated on two real datasets: a COVID-19 dataset and a non-small cell lung cancer study. Second, we develop TreeCorIWAS to interrogate a large collection of repertoire features and transcriptional profiles simultaneously and systematically at different cell type resolutions. TreeCorIWAS facilitates the detection of immune features associated with sample phenotypes defined by gene expression profiles, as well as the comparison of transcriptional profile changes across different immunophenotypic groups. Third, we use immune profiling data to confirm the existence of unique memory CD4+ T cell clonotypes that cross-recognize severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and common cold coronaviruses (CCCs) and to assess their functional avidity.
Overall, this thesis provides new statistical and computational insights for analyzing large, complex, multi-sample high-throughput sequencing datasets.
Keywords
Single cell genomics, immune repertoire, hierarchical clustering tree, multi-resolution and multi-feature type association analysis, TreeCorTreat plot