Search | BVS Bolivia

FIRM: Flexible integration of single-cell RNA-sequencing data for large-scale multi-tissue cell atlas datasets.

Ming, Jingsi; Lin, Zhixiang; Zhao, Jia; Wan, Xiang; Yang, Can; Wu, Angela Ruohao.

Brief Bioinform ; 23(5), 2022 09 20.

Article in English | MEDLINE | ID: mdl-35561293

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) is being used extensively to measure the mRNA expression of individual cells from deconstructed tissues, organs and even entire organisms to generate cell atlas references, leading to discoveries of novel cell types and deeper insight into biological trajectories. These massive datasets are usually collected from many samples using different scRNA-seq technology platforms, including the popular SMART-Seq2 (SS2) and 10X platforms. Inherent heterogeneities between platforms, tissues and other batch effects make scRNA-seq data difficult to compare and integrate, especially in large-scale cell atlas efforts; yet, accurate integration is essential for gaining deeper insights into cell biology. We present FIRM, a re-scaling algorithm which accounts for the effects of cell type compositions and achieves accurate integration of scRNA-seq datasets across multiple tissue types, platforms and experimental batches. Compared with existing state-of-the-art integration methods, FIRM provides accurate mixing of shared cell type identities and superior preservation of original structure without overcorrection, generating robust integrated datasets for downstream exploration and analysis. FIRM is also a facile way to transfer cell type labels and annotations from one dataset to another, making it a reliable and versatile tool for scRNA-seq analysis, especially for cell atlas data integration.

Subject(s)

Gene Expression Profiling ; Single-Cell Analysis ; Gene Expression Profiling/methods ; RNA ; RNA, Messenger ; Sequence Analysis, RNA/methods ; Single-Cell Analysis/methods

Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets.

Zhao, Jia; Wang, Gefei; Ming, Jingsi; Lin, Zhixiang; Wang, Yang; Wu, Angela Ruohao; Yang, Can.

Nat Comput Sci ; 2(5): 317-330, 2022 May.

Article in English | MEDLINE | ID: mdl-38177826

ABSTRACT

The rapid emergence of large-scale atlas-level single-cell RNA-seq datasets presents remarkable opportunities for broad and deep biological investigations through integrative analyses. However, harmonizing such datasets requires integration approaches to be not only computationally scalable, but also capable of preserving a wide range of fine-grained cell populations. We have created Portal, a unified framework of adversarial domain translation to learn harmonized representations of datasets. When compared to other state-of-the-art methods, Portal achieves better performance for preserving biological variation during integration, while achieving the integration of millions of cells, in minutes, with low memory consumption. We show that Portal is widely applicable to integrating datasets across different samples, platforms and data types. We also apply Portal to the integration of cross-species datasets with limited shared information among them, elucidating biological insights into the similarities and divergences in the spermatogenesis process among mouse, macaque and human.

RNA splicing programs define tissue compartments and cell types at single-cell resolution.

Olivieri, Julia Eve; Dehghannasiri, Roozbeh; Wang, Peter L; Jang, SoRi; de Morree, Antoine; Tan, Serena Y; Ming, Jingsi; Wu, Angela Ruohao; Quake, Stephen R; Krasnow, Mark A; Salzman, Julia.

Elife ; 10, 2021 09 13.

Article in English | MEDLINE | ID: mdl-34515025

ABSTRACT

The extent to which splicing is regulated at single-cell resolution has remained controversial, owing to both the available data and the methods used to interpret them. We apply the SpliZ, a new statistical approach, to detect cell-type-specific splicing in >110K cells from 12 human tissues. Using 10X Chromium data for discovery, 9.1% of genes with computable SpliZ scores are cell-type-specifically spliced, including ubiquitously expressed genes MYL6 and RPS24. These results are validated with RNA FISH, single-cell PCR, and Smart-seq2. SpliZ analysis reveals 170 genes with regulated splicing during human spermatogenesis, including examples conserved in mouse and mouse lemur. The SpliZ allows model-based identification of subpopulations indistinguishable based on gene expression, illustrated by subpopulation-specific splicing of classical monocytes involving an ultraconserved exon in SAT1. Together, this analysis of differential splicing across multiple organs establishes that splicing is regulated cell-type-specifically.

Subject(s)

Cheirogaleidae/genetics ; Mice/genetics ; RNA Splicing ; Single-Cell Analysis ; Animals

LPM: a latent probit model to characterize the relationship among complex traits using summary statistics from multiple GWASs and functional annotations.

Ming, Jingsi; Wang, Tao; Yang, Can.

Bioinformatics ; 36(8): 2506-2514, 2020 04 15.

Article in English | MEDLINE | ID: mdl-31860024

ABSTRACT

MOTIVATION: Much effort has been made toward understanding the genetic architecture of complex traits and diseases. In the past decade, fruitful GWAS findings have highlighted the important role of regulatory variants and pervasive pleiotropy. Because of the accumulation of GWAS data on a wide range of phenotypes and high-quality functional annotations in different cell types, it is timely to develop a statistical framework to explore the genetic architecture of human complex traits by integrating rich data resources. RESULTS: In this study, we propose a unified statistical approach that aims to characterize the relationship among complex traits and to prioritize risk variants by leveraging regulatory information collected in functional annotations. Specifically, we consider a latent probit model (LPM) to integrate summary-level GWAS data and functional annotations. The developed computational framework not only makes LPM scalable to hundreds of annotations and phenotypes but also ensures its statistically guaranteed accuracy. Through comprehensive simulation studies, we evaluated LPM's performance and compared it with related methods. Then, we applied it to analyze 44 GWASs with 9 genic category annotations and 127 cell-type specific functional annotations. The results demonstrate the benefits of LPM and provide insights into the genetic architecture of complex traits. AVAILABILITY AND IMPLEMENTATION: The LPM package, all simulation code and real datasets in this study are available at https://github.com/mingjingsi/LPM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genome-Wide Association Study ; Multifactorial Inheritance ; Humans ; Molecular Sequence Annotation ; Phenotype ; Polymorphism, Single Nucleotide

Bayesian weighted Mendelian randomization for causal inference based on summary statistics.

Zhao, Jia; Ming, Jingsi; Hu, Xianghong; Chen, Gang; Liu, Jin; Yang, Can.

Bioinformatics ; 36(5): 1501-1508, 2020 03 01.

Article in English | MEDLINE | ID: mdl-31593215

ABSTRACT

MOTIVATION: The results from Genome-Wide Association Studies (GWAS) on thousands of phenotypes provide an unprecedented opportunity to infer the causal effect of one phenotype (exposure) on another (outcome). Mendelian randomization (MR), an instrumental variable (IV) method, has been introduced for causal inference using GWAS data. Due to the polygenic architecture of complex traits/diseases and the ubiquity of pleiotropy, however, MR has many unique challenges compared to conventional IV methods. RESULTS: We propose Bayesian weighted Mendelian randomization (BWMR) for causal inference to address these challenges. In our BWMR model, the uncertainty of weak effects owing to polygenicity has been taken into account and the violation of the IV assumption due to pleiotropy has been addressed through outlier detection by Bayesian weighting. To make the causal inference based on BWMR computationally stable and efficient, we developed a variational expectation-maximization (VEM) algorithm. Moreover, we have also derived an exact closed-form formula to correct the posterior covariance, which is often underestimated in variational inference. Through comprehensive simulation studies, we evaluated the performance of BWMR, demonstrating the advantage of BWMR over its competitors. Then we applied BWMR to make causal inferences between 130 metabolites and 93 complex human traits, uncovering novel causal relationships between exposure and outcome traits. AVAILABILITY AND IMPLEMENTATION: The BWMR software is available at https://github.com/jiazhao97/BWMR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genome-Wide Association Study ; Mendelian Randomization Analysis ; Algorithms ; Bayes Theorem ; Humans ; Polymorphism, Single Nucleotide ; Software

LSMM: a statistical approach to integrating functional annotations with genome-wide association studies.

Ming, Jingsi; Dai, Mingwei; Cai, Mingxuan; Wan, Xiang; Liu, Jin; Yang, Can.

Bioinformatics ; 34(16): 2788-2796, 2018 08 15.

Article in English | MEDLINE | ID: mdl-29608640

ABSTRACT

Motivation: Thousands of risk variants underlying complex phenotypes (quantitative traits and diseases) have been identified in genome-wide association studies (GWAS). However, there are still two major challenges towards deepening our understanding of the genetic architectures of complex phenotypes. First, the majority of GWAS hits are in non-coding regions and their biological interpretation is still unclear. Second, accumulating evidence from GWAS suggests the polygenicity of complex traits, i.e. a complex trait is often affected by many variants with small or moderate effects, whereas a large proportion of risk variants with small effects remain unknown. Results: The availability of functional annotation data enables us to address the above challenges. In this study, we propose a latent sparse mixed model (LSMM) to integrate functional annotations with GWAS data. Not only does it increase the statistical power of identifying risk variants, but it also offers more biological insights by detecting relevant functional annotations. To make LSMM scalable to millions of variants and hundreds of functional annotations, we developed an efficient variational expectation-maximization algorithm for model parameter estimation and statistical inference. We first conducted comprehensive simulation studies to evaluate the performance of LSMM. Then we applied it to analyze 30 GWAS of complex phenotypes integrated with nine genic category annotations and 127 cell-type specific functional annotations from the Roadmap project. The results demonstrate that our method possesses more statistical power than conventional methods, and can help researchers achieve a deeper understanding of the genetic architecture of these complex phenotypes. Availability and implementation: The LSMM software is available at https://github.com/mingjingsi/LSMM. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

Genome-Wide Association Study ; Algorithms ; Genome-Wide Association Study/methods ; Molecular Sequence Annotation ; Phenotype ; Software

IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben.

Bioinformatics ; 33(18): 2882-2889, 2017 Sep 15.

Article in English | MEDLINE | ID: mdl-28498950

ABSTRACT

MOTIVATION: Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for the access to individual-level genotype data with a limited sample size (e.g. a few hundred or thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. RESULTS: In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. AVAILABILITY AND IMPLEMENTATION: The IGESS software is available at https://github.com/daviddaigithub/IGESS. CONTACT: zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genome-Wide Association Study/methods ; Models, Statistical ; Software ; Algorithms ; Humans ; Sample Size
