Results 1 - 20 of 102
1.
Proc Natl Acad Sci U S A ; 121(23): e2322376121, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38809705

ABSTRACT

In this article, we develop CausalEGM, a deep learning framework for nonlinear dimension reduction and generative modeling of the dependency among covariate features affecting treatment and response. CausalEGM can be used for estimating causal effects in both binary and continuous treatment settings. By learning a bidirectional transformation between the high-dimensional covariate space and a low-dimensional latent space and then modeling the dependencies of different subsets of the latent variables on the treatment and response, CausalEGM can extract the latent covariate features that affect both treatment and response. By conditioning on these features, one can mitigate the confounding effect of the high-dimensional covariate on the estimation of the causal relation between treatment and response. In a series of experiments, the proposed method is shown to achieve superior performance over existing methods in both binary and continuous treatment settings. The improvement is substantial when the sample size is large and the covariate is of high dimension. Finally, we establish excess risk bounds and consistency results for our method, and discuss how our approach is related to and improves upon other dimension reduction approaches in causal inference.

2.
Proc Natl Acad Sci U S A ; 120(28): e2305236120, 2023 07 11.
Article in English | MEDLINE | ID: mdl-37399400

ABSTRACT

Plasma cell-free DNA (cfDNA) is a noninvasive biomarker for cell death of all organs. Deciphering the tissue origin of cfDNA can reveal abnormal cell death because of diseases, which has great clinical potential in disease detection and monitoring. Despite the great promise, the sensitive and accurate quantification of tissue-derived cfDNA remains challenging for existing methods due to the limited characterization of tissue methylation and the reliance on unsupervised methods. To fully exploit the clinical potential of tissue-derived cfDNA, here we present one of the largest comprehensive and high-resolution methylation atlases, based on 521 noncancer tissue samples spanning 29 major types of human tissues. We systematically identified fragment-level tissue-specific methylation patterns and extensively validated them in orthogonal datasets. Based on the rich tissue methylation atlas, we develop the first supervised tissue deconvolution approach, a deep-learning-powered model, cfSort, for sensitive and accurate tissue deconvolution in cfDNA. On the benchmarking data, cfSort showed superior sensitivity and accuracy compared to the existing methods. We further demonstrated the clinical utility of cfSort with two potential applications: aiding disease diagnosis and monitoring treatment side effects. The tissue-derived cfDNA fraction estimated from cfSort reflected the clinical outcomes of the patients. In summary, the tissue methylation atlas and cfSort enhanced the performance of tissue deconvolution in cfDNA, thus facilitating cfDNA-based disease detection and longitudinal treatment monitoring.


Subjects
Cell-Free Nucleic Acids , Deep Learning , Humans , Cell-Free Nucleic Acids/genetics , DNA Methylation , Biomarkers , Genetic Promoter Regions , Tumor Biomarkers/genetics
3.
Hum Mol Genet ; 32(21): 3105-3120, 2023 10 17.
Article in English | MEDLINE | ID: mdl-37584462

ABSTRACT

DNA methyltransferase type 1 (DNMT1) is a major enzyme involved in maintaining the methylation pattern after DNA replication. Mutations in DNMT1 have been associated with autosomal dominant cerebellar ataxia, deafness and narcolepsy (ADCA-DN). We used fibroblasts, induced pluripotent stem cells (iPSCs) and induced neurons (iNs) generated from patients with ADCA-DN and controls to explore the epigenomic and transcriptomic effects of mutations in DNMT1. We show cell type-specific changes in gene expression and DNA methylation patterns. DNA methylation and gene expression changes were negatively correlated in iPSCs and iNs. In addition, we identified a group of genes associated with clinical phenotypes of ADCA-DN, including PDGFB and PRDM8 for cerebellar ataxia, psychosis and dementia and NR2F1 for deafness and optic atrophy. Furthermore, ZFP57, which is required to maintain gene imprinting through DNA methylation during early development, was hypomethylated in promoters and exhibited upregulated expression in patients with ADCA-DN in both iPSCs and iNs. Our results provide insight into the functions of DNMT1 and the molecular changes associated with ADCA-DN, with potential implications for genes associated with related phenotypes.


Subjects
Cerebellar Ataxia , Deafness , Humans , Cerebellar Ataxia/genetics , DNA (Cytosine-5-)-Methyltransferases/genetics , Transcriptome/genetics , Epigenomics , DNA (Cytosine-5-)-Methyltransferase 1/genetics , DNA Methylation/genetics , Deafness/genetics , Mutation , DNA
4.
Nucleic Acids Res ; 51(D1): D159-D166, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36215037

ABSTRACT

Elucidating the role of 3D architecture of DNA in gene regulation is crucial for understanding cell differentiation, tissue homeostasis and disease development. Among various chromatin conformation capture methods, HiChIP has received increasing attention for its significant improvement over other methods in profiling of regulatory (e.g. H3K27ac) and structural (e.g. cohesin) interactions. To facilitate the studies of 3D regulatory interactions, we developed a HiChIP interactions database, HiChIPdb (http://health.tsinghua.edu.cn/hichipdb/). The current version of HiChIPdb contains ∼262M annotated HiChIP interactions from 200 high-throughput HiChIP samples across 108 cell types. The functionalities of HiChIPdb include: (i) standardized categorization of HiChIP interactions in a hierarchical structure based on organ, tissue and cell line and (ii) comprehensive annotations of HiChIP interactions with regulatory genes and GWAS Catalog SNPs. To the best of our knowledge, HiChIPdb is the first comprehensive database that utilizes a unified pipeline to map the functional interactions across diverse cell types and tissues in different resolutions. We believe this database has the potential to advance cutting-edge research in regulatory mechanisms in development and disease by removing the barrier in data aggregation, preprocessing, and analysis.


Subjects
Chromatin , DNA , Cell Line , Chromatin/genetics , Gene Expression Regulation , DNA Sequence Analysis/methods , Genetic Databases
5.
Proc Natl Acad Sci U S A ; 119(1)2022 01 04.
Article in English | MEDLINE | ID: mdl-34930827

ABSTRACT

Abdominal aortic aneurysm (AAA) is a common degenerative cardiovascular disease whose pathobiology is not clearly understood. The cellular heterogeneity and cell-type-specific gene regulation of vascular cells in human AAA have not been well-characterized. Here, we performed analysis of whole-genome sequencing data in AAA patients versus controls with the aim of detecting disease-associated variants that may affect gene regulation in human aortic smooth muscle cells (AoSMC) and human aortic endothelial cells (HAEC), two cell types of high relevance to AAA disease. To support this analysis, we generated H3K27ac HiChIP data for these cell types and inferred cell-type-specific gene regulatory networks. We observed that AAA-associated variants were most enriched in regulatory regions in AoSMC, compared with HAEC and CD4+ cells. The cell-type-specific regulation defined by these HiChIP data supported the importance of ERG and the KLF family of transcription factors in AAA disease. The analysis of regulatory elements that contain noncoding variants and also are differentially open between AAA patients and controls revealed the significance of the interleukin-6-mediated signaling pathway. This finding was further validated by including information from the deleteriousness effect of nonsynonymous single-nucleotide variants in AAA patients and additional control data from the Medical Genome Reference Bank dataset. These results provide important insights into AAA pathogenesis and provide a model for cell-type-specific analysis of disease-associated variants.


Subjects
Abdominal Aortic Aneurysm/genetics , Gene Regulatory Networks , Case-Control Studies , Cultured Cells , Down-Regulation , Humans , Interleukin-6/metabolism , Kruppel-Like Transcription Factors/genetics , Transcriptional Regulator ERG/genetics
6.
Proc Natl Acad Sci U S A ; 118(15)2021 04 13.
Article in English | MEDLINE | ID: mdl-33833061

ABSTRACT

Density estimation is one of the fundamental problems in both statistics and machine learning. In this study, we propose Roundtrip, a computational framework for general-purpose density estimation based on deep generative neural networks. Roundtrip retains the generative power of deep generative models, such as generative adversarial networks (GANs), while it also provides estimates of density values, thus supporting both data generation and density estimation. Unlike previous neural density estimators that put stringent conditions on the transformation from the latent space to the data space, Roundtrip enables the use of much more general mappings where the target density is modeled by learning a manifold induced from a base density (e.g., Gaussian distribution). Roundtrip provides a statistical framework for GAN models where an explicit evaluation of density values is feasible. In numerical experiments, Roundtrip exceeds state-of-the-art performance in a diverse range of density estimation tasks.

7.
Genome Res ; 30(4): 622-634, 2020 04.
Article in English | MEDLINE | ID: mdl-32188700

ABSTRACT

A time course experiment is a widely used design in the study of cellular processes such as differentiation or response to stimuli. In this paper, we propose time course regulatory analysis (TimeReg) as a method for the analysis of gene regulatory networks based on paired gene expression and chromatin accessibility data from a time course. TimeReg can be used to prioritize regulatory elements, to extract core regulatory modules at each time point, to identify key regulators driving changes of the cellular state, and to causally connect the modules across different time points. We applied the method to analyze paired chromatin accessibility and gene expression data from a retinoic acid (RA)-induced mouse embryonic stem cell (mESC) differentiation experiment. The analysis identified 57,048 novel regulatory elements regulating cerebellar development, synapse assembly, and hindbrain morphogenesis, which substantially extended our knowledge of cis-regulatory elements during differentiation. Using single-cell RNA-seq data, we showed that the core regulatory modules can reflect the properties of different subpopulations of cells. Finally, the driver regulators are shown to be important in clarifying the relations between modules across adjacent time points. As a second example, applying our method to time course data from Ascl1-induced direct reprogramming of fibroblasts to neurons identified Id1/2 as driver regulators of the early stage of reprogramming.


Subjects
Chromatin Assembly and Disassembly , Chromatin/genetics , Gene Expression Regulation , Mouse Embryonic Stem Cells/metabolism , Algorithms , Animals , Cell Differentiation/drug effects , Cell Differentiation/genetics , Cell Lineage , Cellular Reprogramming/genetics , Cellular Reprogramming Techniques , Chromatin/metabolism , Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks , Mice , Mouse Embryonic Stem Cells/drug effects , Transcription Factors/metabolism , Transcriptome , Tretinoin/pharmacology
8.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34180954

ABSTRACT

Multi-omics data allow us to select a small set of informative markers for the discrimination of specific cell types and study of cellular heterogeneity. However, it is often challenging to choose an optimal marker panel from the high-dimensional molecular profiles for a large number of cell types. Here, we propose a method called Mixed Integer programming Model to Identify Cell type-specific marker panel (MIMIC). MIMIC maintains the hierarchical topology among different cell types and simultaneously maximizes the specificity of a fixed number of selected markers. MIMIC was benchmarked on the mouse ENCODE RNA-seq dataset, with 29 diverse tissues, for 43 surface markers (SMs) and 1345 transcription factors (TFs). MIMIC could select biologically meaningful markers and is robust to different accuracy criteria. It shows advantages over the standard single gene-based approaches and widely used dimension reduction methods, such as multidimensional scaling and t-SNE, both in accuracy and in biological interpretation. Furthermore, the combination of SMs and TFs achieves better specificity than SMs or TFs alone. Applying MIMIC to a large collection of 641 RNA-seq samples covering 231 cell types identifies a panel of TFs and SMs that reveal the modularity of cell type association networks. Finally, the scalability of MIMIC is demonstrated by selecting enhancer markers from mouse ENCODE data. MIMIC is freely available at https://github.com/MengZou1/MIMIC.


Subjects
Biomarkers , Computational Biology , Flow Cytometry/methods , Gene Expression Profiling/methods , Organ Specificity , Software , Algorithms , Computational Biology/methods , Genetic Databases , Gene Expression Regulation , Humans , Organ Specificity/genetics , Reproducibility of Results
9.
Proc Natl Acad Sci U S A ; 117(35): 21364-21372, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32817564

ABSTRACT

A person's genome typically contains millions of variants which represent the differences between this personal genome and the reference human genome. The interpretation of these variants, i.e., the assessment of their potential impact on a person's phenotype, is currently of great interest in human genetics and medicine. We have developed a prioritization tool called OpenCausal which takes as inputs 1) a personal genome and 2) a reference context-specific TF expression profile and returns a list of noncoding variants prioritized according to their impact on chromatin accessibility for any given genomic region of interest. We applied OpenCausal to 6,430 samples across 18 tissues derived from the GTEx project and found that the variants prioritized by OpenCausal are highly enriched for eQTLs and caQTLs. We further propose a strategy to integrate the predicted open scores with genome-wide association studies (GWAS) data to prioritize putative causal variants and regulatory elements for a given risk locus (i.e., fine-mapping analysis). As an initial example, we applied this method to a GWAS dataset of human height and found that the prioritized putative variants and elements are correlated with the phenotype (i.e., heights of individuals) better than others.


Subjects
Genetic Techniques , Genetic Variation , Human Genome , Genetic Models , Transcriptional Regulatory Elements , Body Height/genetics , Gene Expression Profiling , Genome-Wide Association Study , Humans , Quantitative Trait Loci , Software , Transcription Factors/metabolism
10.
Proc Natl Acad Sci U S A ; 117(9): 4864-4873, 2020 03 03.
Article in English | MEDLINE | ID: mdl-32071206

ABSTRACT

In both Turner syndrome (TS) and Klinefelter syndrome (KS), copy number aberrations of the X chromosome lead to various developmental symptoms. We report a comparative analysis of TS vs. KS regarding differences at the genomic network level measured in primary samples by analyzing gene expression, DNA methylation, and chromatin conformation. X-chromosome inactivation (XCI) silences transcription from one X chromosome in female mammals, on which most genes are inactive, and some genes escape from XCI. In TS, almost all differentially expressed escape genes are down-regulated but most differentially expressed inactive genes are up-regulated. In KS, differentially expressed escape genes are up-regulated while the majority of inactive genes appear unchanged. Interestingly, 94 differentially expressed genes (DEGs) overlapped between the TS vs. female and KS vs. male comparisons, and these almost uniformly display expression changes in opposite directions. DEGs on the X chromosome and the autosomes are coexpressed in both syndromes, indicating that there are molecular ripple effects of the changes in X chromosome dosage. Six potential candidate genes (RPS4X, SEPT6, NKRF, CXorf57, NAA10, and FLNA) for KS are identified on Xq, as well as candidate central genes on Xp for TS. Only promoters of inactive genes are differentially methylated in both syndromes while escape gene promoters remain unchanged. The intrachromosomal contact map of the X chromosome in TS exhibits the structure of an active X chromosome. The discovery of shared DEGs indicates the existence of common molecular mechanisms for gene regulation in TS and KS that transmit the gene dosage changes to the transcriptome.


Subjects
Gene Dosage , Gene Expression Regulation , Genomics , Klinefelter Syndrome/genetics , Turner Syndrome/genetics , X Chromosome , Animals , Chromatin/chemistry , Human X Chromosomes , DNA Methylation , Female , Filamins , Humans , Karyotype , Male , Mammals/genetics , N-Terminal Acetyltransferase A , N-Terminal Acetyltransferase E , Protein Serine-Threonine Kinases/genetics , PAR-2 Receptor , Repressor Proteins/genetics , Septins , Transcriptome/genetics , X Chromosome Inactivation
11.
Nucleic Acids Res ; 47(10): e60, 2019 06 04.
Article in English | MEDLINE | ID: mdl-30869141

ABSTRACT

Interactions between regulatory elements are of crucial importance for the understanding of transcriptional regulation and the interpretation of disease mechanisms. The Hi-C technique has been developed for genome-wide detection of chromatin contacts. However, unless extremely deep sequencing is performed on a very large number of input cells, which is technically limited and expensive, current Hi-C experiments do not have high enough resolution to resolve contacts between regulatory elements. Here, we develop DeepTACT, a bootstrapping deep learning model, to integrate genome sequences and chromatin accessibility data for the prediction of chromatin contacts between regulatory elements. DeepTACT can infer not only promoter-enhancer interactions, but also promoter-promoter interactions. In tests based on promoter capture Hi-C data, DeepTACT shows better performance than existing methods. DeepTACT analysis also identifies a class of hub promoters, which are correlated with transcriptional activation across cell lines, enriched in housekeeping genes, functionally related to fundamental biological processes, and capable of reflecting cell similarity. Finally, the utility of chromatin contacts in the study of human diseases is illustrated by the association of IFNA2 with coronary artery disease via an integrative analysis of GWAS data and interactions predicted by DeepTACT.


Subjects
Algorithms , Chromatin/genetics , Computational Biology/methods , Deep Learning , Genetic Promoter Regions/genetics , Nucleic Acid Regulatory Sequences/genetics , Cultured Cells , Chromatin/metabolism , Gene Expression Regulation , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans
12.
Proc Natl Acad Sci U S A ; 115(30): 7723-7728, 2018 07 24.
Article in English | MEDLINE | ID: mdl-29987051

ABSTRACT

When different types of functional genomics data are generated on single cells from different samples of cells from the same heterogeneous population, the clustering of cells in the different samples should be coupled. We formulate this "coupled clustering" problem as an optimization problem and propose the method of coupled nonnegative matrix factorizations (coupled NMF) for its solution. The method is illustrated by the integrative analysis of single-cell RNA-sequencing (RNA-seq) and single-cell ATAC-sequencing (ATAC-seq) data.


Subjects
Genetic Databases , Genetic Models , RNA Sequence Analysis/methods , Animals , Humans
13.
Nucleic Acids Res ; 46(15): e89, 2018 09 06.
Article in English | MEDLINE | ID: mdl-29897492

ABSTRACT

The detection of tumor-derived cell-free DNA in plasma is one of the most promising directions in cancer diagnosis. The major challenge in such an approach is how to identify the tiny amount of tumor DNAs out of total cell-free DNAs in blood. Here we propose an ultrasensitive cancer detection method, termed 'CancerDetector', using the DNA methylation profiles of cell-free DNAs. The key to our method is to probabilistically model the joint methylation states of multiple adjacent CpG sites on an individual sequencing read, in order to exploit the pervasive nature of DNA methylation for signal amplification. Therefore, CancerDetector can sensitively identify a trace amount of tumor cfDNAs in plasma, at the level of individual reads. We evaluated CancerDetector on simulated data, and showed high concordance between the predicted and true tumor fractions. Testing CancerDetector on real plasma data demonstrated its high sensitivity and specificity in detecting tumor cfDNAs. In addition, the predicted tumor fraction showed great consistency with tumor size and survival outcome. Note that all of these tests were performed on sequencing data at low to medium coverage (1× to 10×). Therefore, CancerDetector holds great potential to detect cancer early and cost-effectively.


Subjects
Algorithms , Cell-Free Nucleic Acids/genetics , Computational Biology/methods , DNA Methylation , Neoplasms/diagnosis , Cell-Free Nucleic Acids/chemistry , CpG Islands/genetics , Neoplasm DNA/chemistry , Neoplasm DNA/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Neoplasms/blood , Neoplasms/genetics , ROC Curve , Reproducibility of Results
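
The read-level idea in the abstract above can be illustrated with a minimal sketch: each read is scored by the joint methylation state of its CpG sites under a tumor class and a normal class, and the tumor fraction is then estimated by maximum likelihood over a grid. This is not the CancerDetector implementation. The independent per-site methylation rates (`tumor_rate`, `normal_rate`), the toy reads, and all function names are assumptions made for this example.

```python
from math import log


def read_likelihood(read, methylation_rate):
    """P(read | class), assuming each CpG on the read is methylated
    independently with the class's methylation rate (an assumption)."""
    p = 1.0
    for state in read:  # state is 1 (methylated) or 0 (unmethylated)
        p *= methylation_rate if state == 1 else 1.0 - methylation_rate
    return p


def estimate_tumor_fraction(reads, tumor_rate, normal_rate, grid_size=1000):
    """Grid-search maximum-likelihood estimate of the tumor fraction theta
    in the two-class mixture theta * P(read|tumor) + (1 - theta) * P(read|normal).
    Rates must lie strictly between 0 and 1 so every likelihood is positive."""
    pairs = [
        (read_likelihood(r, tumor_rate), read_likelihood(r, normal_rate))
        for r in reads
    ]
    best_theta, best_loglik = 0.0, float("-inf")
    for i in range(grid_size + 1):
        theta = i / grid_size
        loglik = sum(log(theta * pt + (1.0 - theta) * pn) for pt, pn in pairs)
        if loglik > best_loglik:
            best_theta, best_loglik = theta, loglik
    return best_theta


# Toy data (hypothetical): 2 fully methylated reads among 20, four CpGs per read.
reads = [[1, 1, 1, 1]] * 2 + [[0, 0, 0, 0]] * 18
theta_hat = estimate_tumor_fraction(reads, tumor_rate=0.9, normal_rate=0.1)
print(f"estimated tumor fraction: {theta_hat:.3f}")
```

Because the likelihood multiplies over all CpGs on a read, a fully methylated four-CpG read in this toy setting is several thousand times more likely under the tumor class than under the normal class, whereas a single CpG would give only a ninefold ratio. That is one way to read the "signal amplification" the abstract mentions.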
14.
Proc Natl Acad Sci U S A ; 114(25): E4914-E4923, 2017 06 20.
Article in English | MEDLINE | ID: mdl-28576882

ABSTRACT

The rapid increase of genome-wide datasets on gene expression, chromatin states, and transcription factor (TF) binding locations offers an exciting opportunity to interpret the information encoded in genomes and epigenomes. This task can be challenging as it requires joint modeling of context-specific activation of cis-regulatory elements (REs) and the effects on transcription of associated regulatory factors. To meet this challenge, we propose a statistical approach based on paired expression and chromatin accessibility (PECA) data across diverse cellular contexts. In our approach, we model (i) the localization to REs of chromatin regulators (CRs) based on their interaction with sequence-specific TFs, (ii) the activation of REs due to CRs that are localized to them, and (iii) the effect of TFs bound to activated REs on the transcription of target genes (TGs). The transcriptional regulatory network inferred by PECA provides a detailed view of how trans- and cis-regulatory elements work together to affect gene expression in a context-specific manner. We illustrate the feasibility of this approach by analyzing paired expression and accessibility data from the mouse Encyclopedia of DNA Elements (ENCODE) and explore various applications of the resulting model.


Subjects
Chromatin/genetics , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Animals , Binding Sites/genetics , Chromatin Assembly and Disassembly/genetics , Genetic Enhancer Elements/genetics , Humans , Mice , Protein Binding/genetics , Transcriptional Regulatory Elements/genetics , Transcription Factors/genetics
15.
Nucleic Acids Res ; 45(14): e132, 2017 Aug 21.
Article in English | MEDLINE | ID: mdl-28586438

ABSTRACT

Third generation sequencing (TGS) technologies are highly promising, but the long and noisy reads from TGS are difficult to align using existing algorithms. Here, we present COSINE, a conceptually new method designed specifically for aligning long reads contaminated by a high level of errors. COSINE computes the context similarity of two stretches of nucleobases given the similarity over distributions of their short k-mers (k = 3-4) along the sequences. The results on simulated and real data show that COSINE achieves high sensitivity and specificity under a wide range of read accuracies. When the error rate is high, COSINE can offer substantial advantages over existing alignment methods.


Subjects
Algorithms , Computational Biology/methods , Sequence Alignment/methods , Software , Base Sequence , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Reproducibility of Results
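
To make the notion of comparing short k-mer distributions concrete, the sketch below counts overlapping 3-mers in two sequences and compares the count vectors with plain cosine similarity. It is only an illustration of the general idea, not the COSINE algorithm described in the abstract above; the similarity measure, the function names, and the toy sequences are assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in the sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse k-mer count vectors."""
    dot = sum(count * q[kmer] for kmer, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # a sequence shorter than k has an empty profile
    return dot / (norm_p * norm_q)


# Hypothetical toy sequences.
reference = "ACGTACGTTAGCCGATAGCTAGGCTA"
noisy_read = "ACGTACGATAGCCGTTAGCTAGCCTA"  # same stretch with a few substitutions
unrelated = "TTTTTTTTTTGGGGGGGGGGAAAAAA"

sim_noisy = cosine_similarity(kmer_profile(reference), kmer_profile(noisy_read))
sim_unrelated = cosine_similarity(kmer_profile(reference), kmer_profile(unrelated))
print(f"noisy read: {sim_noisy:.3f}, unrelated: {sim_unrelated:.3f}")
```

A single substitution changes at most k of the overlapping k-mers, so with small k most of the profile survives a noisy read. This is a plausible reason why short k-mers suit error-prone long reads, though the abstract itself does not spell out the rationale.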
16.
Nucleic Acids Res ; 45(10): 5666-5677, 2017 Jun 02.
Article in English | MEDLINE | ID: mdl-28472398

ABSTRACT

Transcription factors (TFs) play crucial roles in regulating gene expression through interactions with specific DNA sequences. Recently, the sequence motifs of almost 400 human TFs have been identified using high-throughput SELEX sequencing. However, there remain a large number of TFs (∼800) with no high-throughput-derived binding motifs. Computational methods capable of associating known motifs to such TFs will avoid tremendous experimental efforts and enable deeper understanding of transcriptional regulatory functions. We present a method to associate known motifs to TFs (MATLAB code is available in Supplementary Materials). Our method is based on a probabilistic framework that not only exploits DNA-binding domains and specificities, but also integrates open chromatin, gene expression and genomic data to accurately infer monomeric and homodimeric binding motifs. Our analysis resulted in the assignment of motifs to 200 TFs with no SELEX-derived motifs, roughly a 50% increase compared to the existing coverage.


Subjects
Algorithms , Chromatin/chemistry , DNA/chemistry , Gene Expression Regulation , Statistical Models , Transcription Factors/genetics , Binding Sites , Chromatin/metabolism , DNA/genetics , DNA/metabolism , Human Genome , Humans , Nucleotide Motifs , Protein Binding , SELEX Aptamer Technique , Transcription Factors/metabolism
17.
Proc Natl Acad Sci U S A ; 113(51): 14662-14667, 2016 12 20.
Article in English | MEDLINE | ID: mdl-27930330

ABSTRACT

Dimension reduction methods are commonly applied to high-throughput biological datasets. However, the results can be hindered by confounding factors, either biological or technical in origin. In this study, we extend principal component analysis (PCA) to propose AC-PCA for simultaneous dimension reduction and adjustment for confounding (AC) variation. We show that AC-PCA can adjust for (i) variations across individual donors present in a human brain exon array dataset and (ii) variations of different species in a model organism ENCODE RNA sequencing dataset. Our approach is able to recover the anatomical structure of neocortical regions and to capture the shared variation among species during embryonic development. For gene selection purposes, we extend AC-PCA with sparsity constraints and propose and implement an efficient algorithm. The methods developed in this paper can also be applied to more general settings. The R package and MATLAB source code are available at https://github.com/linzx06/AC-PCA.


Subjects
Brain/metabolism , High-Throughput Nucleotide Sequencing , Principal Component Analysis , RNA Sequence Analysis , Algorithms , Brain Mapping , Computer Simulation , Statistical Data Interpretation , Exons , Humans , Statistical Models , Software , Transcriptome
18.
PLoS Comput Biol ; 13(12): e1005875, 2017 12.
Article in English | MEDLINE | ID: mdl-29281633

ABSTRACT

Mass cytometry (CyTOF) has greatly expanded the capability of cytometry. It is now easy to generate multiple CyTOF samples in a single study, with each sample containing single-cell measurements on 50 markers for hundreds of thousands of cells. Current methods do not adequately address the issues concerning combining multiple samples for subpopulation discovery, and these issues can be quickly and dramatically amplified with an increasing number of samples. To overcome this limitation, we developed Partition-Assisted Clustering and Multiple Alignments of Networks (PAC-MAN) for the fast automatic identification of cell populations in CyTOF data, closely matching the results of expert manual discovery, and for alignments between subpopulations across samples to define dataset-level cellular states. PAC-MAN is computationally efficient, allowing the management of very large CyTOF datasets, which are increasingly common in clinical studies and cancer studies that monitor various tissue samples for each subject.


Subjects
Single-Cell Analysis/statistics & numerical data , Animals , Biomarkers/analysis , Cluster Analysis , Computational Biology , Computer Simulation , Statistical Data Interpretation , Factual Databases , Flow Cytometry/statistics & numerical data , Gene Expression , Humans , Mice
19.
Nucleic Acids Res ; 43(18): e116, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26040699

ABSTRACT

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes.


Subjects
Carcinogenesis/genetics , Gene Expression Profiling , Gene Fusion , High-Throughput Nucleotide Sequencing/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Humans , MCF-7 Cells , Protein Isoforms/genetics , Protein Isoforms/metabolism , Sequence Alignment
20.
Proc Natl Acad Sci U S A ; 111(44): 15675-80, 2014 Nov 04.
Article in English | MEDLINE | ID: mdl-25331876

ABSTRACT

We formulate a statistical model for the regulation of global gene expression by multiple regulatory programs and propose a thresholding singular value decomposition (T-SVD) regression method for learning such a model from data. Extensive simulations demonstrate that this method offers improved computational speed and higher sensitivity and specificity over competing approaches. The method is used to analyze microRNA (miRNA) and long noncoding RNA (lncRNA) data from The Cancer Genome Atlas (TCGA) consortium. The analysis yields previously unidentified insights into the combinatorial regulation of gene expression by noncoding RNAs, as well as findings that are supported by evidence from the literature.


Subjects
Computer Simulation , Neoplastic Gene Expression Regulation , MicroRNAs/biosynthesis , Neoplasms/metabolism , Neoplasm RNA/biosynthesis , Humans , MicroRNAs/genetics , Genetic Models , Neoplasms/genetics , Neoplasm RNA/genetics