Results 1 - 20 of 59
1.
Proc Natl Acad Sci U S A ; 120(28): e2305236120, 2023 07 11.
Article in English | MEDLINE | ID: mdl-37399400

ABSTRACT

Plasma cell-free DNA (cfDNA) is a noninvasive biomarker for cell death of all organs. Deciphering the tissue origin of cfDNA can reveal abnormal cell death because of diseases, which has great clinical potential in disease detection and monitoring. Despite the great promise, the sensitive and accurate quantification of tissue-derived cfDNA remains challenging for existing methods due to the limited characterization of tissue methylation and the reliance on unsupervised methods. To fully exploit the clinical potential of tissue-derived cfDNA, here we present one of the largest comprehensive and high-resolution methylation atlases based on 521 noncancer tissue samples spanning 29 major types of human tissues. We systematically identified fragment-level tissue-specific methylation patterns and extensively validated them in orthogonal datasets. Based on the rich tissue methylation atlas, we develop the first supervised tissue deconvolution approach, a deep-learning-powered model, cfSort, for sensitive and accurate tissue deconvolution in cfDNA. On the benchmarking data, cfSort showed superior sensitivity and accuracy compared to the existing methods. We further demonstrated the clinical utilities of cfSort with two potential applications: aiding disease diagnosis and monitoring treatment side effects. The tissue-derived cfDNA fraction estimated from cfSort reflected the clinical outcomes of the patients. In summary, the tissue methylation atlas and cfSort enhanced the performance of tissue deconvolution in cfDNA, thus facilitating cfDNA-based disease detection and longitudinal treatment monitoring.


Subjects
Cell-Free Nucleic Acids; Deep Learning; Humans; Cell-Free Nucleic Acids/genetics; DNA Methylation; Biomarkers; Promoter Regions, Genetic; Biomarkers, Tumor/genetics
2.
Nat Methods ; 19(8): 938-949, 2022 08.
Article in English | MEDLINE | ID: mdl-35817938

ABSTRACT

A multitude of sequencing-based and microscopy technologies provide the means to unravel the relationship between the three-dimensional organization of genomes and key regulatory processes of genome function. Here, we develop a multimodal data integration approach to produce populations of single-cell genome structures that are highly predictive for nuclear locations of genes and nuclear bodies, local chromatin compaction and spatial segregation of functionally related chromatin. We demonstrate that multimodal data integration can compensate for systematic errors in some of the data and can greatly increase accuracy and coverage of genome structure models. We also show that alternative combinations of different orthogonal data sources can converge to models with similar predictive power. Moreover, our study reveals the key contributions of low-frequency ('rare') interchromosomal contacts to accurately predicting the global nuclear architecture, including the positioning of genes and chromosomes. Overall, our results highlight the benefits of multimodal data integration for genome structure analysis, available through the Integrative Genome Modeling software package.


Subjects
Chromatin; Chromosomes; Cell Nucleus; Chromatin/genetics; Chromosomes/genetics; Genome
3.
Hepatology ; 77(3): 774-788, 2023 03 01.
Article in English | MEDLINE | ID: mdl-35908246

ABSTRACT

BACKGROUND AND AIMS: The sensitivity of current surveillance methods for detecting early-stage hepatocellular carcinoma (HCC) is suboptimal. Extracellular vesicles (EVs) are promising circulating biomarkers for early cancer detection. In this study, we aim to develop an HCC EV-based surface protein assay for early detection of HCC. APPROACH AND RESULTS: Tissue microarray was used to evaluate four potential HCC-associated protein markers. An HCC EV surface protein assay, composed of covalent chemistry-mediated HCC EV purification and real-time immuno-polymerase chain reaction readouts, was developed and optimized for quantifying subpopulations of EVs. An HCC EV ECG score, calculated from the readouts of three HCC EV subpopulations (EpCAM+ CD63+, CD147+ CD63+, and GPC3+ CD63+ HCC EVs), was established for detecting early-stage HCC. A phase 2 biomarker study was conducted to evaluate the performance of ECG score in a training cohort (n = 106) and an independent validation cohort (n = 72). Overall, 99.7% of tissue microarray stained positive for at least one of the four HCC-associated protein markers (EpCAM, CD147, GPC3, and ASGPR1) that were subsequently validated in HCC EVs. In the training cohort, HCC EV ECG score demonstrated an area under the receiver operating curve (AUROC) of 0.95 (95% confidence interval [CI], 0.90-0.99) for distinguishing early-stage HCC from cirrhosis with a sensitivity of 91% and a specificity of 90%. The AUROCs of the HCC EV ECG score remained excellent in the validation cohort (0.93; 95% CI, 0.87-0.99) and in the subgroups by etiology (viral: 0.95; 95% CI, 0.90-1.00; nonviral: 0.94; 95% CI, 0.88-0.99). CONCLUSION: HCC EV ECG score demonstrated great potential for detecting early-stage HCC. It could augment current surveillance methods and improve patients' outcomes.


Subjects
Carcinoma, Hepatocellular; Extracellular Vesicles; Liver Neoplasms; Humans; Carcinoma, Hepatocellular/diagnosis; Carcinoma, Hepatocellular/pathology; Liver Neoplasms/diagnosis; Liver Neoplasms/pathology; Biomarkers, Tumor/analysis; Extracellular Vesicles/chemistry; Membrane Proteins; Electrocardiography; Glypicans
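
The abstract above reports an AUROC together with a sensitivity and a specificity. As a reading aid, the following is a minimal sketch of how these three quantities are commonly computed from per-patient scores. The function names, the scores, and the cutoff are hypothetical and are not taken from the study.

```python
def auroc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(scores_pos, scores_neg, cutoff):
    """Call a sample positive when its score is at or above the cutoff."""
    true_pos = sum(s >= cutoff for s in scores_pos)
    true_neg = sum(s < cutoff for s in scores_neg)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)


# Hypothetical scores, for illustration only.
hcc_scores = [0.9, 0.8, 0.7, 0.4]
cirrhosis_scores = [0.6, 0.3, 0.2, 0.1]
print(auroc(hcc_scores, cirrhosis_scores))
print(sensitivity_specificity(hcc_scores, cirrhosis_scores, cutoff=0.5))
```

The pairwise definition is quadratic in the number of samples, which is acceptable for cohorts of roughly a hundred patients; a rank-based formula would be the usual choice for larger data.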
4.
Bioinformatics ; 35(17): 3127-3132, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30668638

ABSTRACT

MOTIVATION: In recent years, several experimental studies have revealed that the microRNAs (miRNAs) in serum, plasma, exosome and whole blood are dysregulated in various types of diseases, indicating that the circulating miRNAs may serve as potential noninvasive biomarkers for disease diagnosis and prognosis. However, no database has been constructed to integrate the large-scale circulating miRNA profiles, explore the functional pathways involved and predict the potential biomarkers using feature selection between the disease conditions. Although there have been several studies attempting to generate a circulating miRNA database, they have not yet integrated the large-scale circulating miRNA profiles or provided the biomarker-selection function using machine learning methods. RESULTS: To fill this gap, we constructed the Circulating MicroRNA Expression Profiling (CMEP) database for integrating, analyzing and visualizing the large-scale expression profiles of phenotype-specific circulating miRNAs. The CMEP database contains massive datasets that were manually curated from NCBI GEO and the exRNA Atlas, including 66 datasets, 228 subsets and 10 419 samples. The CMEP provides differential expression analysis of circulating miRNAs and KEGG functional pathway enrichment analysis. Furthermore, to provide the function of noninvasive biomarker discovery, we implemented several feature-selection methods, including ridge regression, lasso regression, support vector machine and random forests. Finally, we implemented a user-friendly web interface to improve the user experience and to visualize the data and results of CMEP. AVAILABILITY AND IMPLEMENTATION: CMEP is accessible at http://syslab5.nchu.edu.tw/CMEP.


Subjects
Databases, Factual; Biomarkers; Circulating MicroRNA; Exosomes; Gene Expression Profiling
5.
J Transl Med ; 18(1): 5, 2020 01 06.
Article in English | MEDLINE | ID: mdl-31906978

ABSTRACT

BACKGROUND: Sepsis remains a major challenge in intensive care units, causing unacceptably high mortality rates due to the lack of rapid diagnostic tools with sufficient sensitivity. Therefore, there is an urgent need to replace time-consuming blood cultures with a new method. Ideally, such a method also provides comprehensive profiling of pathogenic bacteria to facilitate the treatment decision. METHODS: We developed a Random Forest with balanced subsampling to screen for pathogenic bacteria and diagnose sepsis based on cell-free DNA (cfDNA) sequencing data in a small blood sample. In addition, we constructed a bacterial co-occurrence network, based on a set of normal and sepsis samples, to infer unobserved bacteria. RESULTS: Based solely on cfDNA sequencing information from three independent datasets of sepsis, we distinguish sepsis from healthy samples with a satisfactory performance. This strategy also provides comprehensive bacteria profiling, permitting doctors to choose the best treatment strategy for a sepsis case. CONCLUSIONS: The combination of sepsis identification and bacteria-inferring strategies is a success for noninvasive cfDNA-based diagnosis, which has the potential to greatly enhance efficiency in disease detection and provide a comprehensive understanding of pathogens. For comparison, where a culture-based analysis of pathogens takes up to 5 days and is effective for only a third to a half of patients, cfDNA sequencing can be completed in just 1 day and our method can identify the majority of pathogens in all patients.


Subjects
Cell-Free Nucleic Acids; Sepsis; Bacteria/genetics; DNA, Bacterial/genetics; Humans; Intensive Care Units; Sepsis/diagnosis
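
The abstract above mentions a Random Forest with balanced subsampling. The sketch below illustrates only the subsampling idea, under the assumption that each tree is trained on an equal number of samples drawn with replacement from every class. The helper name `balanced_subsample` is hypothetical, and the authors' exact procedure may differ.

```python
import random


def balanced_subsample(labels, rng):
    """Return sample indices with the same number drawn (with replacement) per class.

    The per-class draw size is the size of the smallest class, so a rare class
    (for example, sepsis) is not swamped by a common one (for example, healthy).
    """
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    draw_size = min(len(indices) for indices in by_class.values())
    picked = []
    for indices in by_class.values():
        picked.extend(rng.choice(indices) for _ in range(draw_size))
    return picked


# Hypothetical, imbalanced label vector: 3 sepsis samples and 10 healthy ones.
labels = ["sepsis"] * 3 + ["healthy"] * 10
print(balanced_subsample(labels, random.Random(0)))
```

In a forest, one such index list would be drawn independently for every tree, so each tree sees a class-balanced training set while the ensemble as a whole still covers most of the majority class.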
6.
Nucleic Acids Res ; 46(15): e89, 2018 09 06.
Article in English | MEDLINE | ID: mdl-29897492

ABSTRACT

The detection of tumor-derived cell-free DNA in plasma is one of the most promising directions in cancer diagnosis. The major challenge in such an approach is how to identify the tiny amount of tumor DNAs out of total cell-free DNAs in blood. Here we propose an ultrasensitive cancer detection method, termed 'CancerDetector', using the DNA methylation profiles of cell-free DNAs. The key to our method is to probabilistically model the joint methylation states of multiple adjacent CpG sites on an individual sequencing read, in order to exploit the pervasive nature of DNA methylation for signal amplification. Therefore, CancerDetector can sensitively identify a trace amount of tumor cfDNAs in plasma, at the level of individual reads. We evaluated CancerDetector on the simulated data, and showed a high concordance of the predicted and true tumor fraction. Testing CancerDetector on real plasma data demonstrated its high sensitivity and specificity in detecting tumor cfDNAs. In addition, the predicted tumor fraction showed great consistency with tumor size and survival outcome. Note that all of these tests were performed on sequencing data at low to medium coverage (1× to 10×). Therefore, CancerDetector holds great potential to detect cancer early and cost-effectively.


Subjects
Algorithms; Cell-Free Nucleic Acids/genetics; Computational Biology/methods; DNA Methylation; Neoplasms/diagnosis; Cell-Free Nucleic Acids/chemistry; CpG Islands/genetics; DNA, Neoplasm/chemistry; DNA, Neoplasm/genetics; High-Throughput Nucleotide Sequencing/methods; Humans; Neoplasms/blood; Neoplasms/genetics; ROC Curve; Reproducibility of Results
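
The read-level idea described in the abstract above can be sketched as a two-component mixture model. This is a simplified illustration, not the CancerDetector implementation: it assumes a single methylation rate per class for a marker region, independent CpG sites within a read, and a grid search for the tumor fraction. All names and numbers are hypothetical.

```python
import math


def read_likelihood(read, p_meth):
    """Likelihood of the joint CpG states on one read (1 = methylated, 0 = not)."""
    likelihood = 1.0
    for state in read:
        likelihood *= p_meth if state == 1 else 1.0 - p_meth
    return likelihood


def estimate_tumor_fraction(reads, p_tumor, p_normal, steps=1000):
    """Grid-search the mixing weight that maximizes the log-likelihood of all reads."""
    best_theta, best_ll = 0.0, -math.inf
    for k in range(steps + 1):
        theta = k / steps
        log_lik = 0.0
        for read in reads:
            mixture = (theta * read_likelihood(read, p_tumor)
                       + (1.0 - theta) * read_likelihood(read, p_normal))
            log_lik += math.log(mixture)
        if log_lik > best_ll:
            best_theta, best_ll = theta, log_lik
    return best_theta


# Hypothetical marker that is mostly methylated in tumor and unmethylated in normal.
reads = [[1, 1, 1, 1]] * 2 + [[0, 0, 0, 0]] * 8
print(estimate_tumor_fraction(reads, p_tumor=0.9, p_normal=0.1))
```

Multiplying the per-site probabilities is what amplifies the signal: a read with four concordant CpG states is far more discriminative than any single site, which is why a handful of reads can shift the estimated fraction.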
7.
J Cell Mol Med ; 23(1): 395-404, 2019 01.
Article in English | MEDLINE | ID: mdl-30338927

ABSTRACT

The seasonal outbreaks of influenza infection cause respiratory illness globally, and even death, in all age groups. Given early-warning signals preceding an influenza outbreak, timely interventions such as vaccination and isolation management can effectively decrease morbidity. However, real-time prediction of influenza outbreaks is usually difficult because of the complexity of the intertwined biological and social systems. By exploring rich dynamical and high-dimensional information, our dynamic network marker/biomarker (DNM/DNB) method opens a new way to identify the tipping point prior to the catastrophic transition into an influenza pandemic. In order to detect the early-warning signals before the influenza outbreak by applying the DNM method, the historical information of clinic hospitalization caused by influenza infection between the years 2009 and 2016 was extracted and assembled from public records of Tokyo and Hokkaido, Japan. The early-warning signal, with an average lead of a 4-week window prior to each seasonal outbreak of influenza, was provided by the DNM method based on the hospitalization records, providing an opportunity to apply proactive strategies to prevent or delay the onset of an influenza outbreak. Moreover, the study of the dynamical changes of hospitalization in local district networks unveils the influenza transmission dynamics or landscape at the network level.


Subjects
Biomarkers/metabolism; Influenza, Human/diagnosis; Disease Outbreaks; Disease Progression; Humans; Influenza, Human/metabolism
8.
Proc Natl Acad Sci U S A ; 113(12): E1663-72, 2016 Mar 22.
Article in English | MEDLINE | ID: mdl-26951677

ABSTRACT

Conformation capture technologies (e.g., Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and interchromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization.


Subjects
Chromosomes/ultrastructure; Imaging, Three-Dimensional/methods; Metagenomics/methods; Animals; Biological Evolution; Cell Line; Centromere/ultrastructure; Chromatin/genetics; Chromatin/ultrastructure; Chromosome Positioning; Chromosomes/genetics; Chromosomes, Human/genetics; Chromosomes, Human/ultrastructure; Diploidy; Genome, Human; Heterochromatin/ultrastructure; Humans; In Situ Hybridization, Fluorescence; Likelihood Functions; Lymphocytes/ultrastructure; Primates/genetics; Single-Cell Analysis; Stochastic Processes; Tomography, X-Ray/methods
9.
Nucleic Acids Res ; 44(D1): D944-51, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26602695

ABSTRACT

The genome-wide transcriptome profiling of cancerous and normal tissue samples can provide insights into the molecular mechanisms of cancer initiation and progression. RNA Sequencing (RNA-Seq) is a revolutionary tool that has been used extensively in cancer research. However, no existing RNA-Seq database provides all of the following features: (i) large-scale and comprehensive data archives and analyses, including coding-transcript profiling, long non-coding RNA (lncRNA) profiling and coexpression networks; (ii) phenotype-oriented data organization and searching and (iii) the visualization of expression profiles, differential expression and regulatory networks. We have constructed the first public database that meets these criteria, the Cancer RNA-Seq Nexus (CRN, http://syslab4.nchu.edu.tw/CRN). CRN has a user-friendly web interface designed to facilitate cancer research and personalized medicine. It is an open resource for intuitive data exploration, providing coding-transcript/lncRNA expression profiles to support researchers generating new hypotheses in cancer research and personalized medicine.


Subjects
Databases, Genetic; Gene Expression Profiling; Gene Regulatory Networks; Neoplasms/genetics; Humans; Neoplasms/metabolism; Phenotype; RNA, Long Noncoding/metabolism; RNA, Messenger/metabolism
10.
Nucleic Acids Res ; 44(7): e70, 2016 Apr 20.
Article in English | MEDLINE | ID: mdl-26704975

ABSTRACT

Genome-wide proximity ligation assays allow the identification of chromatin contacts at unprecedented resolution. Several studies reveal that mammalian chromosomes are composed of topological domains (TDs) at sub-megabase resolution, which appear to be conserved across cell types and to some extent even between organisms. Identifying topological domains is now an important step toward understanding the structure and functions of spatial genome organization. However, current methods for TD identification demand extensive computational resources, require careful tuning and/or encounter inconsistencies in results. In this work, we propose an efficient and deterministic method, TopDom, to identify TDs, along with a set of statistical methods for evaluating their quality. TopDom is much more efficient than existing methods and depends on just one intuitive parameter, a window size, for which we provide easy-to-implement optimization guidelines. TopDom also identifies more and higher quality TDs than the popular directional index algorithm. The TDs identified by TopDom provide strong support for the cross-tissue TD conservation. Finally, our analysis reveals that the locations of housekeeping genes are closely associated with cross-tissue conserved TDs. The software package and source codes of TopDom are available at http://zhoulab.usc.edu/TopDom/.


Subjects
Chromatin/chemistry; Genomics/methods; Software; Animals; Cell Line; Chromatin/metabolism; Epigenesis, Genetic; Genes, Essential; Humans; Mice
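
The role of the single window-size parameter mentioned in the abstract above can be illustrated with a small sketch. It assumes that a boundary shows up as a local minimum in the average contact frequency between the `w` bins upstream and the `w` bins downstream of a position. This is a simplified reading of the approach, not the TopDom code: the statistical filtering of candidate boundaries described in the abstract is omitted, and all names and data are hypothetical.

```python
def bin_signal(contacts, w):
    """Mean contact frequency across each bin boundary, using windows of w bins.

    signal[i] describes the boundary between bin i and bin i + 1.
    """
    n = len(contacts)
    signal = []
    for i in range(n - 1):
        upstream = range(max(0, i - w + 1), i + 1)
        downstream = range(i + 1, min(n, i + 1 + w))
        values = [contacts[a][b] for a in upstream for b in downstream]
        signal.append(sum(values) / len(values))
    return signal


def local_minima(signal):
    """Indices whose signal is strictly lower than both neighbours."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] < signal[i - 1] and signal[i] < signal[i + 1]]


# Hypothetical contact matrix with two domains of four bins each:
# 10 contacts inside a domain, 1 contact between domains.
size, domain = 8, 4
contacts = [[10 if a // domain == b // domain else 1 for b in range(size)]
            for a in range(size)]
print(local_minima(bin_signal(contacts, w=2)))
```

A larger window smooths the signal and favours larger domains, while a smaller one is more sensitive to noise, which is why a single window size can serve as an intuitive tuning parameter.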
11.
Methods ; 93: 110-8, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26238263

ABSTRACT

In past decades, the experimental determination of protein functions was expensive and time-consuming, so numerous computational methods were developed to speed up and guide the process. However, most of these methods predict protein functions at the gene level and do not consider the fact that protein isoforms (translated from alternatively spliced transcripts), not genes, are the actual function carriers. Now, high-throughput RNA-seq technology is providing unprecedented opportunities to unravel protein functions at the isoform level. In this article, we review recent progress in the high-resolution functional annotations of protein isoforms, focusing on two methods developed by the authors. Both methods can integrate multiple RNA-seq datasets for comprehensively characterizing functions of protein isoforms.


Subjects
Cell Physiological Phenomena/physiology; Databases, Genetic; Protein Isoforms/physiology; Animals; Forecasting; Humans; RNA/physiology
12.
Nucleic Acids Res ; 43(2): 1268-82, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25567984

ABSTRACT

FOXP3 is a lineage-specific transcription factor that is required for regulatory T cell development and function. In this study, we determined the crystal structure of the FOXP3 forkhead domain bound to DNA. The structure reveals that FOXP3 can form a stable domain-swapped dimer to bridge DNA in the absence of cofactors, suggesting that FOXP3 may play a role in long-range gene interactions. To test this hypothesis, we used circular chromosome conformation capture coupled with high throughput sequencing (4C-seq) to analyze FOXP3-dependent genomic contacts around a known FOXP3-bound locus, Ptpn22. Our studies reveal that FOXP3 induces significant changes in the chromatin contacts between the Ptpn22 locus and other Foxp3-regulated genes, reflecting a mechanism by which FOXP3 reorganizes the genome architecture to coordinate the expression of its target genes. Our results suggest that FOXP3 mediates long-range chromatin interactions as part of its mechanisms to regulate specific gene expression in regulatory T cells.


Subjects
Chromosomes/chemistry; DNA/chemistry; Forkhead Transcription Factors/chemistry; Animals; DNA/metabolism; Forkhead Transcription Factors/metabolism; Gene Expression Regulation; Humans; Mice, Inbred C57BL; Models, Molecular; Protein Multimerization; Protein Structure, Tertiary; Protein Tyrosine Phosphatase, Non-Receptor Type 22/genetics
13.
Bioinformatics ; 31(6): 960-2, 2015 Mar 15.
Article in English | MEDLINE | ID: mdl-25391400

ABSTRACT

Genome-wide proximity ligation assays, e.g. Hi-C and its variant TCC, have recently become important tools to study spatial genome organization. Removing biases from chromatin contact matrices generated by such techniques is a critical preprocessing step of subsequent analyses. The continuing decline of sequencing costs has led to an ever-improving resolution of the Hi-C data, resulting in very large matrices of chromatin contacts. Such large-size matrices, however, pose a great challenge to the memory usage and speed of their normalization. Therefore, there is an urgent need for fast and memory-efficient methods for normalization of Hi-C data. We developed Hi-Corrector, an easy-to-use, open source implementation of the Hi-C data normalization algorithm. Its salient features are (i) scalability: the software is capable of normalizing Hi-C data of any size in reasonable times; (ii) memory efficiency: the sequential version can run on any single computer with very limited memory, no matter how little; (iii) fast speed: the parallel version can run very fast on multiple computing nodes with limited local memory. AVAILABILITY AND IMPLEMENTATION: The sequential version is implemented in ANSI C and can be easily compiled on any system; the parallel version is implemented in ANSI C with the MPI library (a standardized and portable parallel environment designed for solving large-scale scientific problems). The package is freely available at http://zhoulab.usc.edu/Hi-Corrector/.


Subjects
Algorithms; Chromatin/genetics; Chromosomes, Human/genetics; Genome, Human; Software; Chromatin/chemistry; Chromosome Mapping; Chromosomes, Human/chemistry; Genomic Library; Humans; Models, Statistical
14.
Nucleic Acids Res ; 42(6): e39, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24369432

ABSTRACT

Alternative transcript processing is an important mechanism for generating functional diversity in genes. However, little is known about the precise functions of individual isoforms. In fact, proteins (translated from transcript isoforms), not genes, are the function carriers. By integrating multiple human RNA-seq data sets, we carried out the first systematic prediction of isoform functions, enabling high-resolution functional annotation of the human transcriptome. Unlike gene function prediction, isoform function prediction faces a unique challenge: the lack of training data, because all known functional annotations are at the gene level. To address this challenge, we modelled the gene-isoform relationships as multiple instance data and developed a novel label propagation method to predict functions. Our method achieved an average area under the receiver operating characteristic curve of 0.67 and assigned functions to 15 572 isoforms. Interestingly, we observed that different functions have different sensitivities to alternative isoform processing, and that the function diversity of isoforms from the same gene is positively correlated with their tissue expression diversity. Finally, we surveyed the literature to validate our predictions for a number of apoptotic genes. Strikingly, for the famous 'TP53' gene, we not only accurately identified the apoptosis regulation function of its five isoforms, but also correctly predicted the precise direction of the regulation.


Subjects
Gene Expression Profiling; Molecular Sequence Annotation; Protein Isoforms/physiology; Sequence Analysis, RNA; Apoptosis; Gene Regulatory Networks; Humans; Protein Isoforms/genetics; Protein Isoforms/metabolism; RNA Isoforms/metabolism
15.
Nucleic Acids Res ; 42(Web Server issue): W137-46, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24895436

ABSTRACT

DiseaseConnect (http://disease-connect.org) is a web server for the analysis and visualization of comprehensive knowledge on mechanism-based disease connectivity. The traditional disease classification system groups diseases with similar clinical symptoms and phenotypic traits. Thus, diseases with entirely different pathologies could be grouped together, leading to a similar treatment design. Such problems could be avoided if diseases were classified based on their molecular mechanisms. Connecting diseases with similar pathological mechanisms could inspire novel strategies for the effective repositioning of existing drugs and therapies. Although there have been several studies attempting to generate disease connectivity networks, they have not yet utilized the enormous and rapidly growing public repositories of disease-related omics data and literature, two primary resources capable of providing insights into disease connections at an unprecedented level of detail. DiseaseConnect, the first public web server of its kind, integrates comprehensive omics and literature data, including a large amount of gene expression data, the Genome-Wide Association Studies catalog, and text-mined knowledge, to discover disease-disease connectivity via common molecular mechanisms. Moreover, clinical comorbidity data and a comprehensive compilation of known drug-disease relationships are additionally utilized for advancing the understanding of the disease landscape and for facilitating the mechanism-based development of new drug treatments.


Subjects
Disease/genetics; Software; Comorbidity; Drug Therapy; Gene Expression; Humans; Internet; MicroRNAs/metabolism; Polymorphism, Single Nucleotide
16.
Methods ; 67(3): 313-24, 2014 Jun 01.
Article in English | MEDLINE | ID: mdl-24583115

ABSTRACT

Alternative splicing is an important gene regulatory mechanism that dramatically increases the complexity of the proteome. However, how alternative splicing is regulated and how transcription and splicing are coordinated are still poorly understood, and functions of transcript isoforms have been studied only in a few limited cases. Nowadays, RNA-seq technology provides an exceptional opportunity to study alternative splicing on genome-wide scales and in an unbiased manner. With the rapid accumulation of data in public repositories, new challenges arise from the urgent need to effectively integrate many different RNA-seq datasets to study alternative splicing. This paper discusses a set of advanced computational methods that can integrate and analyze many RNA-seq datasets to systematically identify splicing modules, unravel the coupling of transcription and splicing, and predict the functions of splicing isoforms on a genome-wide scale.


Subjects
Alternative Splicing; Sequence Analysis, RNA/methods; Computational Biology; Data Interpretation, Statistical; Datasets as Topic
17.
Nucleic Acids Res ; 40(19): 9379-91, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22879375

ABSTRACT

Recent technology has made it possible to simultaneously perform multi-platform genomic profiling (e.g. DNA methylation (DM) and gene expression (GE)) of biological samples, resulting in so-called 'multi-dimensional genomic data'. Such data provide unique opportunities to study the coordination between regulatory mechanisms on multiple levels. However, integrative analysis of multi-dimensional genomics data for the discovery of combinatorial patterns is currently lacking. Here, we adopt a joint matrix factorization technique to address this challenge. This method projects multiple types of genomic data onto a common coordinate system, in which heterogeneous variables weighted highly in the same projected direction form a multi-dimensional module (md-module). Genomic variables in such modules are characterized by significant correlations and likely functional associations. We applied this method to the DM, GE, and microRNA expression data of 385 ovarian cancer samples from The Cancer Genome Atlas project. These md-modules revealed perturbed pathways that would have been overlooked with only a single type of data, uncovered associations between different layers of cellular activities and allowed the identification of clinically distinct patient subgroups. Our study provides a useful protocol for uncovering hidden patterns and their biological implications in multi-dimensional 'omic' data.


Subjects
Genomics/methods; Neoplasms/genetics; DNA Methylation; Female; Humans; MicroRNAs/metabolism; Ovarian Neoplasms/genetics; Ovarian Neoplasms/metabolism; Ovarian Neoplasms/mortality; Survival Analysis; Transcriptome
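
The projection onto a common coordinate system described in the abstract above can be written compactly. The formulation below is one common way to express a joint factorization and is an assumption made for illustration (a nonnegative factorization with a shared basis), not necessarily the exact objective used in the paper. Here $X_I$ is the samples-by-variables matrix for data type $I$ (DM, GE, microRNA expression) measured on the same $n$ samples, $W$ is the shared $n \times k$ basis, and $H_I$ holds the weights of the variables of type $I$ on the $k$ projected directions.

```latex
\min_{W \ge 0,\; H_1, H_2, H_3 \ge 0} \;
\sum_{I=1}^{3} \left\lVert X_I - W H_I \right\rVert_F^2 ,
\qquad
X_I \in \mathbb{R}^{n \times p_I},\quad
W \in \mathbb{R}^{n \times k},\quad
H_I \in \mathbb{R}^{k \times p_I}
```

Under this reading, the variables with large entries in row $j$ of $H_1$, $H_2$ and $H_3$ are weighted highly in the same projected direction $j$ and would together form one md-module.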
18.
Nat Biotechnol ; 2024 May 14.
Article in English | MEDLINE | ID: mdl-38744947

ABSTRACT

Cancer immunotherapy with autologous chimeric antigen receptor (CAR) T cells faces challenges in manufacturing and patient selection that could be avoided by using 'off-the-shelf' products, such as allogeneic CAR natural killer T (AlloCAR-NKT) cells. Previously, we reported a system for differentiating human hematopoietic stem and progenitor cells into AlloCAR-NKT cells, but the use of three-dimensional culture and xenogeneic feeders precluded its clinical application. Here we describe a clinically guided method to differentiate and expand IL-15-enhanced AlloCAR-NKT cells with high yield and purity. We generated AlloCAR-NKT cells targeting seven cancers and, in a multiple myeloma model, demonstrated their antitumor efficacy, expansion and persistence. The cells also selectively depleted immunosuppressive cells in the tumor microenvironment and antagonized tumor immune evasion via triple targeting of CAR, TCR and NK receptors. They exhibited a stable hypoimmunogenic phenotype associated with epigenetic and signaling regulation and did not induce detectable graft versus host disease or cytokine release syndrome. These properties of AlloCAR-NKT cells support their potential for clinical translation.

19.
Bioinformatics ; 28(19): 2458-66, 2012 Oct 01.
Article in English | MEDLINE | ID: mdl-22863767

ABSTRACT

MOTIVATION: Eukaryotic gene expression (GE) is subjected to precisely coordinated multi-layer controls, across the levels of epigenetic, transcriptional and post-transcriptional regulations. Recently, the emerging multi-dimensional genomic dataset has provided unprecedented opportunities to study the cross-layer regulatory interplay. In these datasets, the same set of samples is profiled on several layers of genomic activities, e.g. copy number variation (CNV), DNA methylation (DM), GE and microRNA expression (ME). However, suitable analysis methods for such data are currently sparse. RESULTS: In this article, we introduced a sparse Multi-Block Partial Least Squares (sMBPLS) regression method to identify multi-dimensional regulatory modules from this new type of data. A multi-dimensional regulatory module contains sets of regulatory factors from different layers that are likely to jointly contribute to a local 'gene expression factory'. We demonstrated the performance of our method on the simulated data as well as on The Cancer Genome Atlas Ovarian Cancer datasets including the CNV, DM, ME and GE data measured on 230 samples. We showed that the majority of identified modules have significant functional and transcriptional enrichment, higher than that observed in modules identified using only a single type of genomic data. Our network analysis of the modules revealed that the CNV, DM and microRNA can have a coupled impact on the expression of important oncogenes and tumor suppressor genes. AVAILABILITY AND IMPLEMENTATION: The source code implemented in MATLAB is freely available at: http://zhoulab.usc.edu/sMBPLS/. CONTACT: xjzhou@usc.edu SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.


Subjects
Computational Biology/methods; DNA Copy Number Variations; DNA Methylation; Gene Regulatory Networks; Genomics/methods; MicroRNAs/genetics; Female; Gene Expression Regulation; Humans; MicroRNAs/metabolism; Oncogenes; Ovarian Neoplasms/genetics; Regression Analysis
20.
Proc Natl Acad Sci U S A ; 107(15): 6823-8, 2010 Apr 13.
Article in English | MEDLINE | ID: mdl-20360561

ABSTRACT

The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes the first study to transform public gene expression repositories into an automated disease diagnosis database. Particularly, we have developed a systematic framework, including a two-stage Bayesian learning approach, to achieve the diagnosis of one or multiple diseases for a query expression profile along a hierarchical disease taxonomy. Our approach, including standardizing cross-platform gene expression data and heterogeneous disease annotations, allows analyzing both sources of information in a unified probabilistic system. A high level of overall diagnostic accuracy was shown by cross validation. It was also demonstrated that the power of our method can increase significantly with the continued growth of public gene expression repositories. Finally, we showed how our disease diagnosis system can be used to characterize complex phenotypes and to construct a disease-drug connectivity map.


Subjects
Computational Biology/methods; Gene Expression Profiling/methods; Access to Information; Algorithms; Artificial Intelligence; Bayes Theorem; Databases, Factual; Databases, Genetic; Genomics/methods; Humans; Information Storage and Retrieval; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated/methods; Regression Analysis; Software; User-Computer Interface