Results 1 - 19 of 19
2.
Patterns (N Y) ; 5(5): 100969, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38800361

ABSTRACT

Understanding the cellular composition of a disease-related tissue is important in disease diagnosis, prognosis, and downstream treatment. Recent advances in single-cell RNA-sequencing (scRNA-seq) technique have allowed the measurement of gene expression profiles for individual cells. However, scRNA-seq is still too expensive to be used for large-scale population studies, and bulk RNA-seq is still widely used in such situations. An essential challenge is to deconvolve cellular composition for bulk RNA-seq data based on scRNA-seq data. Here, we present DeepDecon, a deep neural network model that leverages single-cell gene expression information to accurately predict the fraction of cancer cells in bulk tissues. It provides a refining strategy in which the cancer cell fraction is iteratively estimated by a set of trained models. When applied to simulated and real cancer data, DeepDecon exhibits superior performance compared to existing decomposition methods in terms of accuracy.

3.
PLoS Comput Biol ; 19(10): e1010608, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37844077

ABSTRACT

Heterogeneity in different genomic studies compromises the performance of machine learning models in cross-study phenotype predictions. Overcoming heterogeneity when incorporating different studies in terms of phenotype prediction is a challenging and critical step for developing machine learning algorithms with reproducible prediction performance on independent datasets. We investigated the best approaches to integrate different studies of the same type of omics data under a variety of different heterogeneities. We developed a comprehensive workflow to simulate a variety of different types of heterogeneity and evaluate the performances of different integration methods together with batch normalization by using ComBat. We also demonstrated the results through realistic applications on six colorectal cancer (CRC) metagenomic studies and six tuberculosis (TB) gene expression studies, respectively. We showed that heterogeneity in different genomic studies can markedly negatively impact the machine learning classifier's reproducibility. ComBat normalization improved the prediction performance of machine learning classifier when heterogeneous populations are present, and could successfully remove batch effects within the same population. We also showed that the machine learning classifier's prediction accuracy can be markedly decreased as the underlying disease model became more different in training and test populations. Comparing different merging and integration methods, we found that merging and integration methods can outperform each other in different scenarios. In the realistic applications, we observed that the prediction accuracy improved when applying ComBat normalization with merging or integration methods in both CRC and TB studies. We illustrated that batch normalization is essential for mitigating both population differences of different studies and batch effects. 
We also showed that both merging strategy and integration methods can achieve good performances when combined with batch normalization. In addition, we explored the potential of boosting phenotype prediction performance by rank aggregation methods and showed that rank aggregation methods had similar performance as other ensemble learning approaches.


Subjects
Algorithms , Machine Learning , Reproducibility of Results , Genomics , Phenotype
4.
Nat Commun ; 13(1): 5566, 2022 09 29.
Article in English | MEDLINE | ID: mdl-36175411

ABSTRACT

Early cancer detection by cell-free DNA faces multiple challenges: low fraction of tumor cell-free DNA, molecular heterogeneity of cancer, and sample sizes that are not sufficient to reflect diverse patient populations. Here, we develop a cancer detection approach to address these challenges. It consists of an assay, cfMethyl-Seq, for cost-effective sequencing of the cell-free DNA methylome (with > 12-fold enrichment over whole genome bisulfite sequencing in CpG islands), and a computational method to extract methylation information and diagnose patients. Applying our approach to 408 colon, liver, lung, and stomach cancer patients and controls, at 97.9% specificity we achieve 80.7% and 74.5% sensitivity in detecting all-stage and early-stage cancer, and 89.1% and 85.0% accuracy for locating tissue-of-origin of all-stage and early-stage cancer, respectively. Our approach cost-effectively retains methylome profiles of cancer abnormalities, allowing us to learn new features and expand to other cancer types as training cohorts grow.


Subjects
Cell-Free Nucleic Acids , Stomach Neoplasms , Cell-Free Nucleic Acids/genetics , Cost-Benefit Analysis , Early Detection of Cancer , Epigenome , Humans , Stomach Neoplasms/diagnosis , Stomach Neoplasms/genetics
5.
Front Cell Infect Microbiol ; 12: 918010, 2022.
Article in English | MEDLINE | ID: mdl-35782128

ABSTRACT

The association of colorectal cancer (CRC) and the human gut microbiome dysbiosis has been the focus of several studies in the past. Many bacterial taxa have been shown to have differential abundance among CRC patients compared to healthy controls. However, the relationship between CRC and non-bacterial gut microbiome such as the gut virome is under-studied and not well understood. In this study we conducted a comprehensive analysis of the association of viral abundances with CRC using metagenomic shotgun sequencing data of 462 CRC subjects and 449 healthy controls from 7 studies performed in 8 different countries. Despite the high heterogeneity, our results showed that the virome alpha diversity was consistently higher in CRC patients than in healthy controls (p-value <0.001). This finding is in sharp contrast to previous reports of low alpha diversity of prokaryotes in CRC compared to healthy controls. In addition to the previously known association of Podoviridae, Siphoviridae and Myoviridae with CRC, we further demonstrate that Herelleviridae, a newly constructed viral family, is significantly depleted in CRC subjects. Our interkingdom association analysis reveals a less intertwined correlation between the gut virome and bacteriome in CRC compared to healthy controls. Furthermore, we show that the viral abundance profiles can be used to accurately predict CRC disease status (AUROC >0.8) in both within-study and cross-study settings. The combination of training sets resulted in rather generalized and accurate prediction models. Our study clearly shows that subjects with colorectal cancer harbor a distinct human gut virome profile which may have an important role in this disease.


Subjects
Bacteriophages , Colorectal Neoplasms , Siphoviridae , Bacteriophages/genetics , Humans , Metagenome , Metagenomics
6.
Cancers (Basel) ; 14(12)2022 Jun 10.
Article in English | MEDLINE | ID: mdl-35740540

ABSTRACT

Currently, most neuroblastoma patients are treated according to the Children's Oncology Group (COG) risk group assignment; however, neuroblastoma's heterogeneity renders only a few predictors for treatment response, resulting in excessive treatment. Here, we sought to couple COG risk classification with the tumor intracellular microbiome, which is part of the molecular signature of a tumor. We determine that an intra-tumor microbial gene abundance score, namely M-score, separates the high COG-risk patients into two subpopulations (Mhigh and Mlow) with higher accuracy in risk stratification than the current COG risk assessment, thus sparing a subset of high COG-risk patients from being subjected to traditional high-risk therapies. Mechanistically, the classification power of M-scores implies the effect of CREB over-activation, which may influence the critical genes involved in cellular proliferation, anti-apoptosis, and angiogenesis, affecting tumor cell proliferation, survival, and metastasis. Thus, intracellular microbiota abundance in neuroblastoma regulates intracellular signals to affect patients' survival.

7.
Virol J ; 19(1): 114, 2022 06 28.
Article in English | MEDLINE | ID: mdl-35765099

ABSTRACT

BACKGROUND: Chronic infection with hepatitis B virus (HBV) has been proved highly associated with the development of hepatocellular carcinoma (HCC). AIMS: The purpose of the study is to investigate the association between HBV preS region quasispecies and HCC development, as well as to develop HCC diagnosis model using HBV preS region quasispecies. METHODS: A total of 104 chronic hepatitis B (CHB) patients and 117 HBV-related HCC patients were enrolled. HBV preS region was sequenced using next generation sequencing (NGS) and the nucleotide entropy was calculated for quasispecies evaluation. Sparse logistic regression (SLR) was used to predict HCC development and prediction performances were evaluated using receiver operating characteristic curves. RESULTS: Entropy of HBV preS1, preS2 regions and several nucleotide points showed significant divergence between CHB and HCC patients. Using SLR, the classification of HCC/CHB groups achieved a mean area under the receiver operating characteristic curve (AUC) of 0.883 in the training data and 0.795 in the test data. The prediction model was also validated by a completely independent dataset from Hong Kong. The 10 selected nucleotide positions showed significantly different entropy between CHB and HCC patients. The HBV quasispecies also classified three clinical parameters, including HBeAg, HBVDNA, and Alkaline phosphatase (ALP) with the AUC value greater than 0.6 in the test data. CONCLUSIONS: Using NGS and SLR, the association between HBV preS region nucleotide entropy and HCC development was validated in our study and this could promote the understanding of HCC progression mechanism.
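The nucleotide entropy mentioned in this abstract can be illustrated with a minimal sketch. It assumes that reads are already aligned to a common coordinate system and that entropy means Shannon entropy in bits over the bases A, C, G and T at each position; the abstract does not state the exact definition, and the function names and toy reads below are hypothetical.

```python
import math
from collections import Counter


def position_entropy(reads, pos):
    """Shannon entropy (bits) of the base distribution at one aligned position.

    Assumption: only A/C/G/T are counted; gaps and ambiguous bases are ignored.
    """
    counts = Counter(r[pos] for r in reads if pos < len(r) and r[pos] in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of -p * log2(p) over the observed bases.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(reads):
    """Entropy at every position covered by at least one read."""
    length = max(len(r) for r in reads)
    return [position_entropy(reads, i) for i in range(length)]


# Toy aligned reads (hypothetical data, not from the study).
reads = ["ACGT", "ACGA", "ACCA", "ATCA"]
profile = entropy_profile(reads)
print(profile)
```

A position where every read carries the same base has entropy 0, and a position split evenly between two bases has entropy 1 bit. A per-position profile of this kind is the sort of feature vector a sparse classifier could take as input.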


Subjects
Carcinoma, Hepatocellular , Liver Neoplasms , Hepatitis B Surface Antigens/genetics , Hepatitis B virus/genetics , Humans , Logistic Models , Nucleotides , Quasispecies
8.
Synth Syst Biotechnol ; 7(1): 574-585, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35155839

ABSTRACT

Dysfunction of microbial communities in various human body sites has been shown to be associated with a variety of diseases, raising the possibility of predicting diseases based on metagenomic samples. Although many studies have investigated this problem, there is no consensus on the optimal approaches for predicting disease status based on metagenomic samples. Using six human gut metagenomic datasets consisting of large numbers of colorectal cancer patients and healthy controls from different countries, we investigated different software packages for extracting relative abundances of known microbial genomes and for integrating mapping and assembly approaches to obtain the relative abundance profiles of both known and novel genomes. The random forests (RF) classification algorithm was then used to predict colorectal cancer status based on the microbial relative abundance profiles. Based on within-dataset cross-validation and cross-dataset prediction, we show that the RF prediction performance using the microbial relative abundance profiles estimated by Centrifuge is generally higher than that using the microbial relative abundance profiles estimated by MetaPhlAn2 and Bracken. We also develop a novel method to integrate the relative abundance profiles of both known and novel microbial organisms to further increase the prediction performance for colorectal cancer from metagenomes.

9.
Sci Rep ; 10(1): 10, 2020 01 30.
Article in English | MEDLINE | ID: mdl-32001736

ABSTRACT

Brain age is a metric that quantifies the degree of aging of a brain based on whole-brain anatomical characteristics. While associations between individual human brain regions and environmental or genetic factors have been investigated, how brain age is associated with those factors remains unclear. We investigated these associations using UK Biobank data. We first trained a statistical model for obtaining relative brain age (RBA), a metric describing a subject's brain age relative to peers, based on whole-brain anatomical measurements, from training set subjects (n = 5,193). We then applied this model to evaluation set subjects (n = 12,115) and tested the association of RBA with tobacco smoking, alcohol consumption, and genetic variants. We found that daily or almost daily consumption of tobacco and alcohol were both significantly associated with increased RBA (P < 0.001). We also found SNPs significantly associated with RBA (P < 5E-8). The SNP most significantly associated with RBA is located in the MAPT gene. Our results suggest that both environmental and genetic factors are associated with structural brain aging.


Subjects
Aging/drug effects , Alcohol Drinking/adverse effects , Brain/anatomy & histology , Polymorphism, Single Nucleotide/genetics , Smoking/adverse effects , Aged , Aged, 80 and over , Aging/genetics , Biological Specimen Banks , Brain/diagnostic imaging , Brain/growth & development , Cognition/drug effects , Female , Humans , Magnetic Resonance Imaging , Male , Middle Aged , Neuroimaging , United Kingdom , tau Proteins/genetics
10.
Quant Biol ; 8(1): 64-77, 2020 Mar.
Article in English | MEDLINE | ID: mdl-34084563

ABSTRACT

BACKGROUND: The recent development of metagenomic sequencing makes it possible to massively sequence microbial genomes including viral genomes without the need for laboratory culture. Existing reference-based and gene homology-based methods are not efficient in identifying unknown viruses or short viral sequences from metagenomic data. METHODS: Here we developed a reference-free and alignment-free machine learning method, DeepVirFinder, for identifying viral sequences in metagenomic data using deep learning. RESULTS: Trained based on sequences from viral RefSeq discovered before May 2015, and evaluated on those discovered after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths, achieving AUROC 0.93, 0.95, 0.97, and 0.98 for 300, 500, 1000, and 3000 bp sequences respectively. Enlarging the training data with additional millions of purified viral sequences from metavirome samples further improved the accuracy for identifying virus groups that are under-represented. Applying DeepVirFinder to real human gut metagenomic samples, we identified 51,138 viral sequences belonging to 175 bins in patients with colorectal carcinoma (CRC). Ten bins were found associated with the cancer status, suggesting viruses may play important roles in CRC. CONCLUSIONS: Powered by deep learning and high throughput sequencing metagenomic data, DeepVirFinder significantly improved the accuracy of viral identification and will assist the study of viruses in the era of metagenomics.

11.
Genome Biol ; 20(1): 154, 2019 08 06.
Article in English | MEDLINE | ID: mdl-31387630

ABSTRACT

We develop a metagenomic data analysis pipeline, MicroPro, that takes into account all reads from known and unknown microbial organisms and associates viruses with complex diseases. We utilize MicroPro to analyze four metagenomic datasets relating to colorectal cancer, type 2 diabetes, and liver cirrhosis and show that including reads from unknown organisms significantly increases the prediction accuracy of the disease status for three of the four datasets. We identify new microbial organisms associated with these diseases and show viruses play important prediction roles in colorectal cancer and liver cirrhosis, but not in type 2 diabetes. MicroPro is freely available at https://github.com/zifanzhu/MicroPro .


Subjects
Disease , Metagenomics/methods , Microbiota/genetics , Software , Virus Physiological Phenomena , Colorectal Neoplasms/virology , Diabetes Mellitus, Type 2/virology , High-Throughput Nucleotide Sequencing , Humans , Liver Cirrhosis/virology
12.
PLoS Genet ; 14(2): e1007206, 2018 02.
Article in English | MEDLINE | ID: mdl-29474353

ABSTRACT

Hepatitis B virus (HBV) infection is a common problem in the world, especially in China. More than 60-80% of hepatocellular carcinoma (HCC) cases can be attributed to HBV infection in regions of high HBV prevalence. Although traditional Sanger sequencing has been extensively used to investigate HBV sequences, next-generation sequencing (NGS) is becoming more commonly used. Further, it is unknown whether word pattern frequencies of HBV reads obtained by NGS can be used to investigate HBV genotypes and predict HCC status. In this study, we used NGS to sequence the pre-S region of the HBV sequence of 94 HCC patients and 45 chronic HBV (CHB) infected individuals. Word pattern frequencies among the sequence data of all individuals were calculated and compared using the Manhattan distance. The individuals were grouped using principal coordinate analysis (PCoA) and hierarchical clustering. Word pattern frequencies were also used to build prediction models for HCC status using both K-nearest neighbors (KNN) and support vector machine (SVM). We showed the extremely high power of analyzing HBV sequences using word patterns. Our key findings include that the first principal coordinate of the PCoA analysis was highly associated with the fraction of genotype B (or C) sequences and the second principal coordinate was significantly associated with the probability of having HCC. Hierarchical clustering first groups the individuals according to their major genotypes followed by their HCC status. Using cross-validation, a high area under the receiver operating characteristic curve (AUC) of around 0.88 for KNN and 0.92 for SVM was obtained. In the independent data set of 46 HCC patients and 31 CHB individuals, a good AUC score of 0.77 was obtained using SVM. It was further shown that 3000 reads for each individual can yield stable prediction results for SVM. Thus, another key finding is that word patterns can be used to predict HCC status with high accuracy. Therefore, our study shows clearly that word pattern frequencies of HBV sequences contain much information about the composition of different HBV genotypes and the HCC status of an individual.
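The comparison of word pattern frequencies by Manhattan distance described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: words are taken to be overlapping k-mers over A, C, G and T, frequencies are pooled over all reads of an individual and normalized to sum to 1, and the word length `k=2`, the function names and the toy reads are hypothetical choices that do not come from the paper.

```python
from collections import Counter
from itertools import product


def word_frequencies(reads, k=2):
    """Relative frequency of every k-mer over A/C/G/T, pooled across reads.

    Returns a vector in fixed lexicographic k-mer order so that vectors from
    different individuals are directly comparable.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if set(word) <= set("ACGT"):  # skip words with ambiguous bases
                counts[word] += 1
    total = sum(counts.values())
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[w] / total if total else 0.0 for w in words]


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length frequency vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Toy reads for two hypothetical individuals.
sample_a = ["ACGTAC", "ACGTTT"]
sample_b = ["TTTTTT", "TTTTAC"]
freq_a = word_frequencies(sample_a)
freq_b = word_frequencies(sample_b)
distance = manhattan(freq_a, freq_b)
print(distance)
```

A matrix of such pairwise distances is the kind of input that principal coordinate analysis and hierarchical clustering operate on. Because both vectors sum to 1, the distance is bounded between 0 and 2.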


Subjects
Carcinoma, Hepatocellular/virology , Genetic Heterogeneity , Hepatitis B Surface Antigens/genetics , Hepatitis B virus/genetics , Hepatitis B, Chronic/virology , Liver Neoplasms/virology , Carcinoma, Hepatocellular/epidemiology , Carcinoma, Hepatocellular/genetics , DNA Fingerprinting , DNA, Viral/analysis , Gene Frequency , Genetic Association Studies/methods , Genotype , Hepatitis B virus/classification , Hepatitis B, Chronic/complications , Hepatitis B, Chronic/epidemiology , Hepatitis B, Chronic/genetics , High-Throughput Nucleotide Sequencing , Humans , Liver Neoplasms/epidemiology , Liver Neoplasms/genetics , Phylogeny , Protein Precursors/genetics
13.
BioData Min ; 10: 39, 2017.
Article in English | MEDLINE | ID: mdl-29270229

ABSTRACT

BACKGROUND: Feature selection and prediction are the most important tasks for big data mining. The common strategies for feature selection in big data mining are L1, SCAD and MC+. However, none of the existing algorithms optimizes L0, which penalizes the number of nonzero features directly. RESULTS: In this paper, we develop a novel sparse generalized linear model (GLM) with L0 approximation for feature selection and prediction with big omics data. The proposed approach approximates the L0 optimization directly. Even though the original L0 problem is non-convex, the problem is approximated by sequential convex optimizations with the proposed algorithm. The proposed method is easy to implement with only several lines of code. Novel adaptive ridge algorithms (L0ADRIDGE) for L0 penalized GLM with ultra high dimensional big data are developed. The proposed approach outperforms the other cutting edge regularization methods including SCAD and MC+ in simulations. When it is applied to integrated analysis of mRNA, microRNA, and methylation data from TCGA ovarian cancer, multilevel gene signatures associated with suboptimal debulking are identified simultaneously. The biological significance and potential clinical importance of those genes are further explored. CONCLUSIONS: The developed software L0ADRIDGE in MATLAB is available at https://github.com/liuzqx/L0adridge.
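The idea of approximating a non-convex L0 penalty by a sequence of convex ridge problems can be illustrated with a deliberately simplified sketch. It is not the L0ADRIDGE implementation. It assumes an orthonormal design, so that each coefficient can be updated separately from its least-squares estimate `z`, and it uses the common adaptive-ridge reweighting `w = 1 / (beta**2 + eps)`. The function name, `lam`, `eps` and the toy inputs are all hypothetical.

```python
def adaptive_ridge_orthonormal(z, lam=1.0, eps=1e-8, n_iter=50):
    """Iteratively reweighted ridge as a surrogate for an L0 penalty.

    Assumption: orthonormal design, so each step solves the convex problem
        minimize 0.5 * (z_j - b)**2 + 0.5 * lam * w_j * b**2
    in closed form as b = z_j / (1 + lam * w_j).
    """
    beta = list(z)  # start from the unpenalized estimates
    for _ in range(n_iter):
        # Weights grow without bound for coefficients near zero, which
        # pushes them to (numerically) zero on the next ridge step.
        weights = [1.0 / (b * b + eps) for b in beta]
        beta = [zj / (1.0 + lam * w) for zj, w in zip(z, weights)]
    return beta


# Toy least-squares estimates: one strong signal, one weak, one exactly zero.
z = [3.0, 0.5, 0.0]
beta = adaptive_ridge_orthonormal(z, lam=1.0)
print(beta)
```

In this toy setting the weak coefficient collapses toward zero while the strong one is retained with some shrinkage, which is the selection behaviour an L0-type penalty aims for. A general GLM would need a weighted ridge solve at each step rather than this coordinate-wise closed form.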

14.
J Gen Virol ; 98(11): 2748-2758, 2017 11.
Article in English | MEDLINE | ID: mdl-29022863

ABSTRACT

In order to investigate if deletion patterns of the preS region can predict liver disease advancement, the preS region of the hepatitis B virus (HBV) genome in 45 chronic hepatitis B (CHB) and 94 HBV-related hepatocellular carcinoma (HCC) patients was sequenced by next-generation sequencing (NGS) and the percentages of nucleotide deletion in the preS region were analysed. Hierarchical clustering and heatmaps based on deletion percentages of preS revealed different deletion patterns between CHB and HCC patients. Intergenotype comparison also indicated divergence in preS deletions between HBV genotype B and C. No significant difference was found in preS deletion patterns between sera and matched adjacent non-tumour tissues. Based on hierarchical clustering, HCC patients were classed into two groups with different preS deletion patterns and different clinical features. Finally, the support vector machine (SVM) model was trained on preS nucleotide deletion percentages and used to predict HCC versus CHB patients. The prediction performance was assessed with fivefold cross-validation and independent cohort validation. The median area under the curve (AUC) was 0.729 after repeating SVM 500 times with fivefold cross-validations. After parameter optimization, the SVM model was used to predict an independent cohort with 51 CHB patients and 72 HCC patients and the AUC was 0.727. In conclusion, the use of the NGS method revealed a prominent divergence in preS deletion patterns between disease groups and virus genotypes, but not between different tissue types. Quantitative NGS data combined with a machine learning method could be a powerful approach for prediction of the status of different diseases.


Subjects
Carcinoma, Hepatocellular/virology , Hepatitis B Surface Antigens/genetics , Hepatitis B virus/genetics , Hepatitis B, Chronic/virology , Polymorphism, Genetic , Sequence Deletion , Adult , Computational Biology , Female , Genome, Viral , Genotype , Hepatitis B virus/classification , Hepatitis B virus/isolation & purification , Hepatitis B, Chronic/complications , High-Throughput Nucleotide Sequencing , Humans , Machine Learning , Male , Middle Aged , Molecular Diagnostic Techniques
15.
Genome Biol ; 18(1): 53, 2017 03 24.
Article in English | MEDLINE | ID: mdl-28335812

ABSTRACT

We propose a probabilistic method, CancerLocator, which exploits the diagnostic potential of cell-free DNA by determining not only the presence but also the location of tumors. CancerLocator simultaneously infers the proportions and the tissue-of-origin of tumor-derived cell-free DNA in a blood sample using genome-wide DNA methylation data. CancerLocator outperforms two established multi-class classification methods on simulations and real data, even with the low proportion of tumor-derived DNA in the cell-free DNA scenarios. CancerLocator also achieves promising results on patient plasma samples with low DNA methylation sequencing coverage.


Subjects
DNA Methylation , Epigenesis, Genetic , Epigenomics/methods , Neoplasms/diagnosis , Neoplasms/genetics , Algorithms , Computer Simulation , CpG Islands , DNA, Neoplasm/blood , DNA, Neoplasm/genetics , Humans , Models, Statistical , Reproducibility of Results , Workflow
16.
BMC Syst Biol ; 10 Suppl 1: 4, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26818594

ABSTRACT

BACKGROUND: Protein domains can be viewed as portable units of biological function that define the functional properties of proteins. Therefore, if a protein is associated with a disease, protein domains might also be associated and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Thus, identification of domains associated with human diseases would greatly improve our understanding of the mechanism of human complex diseases and further improve the prevention, diagnosis and treatment of these diseases. METHODS: Based on phenotypic similarities among diseases, we first group diseases into overlapping modules. We then develop a framework to infer associations between domains and diseases through known relationships between diseases and modules, domains and proteins, as well as proteins and disease modules. Different methods including Association, Maximum likelihood estimation (MLE), Domain-disease pair exclusion analysis (DPEA), Bayesian, and Parsimonious explanation (PE) approaches are developed to predict domain-disease associations. RESULTS: We demonstrate the effectiveness of all five approaches via a series of validation experiments, and show the robustness of the MLE, Bayesian and PE approaches to the involved parameters. We also study the effects of disease modularization in inferring novel domain-disease associations. Through validation, the AUC (Area Under the receiver operating characteristic Curve) scores for Bayesian, MLE, DPEA, PE, and Association approaches are 0.86, 0.84, 0.83, 0.83 and 0.79, respectively, indicating the usefulness of these approaches for predicting domain-disease relationships. Finally, we choose the Bayesian approach to infer domains associated with two common diseases, Crohn's disease and type 2 diabetes. CONCLUSIONS: The Bayesian approach has the best performance for the inference of domain-disease relationships. The predicted landscape between domains and diseases provides a more detailed view about the disease mechanisms.
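Several abstracts in this listing report AUC scores. As a reminder of what that number measures, here is a minimal sketch of the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the toy scores are hypothetical, and this is not the evaluation code of any of the listed studies.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels must contain 1 for positive and 0 for negative examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Each positive-negative pair contributes 1 for a win, 0.5 for a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy predictions: the second positive is outranked by one negative.
example = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
print(example)
```

This pairwise form is quadratic in the number of examples, which is fine for illustration. Larger evaluations typically sort the scores once and work from ranks.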


Subjects
Crohn Disease/genetics , Diabetes Mellitus, Type 2/genetics , Protein Domains , Area Under Curve , Bayes Theorem , Disease/genetics , Genetic Association Studies , Likelihood Functions , Phenotype
17.
BMC Syst Biol ; 5: 55, 2011 Apr 19.
Article in English | MEDLINE | ID: mdl-21504591

ABSTRACT

BACKGROUND: Domains are basic units of proteins, and thus exploring associations between protein domains and human inherited diseases will greatly improve our understanding of the pathogenesis of human complex diseases and further benefit the medical prevention, diagnosis and treatment of these diseases. Within a given domain-domain interaction network, we make the assumption that similarities of disease phenotypes can be explained using proximities of domains associated with such diseases. Based on this assumption, we propose a Bayesian regression approach named "domainRBF" (domain Rank with Bayes Factor) to prioritize candidate domains for human complex diseases. RESULTS: Using a compiled dataset containing 1,614 associations between 671 domains and 1,145 disease phenotypes, we demonstrate the effectiveness of the proposed approach through three large-scale leave-one-out cross-validation experiments (random control, simulated linkage interval, and genome-wide scan), and we do so in terms of three criteria (precision, mean rank ratio, and AUC score). We further show that the proposed approach is robust to the parameters involved and the underlying domain-domain interaction network through a series of permutation tests. Once having assessed the validity of this approach, we show the possibility of ab initio inference of domain-disease associations and gene-disease associations, and we illustrate the strong agreement between our inferences and the evidences from genome-wide association studies for four common diseases (type 1 diabetes, type 2 diabetes, Crohn's disease, and breast cancer). Finally, we provide a pre-calculated genome-wide landscape of associations between 5,490 protein domains and 5,080 human diseases and offer free access to this resource. CONCLUSIONS: The proposed approach effectively ranks susceptible domains among the top of the candidates, and it is robust to the parameters involved. 
The ab initio inference of domain-disease associations shows strong agreement with the evidence provided by genome-wide association studies. The predicted landscape provides a comprehensive understanding of associations between domains and human diseases.


Subjects
Bayes Theorem , Systems Biology/methods , Area Under Curve , Breast Neoplasms/metabolism , Crohn Disease/metabolism , Diabetes Mellitus, Type 1/metabolism , Diabetes Mellitus, Type 2/metabolism , Genome , Genome-Wide Association Study , Humans , Models, Statistical , Phenotype , Protein Interaction Mapping , Protein Structure, Tertiary , Regression Analysis
18.
Genome Biol ; 9(12): R174, 2008.
Article in English | MEDLINE | ID: mdl-19087245

ABSTRACT

We have developed a global strategy based on the Bayesian network framework to prioritize the functional modules mediating genetic perturbations and their phenotypic effects among a set of overlapping candidate modules. We take lethality in Saccharomyces cerevisiae and human cancer as two examples to show the effectiveness of this approach. We discovered that lethality is more conserved at the module level than at the gene level and we identified several potentially 'new' cancer-related biological processes.


Subjects
Computational Biology/methods , Neoplasms/genetics , Saccharomyces cerevisiae/genetics , Bayes Theorem , DNA Repair , Genes, Lethal , Humans , Models, Biological
19.
BMC Genomics ; 9: 623, 2008 Dec 20.
Article in English | MEDLINE | ID: mdl-19099599

ABSTRACT

BACKGROUND: Increasing evidence shows that whole genomes of eukaryotes are almost entirely transcribed into both protein coding genes and an enormous number of non-protein-coding RNAs (ncRNAs). Therefore, revealing the underlying regulatory mechanisms of transcripts becomes imperative. However, for a complete understanding of transcriptional regulatory mechanisms, we need to identify the regions in which they are found. We will call these transcriptional regulation regions, or TRRs, which can be considered functional regions containing a cluster of regulatory elements that cooperatively recruit transcriptional factors for binding and then regulating the expression of transcripts. RESULTS: We constructed a hierarchical stochastic language (HSL) model for the identification of core TRRs in yeast based on regulatory cooperation among TRR elements. The HSL model trained based on yeast achieved comparable accuracy in predicting TRRs in other species, e.g., fruit fly, human, and rice, thus demonstrating the conservation of TRRs across species. The HSL model was also used to identify the TRRs of genes, such as p53 or OsALYL1, as well as microRNAs. In addition, the ENCODE regions were examined by HSL, and TRRs were found to pervasively locate in the genomes. CONCLUSION: Our findings indicate that 1) the HSL model can be used to accurately predict core TRRs of transcripts across species and 2) identified core TRRs by HSL are proper candidates for the further scrutiny of specific regulatory elements and mechanisms. Meanwhile, the regulatory activity taking place in the abundant numbers of ncRNAs might account for the ubiquitous presence of TRRs across the genome. In addition, we also found that the TRRs of protein coding genes and ncRNAs are similar in structure, with the latter being more conserved than the former.


Subjects
Regulatory Elements, Transcriptional , Saccharomyces cerevisiae/genetics , Transcription, Genetic , Animals , Conserved Sequence/genetics , Eukaryotic Cells/metabolism , Gene Expression Regulation , Genome, Fungal , Humans , RNA, Untranslated/genetics , Species Specificity , Tumor Suppressor Protein p53/genetics