Results 1 - 19 of 19
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38340090

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remain challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights on disease genes. RESULTS: We present an overview of the design choices and pitfalls that prove crucial in the application of network propagation methods to GWAS summary statistics. We highlight general trends from the literature, and present benchmark experiments to expand on these insights, using three diseases and five molecular networks as a case study. We verify that the use of gene-level scores based on GWAS P-values offers advantages over the selection of a set of 'seed' disease genes not weighted by the associated P-values if the GWAS summary statistics are of sufficient quality. Beyond that, the size and the density of the networks prove to be important factors for consideration. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.
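
The abstract above contrasts unweighted 'seed' gene sets with gene-level scores derived from GWAS P-values as inputs to network propagation. The sketch below illustrates that contrast with a random walk with restart, which is one common information diffusion scheme; the abstract does not say which algorithm the authors benchmarked, so the choice of algorithm, the function name `propagate`, the four-gene toy network, and the P-values are all assumptions made for illustration.

```python
import math


def propagate(adjacency, seed_scores, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected network (hypothetical helper).

    adjacency   -- square list-of-lists with 0/1 entries
    seed_scores -- non-negative starting weight per gene
    """
    n = len(adjacency)
    total = sum(seed_scores)
    if total <= 0:
        raise ValueError("at least one seed score must be positive")
    # Normalise the starting weights so they form a probability vector.
    start = [s / total for s in seed_scores]
    # Node degrees, used to split each node's score evenly among neighbours.
    degree = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    scores = start[:]
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            spread = sum(
                adjacency[i][j] * scores[j] / degree[j]
                for j in range(n)
                if degree[j] > 0
            )
            updated.append((1 - restart) * spread + restart * start[i])
        change = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


# Toy network (assumed): a chain of four genes, 0 - 1 - 2 - 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Unweighted seed set: only gene 0 is treated as a disease gene.
binary_scores = propagate(adjacency, [1.0, 0.0, 0.0, 0.0])

# Gene-level scores: invented P-values turned into weights via -log10(P).
p_values = [1e-8, 1e-3, 0.5, 0.9]
weighted_scores = propagate(adjacency, [-math.log10(p) for p in p_values])
```

With the unweighted seed, the propagated score decays with distance from gene 0; with P-value weights, every gene contributes to the starting vector in proportion to its evidence.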


Subjects
Genome-Wide Association Study , Single Nucleotide Polymorphism , Humans , Genome-Wide Association Study/methods , Algorithms , Gene Regulatory Networks , Genetic Predisposition to Disease
2.
Clin Lung Cancer ; 24(8): e311-e322, 2023 12.
Article in English | MEDLINE | ID: mdl-37689579

ABSTRACT

PURPOSE: Non-small-cell lung cancer (NSCLC) shows a high incidence of brain metastases (BM). Early detection is crucial to improve clinical prospects. We trained and validated classifier models to identify patients with a high risk of developing BM, as they could potentially benefit from surveillance brain MRI. METHODS: Consecutive patients with an initial diagnosis of NSCLC from January 2011 to April 2019 and an in-house chest-CT scan (staging) were retrospectively recruited at a German lung cancer center. Brain imaging was performed at initial diagnosis and in case of neurological symptoms (follow-up). Subjects lost to follow-up or still alive without BM at the data cut-off point (12/2020) were excluded. Covariates included clinical and/or 3D-radiomics-features of the primary tumor from staging chest-CT. Four machine learning models for prediction (80/20 training) were compared. Gini Importance and SHAP were used as measures of importance; sensitivity, specificity, area under the precision-recall curve, and Matthews correlation coefficient as evaluation metrics. RESULTS: Three hundred and ninety-five patients comprised the clinical cohort. Predictive models based on clinical features offered the best performance (tuned to maximize recall: sensitivity∼70%, specificity∼60%). Radiomics features failed to provide sufficient information, likely due to the heterogeneity of imaging data. Adenocarcinoma histology, lymph node invasion, and histological tumor grade were positively correlated with the prediction of BM; age and squamous cell carcinoma histology were negatively correlated. A subgroup discovery analysis identified 2 candidate patient subpopulations appearing to present a higher risk of BM (female patients + adenocarcinoma histology, adenocarcinoma patients + no other distant metastases). CONCLUSION: Analysis of the importance of input features suggests that the models are learning the relevant relationships between clinical features and the development of BM. A higher number of samples is to be prioritized to improve performance. Employed prospectively at initial diagnosis, such models can help select high-risk subgroups for surveillance brain MRI.
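
The evaluation metrics named in the abstract above (sensitivity, specificity, Matthews correlation coefficient) all follow from the four cells of a confusion matrix. The sketch below shows the arithmetic; the function names are illustrative, and the counts are invented solely to give round values. They are not taken from the study.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true positives among all actual positives (recall)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp)


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Invented counts for illustration only (not data from the study).
tp, fn, tn, fp = 14, 6, 36, 24

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
mcc = matthews_corrcoef(tp, fp, tn, fn)
```

Unlike sensitivity or specificity alone, the coefficient uses all four cells, which is why it is often reported alongside them for imbalanced cohorts.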


Subjects
Adenocarcinoma , Brain Neoplasms , Non-Small-Cell Lung Carcinoma , Lung Neoplasms , Humans , Female , Non-Small-Cell Lung Carcinoma/pathology , Lung Neoplasms/diagnosis , Lung Neoplasms/pathology , Retrospective Studies , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/secondary , Machine Learning
3.
Patterns (N Y) ; 4(9): 100830, 2023 Sep 08.
Article in English | MEDLINE | ID: mdl-37720333

ABSTRACT

The black-box nature of most artificial intelligence (AI) models encourages the development of explainability methods to engender trust in the AI decision-making process. Such methods can be broadly categorized into two main types: post hoc explanations and inherently interpretable algorithms. We aimed at analyzing the possible associations between COVID-19 and the push of explainable AI (XAI) to the forefront of biomedical research. We automatically extracted from the PubMed database biomedical XAI studies related to concepts of causality or explainability and manually labeled 1,603 papers with respect to XAI categories. To compare the trends pre- and post-COVID-19, we fit a change point detection model and evaluated significant changes in publication rates. We show that the advent of COVID-19 at the beginning of 2020 could be the driving factor behind an increased focus on XAI, playing a crucial role in accelerating an already evolving trend. Finally, we discuss the future societal use and impact of XAI technologies and potential future directions for those who seek to foster clinical trust with interpretable machine learning models.

4.
Nat Commun ; 14(1): 4750, 2023 08 07.
Article in English | MEDLINE | ID: mdl-37550323

ABSTRACT

Epigenetic modifications are dynamic mechanisms involved in the regulation of gene expression. Unlike the DNA sequence, epigenetic patterns vary not only between individuals, but also between different cell types within an individual. Environmental factors, somatic mutations and ageing contribute to epigenetic changes that may constitute early hallmarks or causal factors of disease. Epigenetic modifications are reversible and thus promising therapeutic targets for precision medicine. However, mapping efforts to determine an individual's cell-type-specific epigenome are constrained by experimental costs and tissue accessibility. To address these challenges, we developed eDICE, an attention-based deep learning model that is trained to impute missing epigenomic tracks by conditioning on observed tracks. Using a recently published set of epigenomes from four individual donors, we show that transfer learning across individuals allows eDICE to successfully predict individual-specific epigenetic variation even in tissues that are unmapped in a given donor. These results highlight the potential of machine learning-based imputation methods to advance personalized epigenomics.


Subjects
Genetic Epigenesis , Epigenomics , Humans , Epigenomics/methods , Machine Learning , Epigenome , Precision Medicine/methods , DNA Methylation/genetics
5.
Genome Biol ; 24(1): 79, 2023 04 18.
Article in English | MEDLINE | ID: mdl-37072822

ABSTRACT

A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.


Subjects
Algorithms , Epigenomics , Genomics/methods
6.
Cell Mol Gastroenterol Hepatol ; 15(6): 1391-1419, 2023.
Article in English | MEDLINE | ID: mdl-36868311

ABSTRACT

BACKGROUND & AIMS: Patient-derived organoid cancer models are generated from epithelial tumor cells and reflect tumor characteristics. However, they lack the complexity of the tumor microenvironment, which is a key driver of tumorigenesis and therapy response. Here, we developed a colorectal cancer organoid model that incorporates matched epithelial cells and stromal fibroblasts. METHODS: Primary fibroblasts and tumor cells were isolated from colorectal cancer specimens. Fibroblasts were characterized for their proteome, secretome, and gene expression signatures. Fibroblast/organoid co-cultures were analyzed by immunohistochemistry and compared with their tissue of origin, and their gene expression levels were compared with those of standard organoid models. Bioinformatics deconvolution was used to calculate cellular proportions of cell subsets in organoids based on single-cell RNA sequencing data. RESULTS: Normal primary fibroblasts, isolated from tumor-adjacent tissue, and cancer-associated fibroblasts retained their molecular characteristics in vitro, including higher motility of cancer-associated compared with normal fibroblasts. Importantly, both cancer-associated fibroblasts and normal fibroblasts supported cancer cell proliferation in 3D co-cultures, without the addition of classical niche factors. Organoids grown together with fibroblasts displayed a larger cellular heterogeneity of tumor cells compared with mono-cultures and closely resembled the in vivo tumor morphology. Additionally, we observed a mutual crosstalk between tumor cells and fibroblasts in the co-cultures. This was manifested by considerably deregulated pathways such as cell-cell communication and extracellular matrix remodeling in the organoids. Thrombospondin-1 was identified as a critical factor for fibroblast invasiveness. CONCLUSION: We developed a physiological tumor/stroma model, which will be vital as a personalized tumor model to study disease mechanisms and therapy response in colorectal cancer.


Subjects
Cancer-Associated Fibroblasts , Colorectal Neoplasms , Humans , Fibroblasts/metabolism , Coculture Techniques , Organoids/metabolism , Cancer-Associated Fibroblasts/metabolism , Colorectal Neoplasms/pathology , Tumor Microenvironment
7.
Cell Rep ; 37(5): 109943, 2021 11 02.
Article in English | MEDLINE | ID: mdl-34731603

ABSTRACT

The ARID1A subunit of SWI/SNF chromatin remodeling complexes is a potent tumor suppressor. Here, a degron is applied to detect rapid loss of chromatin accessibility at thousands of loci where ARID1A acts to generate accessible minidomains of nucleosomes. Loss of ARID1A also results in the redistribution of the coactivator EP300. Co-incident EP300 dissociation and lost chromatin accessibility at enhancer elements are highly enriched adjacent to rapidly downregulated genes. In contrast, sites of gained EP300 occupancy are linked to genes that are transcriptionally upregulated. These chromatin changes are associated with a small number of genes that are differentially expressed in the first hours following loss of ARID1A. Indirect or adaptive changes dominate the transcriptome following growth for days after loss of ARID1A and result in strong engagement with cancer pathways. The identification of this hierarchy suggests sites for intervention in ARID1A-driven diseases.


Subjects
DNA-Binding Proteins/deficiency , Mouse Embryonic Stem Cells/metabolism , Nucleosomes/metabolism , Precancerous Conditions/metabolism , Transcription Factors/deficiency , Genetic Transcription , Transcriptional Activation , Animals , Binding Sites , Cell Line , Chromatin Assembly and Disassembly , DNA-Binding Proteins/genetics , E1A-Associated p300 Protein/genetics , E1A-Associated p300 Protein/metabolism , Male , Mice , 129 Strain Mice , Nucleosomes/genetics , Precancerous Conditions/genetics , Proteolysis , Time Factors , Transcription Factors/genetics
8.
Life Sci Alliance ; 4(2)2021 02.
Article in English | MEDLINE | ID: mdl-33310759

ABSTRACT

Malignant transformation depends on genetic and epigenetic events that result in a burst of deregulated gene expression and chromatin changes. To dissect the sequence of events in this process, we used a T-cell-specific lymphoma model based on the human oncogenic nucleophosmin-anaplastic lymphoma kinase (NPM-ALK) translocation. We find that transformation of T cells shifts thymic cell populations to an undifferentiated immunophenotype, which occurs only after a period of latency, accompanied by induction of the MYC-NOTCH1 axis and deregulation of key epigenetic enzymes. We discover aberrant DNA methylation patterns, overlapping with regulatory regions, plus a high degree of epigenetic heterogeneity between individual tumors. In addition, ALK-positive tumors show a loss of associated methylation patterns of neighboring CpG sites. Notably, deletion of the maintenance DNA methyltransferase DNMT1 completely abrogates lymphomagenesis in this model, despite oncogenic signaling through NPM-ALK, suggesting that faithful maintenance of tumor-specific methylation through DNMT1 is essential for sustained proliferation and tumorigenesis.


Subjects
Neoplastic Cell Transformation/genetics , Neoplastic Cell Transformation/metabolism , DNA (Cytosine-5-)-Methyltransferase 1/metabolism , Genetic Epigenesis , Lymphoma/etiology , Lymphoma/metabolism , Protein-Tyrosine Kinases/genetics , Animals , Tumor Biomarkers , Computational Biology/methods , DNA (Cytosine-5-)-Methyltransferase 1/genetics , DNA Methylation , Animal Disease Models , Disease Susceptibility , Epigenomics , Gene Deletion , Neoplastic Gene Expression Regulation , Gene Regulatory Networks , Humans , Immunohistochemistry , Immunophenotyping , Lymphoma/drug therapy , Lymphoma/pathology , Mice , Knockout Mice , Transgenic Mice , Protein-Tyrosine Kinases/metabolism , STAT3 Transcription Factor/metabolism , Signal Transduction , Xenograft Model Antitumor Assays
9.
Cell Rep ; 23(5): 1530-1542, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29719263

ABSTRACT

mRNA cap addition occurs early during RNA Pol II-dependent transcription, facilitating pre-mRNA processing and translation. We report that the mammalian mRNA cap methyltransferase, RNMT-RAM, promotes RNA Pol II transcription independent of mRNA capping and translation. In cells, sublethal suppression of RNMT-RAM reduces RNA Pol II occupancy, net mRNA synthesis, and pre-mRNA levels. Conversely, expression of RNMT-RAM increases transcription independent of cap methyltransferase activity. In isolated nuclei, recombinant RNMT-RAM stimulates transcriptional output; this requires the RAM RNA binding domain. RNMT-RAM interacts with nascent transcripts along their entire length and with transcription-associated factors including the RNA Pol II subunits SPT4, SPT6, and PAFc. Suppression of RNMT-RAM inhibits transcriptional markers including histone H2BK120 ubiquitination, H3K4 and H3K36 methylation, RNA Pol II CTD S5 and S2 phosphorylation, and PAFc recruitment. These findings suggest that multiple interactions among RNMT-RAM, RNA Pol II factors, and RNA along the transcription unit stimulate transcription.


Subjects
Methyltransferases/metabolism , RNA Polymerase II/metabolism , RNA-Binding Proteins/metabolism , Genetic Transcription/physiology , HEK293 Cells , HeLa Cells , Histones/genetics , Histones/metabolism , Humans , Methyltransferases/genetics , RNA Polymerase II/genetics , RNA-Binding Proteins/genetics , Ubiquitination/physiology
10.
PLoS Genet ; 13(5): e1006793, 2017 May.
Article in English | MEDLINE | ID: mdl-28498846

ABSTRACT

Mutations in the gene encoding the methyl-CG binding protein MeCP2 cause several neurological disorders including Rett syndrome. The di-nucleotide methyl-CG (mCG) is the classical MeCP2 DNA recognition sequence, but additional methylated sequence targets have been reported. Here we show by in vitro and in vivo analyses that MeCP2 binding to non-CG methylated sites in brain is largely confined to the tri-nucleotide sequence mCAC. MeCP2 binding to chromosomal DNA in mouse brain is proportional to mCAC + mCG density and unexpectedly defines large genomic domains within which transcription is sensitive to MeCP2 occupancy. Our results suggest that MeCP2 integrates patterns of mCAC and mCG in the brain to restrain transcription of genes critical for neuronal function.


Subjects
Brain/metabolism , DNA Methylation , Dinucleotide Repeats , Methyl-CpG-Binding Protein 2/metabolism , Trinucleotide Repeats , Animals , CpG Islands , Cytosine/metabolism , Genetic Epigenesis , Male , Methyl-CpG-Binding Protein 2/genetics , Mice , Inbred C57BL Mice , Protein Binding , Rett Syndrome/genetics
11.
Nat Commun ; 8(1): 12, 2017 04 11.
Article in English | MEDLINE | ID: mdl-28400552

ABSTRACT

RNA-binding proteins play a key role in shaping gene expression profiles during stress; however, little is known about the dynamic nature of these interactions and how this influences the kinetics of gene expression. To address this, we developed kinetic cross-linking and analysis of cDNAs (χCRAC), an ultraviolet cross-linking method that enabled us to quantitatively measure the dynamics of protein-RNA interactions in vivo on a minute time-scale. Here, using χCRAC we measure the global RNA-binding dynamics of the yeast transcription termination factor Nab3 in response to glucose starvation. These measurements reveal rapid changes in protein-RNA interactions within 1 min following stress imposition. Changes in Nab3 binding are largely independent of alterations in transcription rate during the early stages of stress response, indicating orthogonal transcriptional control mechanisms. We also uncover a function for Nab3 in dampening expression of stress-responsive genes. χCRAC has the potential to greatly enhance our understanding of in vivo dynamics of protein-RNA interactions. Protein-RNA interactions are dynamic and regulated in response to environmental changes. Here the authors describe 'kinetic CRAC', an approach that allows time-resolved analyses of protein-RNA interactions with minute time point resolution, and apply it to gain insight into the function of the RNA-binding protein Nab3.


Subjects
Fungal Gene Expression Regulation , Nuclear Proteins/genetics , Fungal RNA/genetics , RNA-Binding Proteins/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Transcriptome , Culture Media/pharmacology , Complementary DNA/genetics , Complementary DNA/metabolism , Gene Expression Profiling , Glucose/deficiency , Kinetics , Nuclear Proteins/metabolism , Protein Binding , Fungal RNA/metabolism , RNA-Binding Proteins/metabolism , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/radiation effects , Saccharomyces cerevisiae Proteins/metabolism , Physiological Stress , Time Factors , Ultraviolet Rays
12.
BMC Bioinformatics ; 17(Suppl 16): 447, 2016 Dec 13.
Article in English | MEDLINE | ID: mdl-28105912

ABSTRACT

BACKGROUND: Functional genomic and epigenomic research relies fundamentally on sequencing-based methods like ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high-dimensional data sets with visually complex structures, such as multi-modal peaks extended over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent. RESULTS: We present DGW, an open source software package for simultaneous alignment and clustering of multiple epigenomic marks. DGW uses Dynamic Time Warping to adaptively rescale and align genomic distances, which allows regions of interest with similar shapes to be grouped, thereby capturing the structure of epigenomic marks. We demonstrate the effectiveness of the approach in a simulation study and on a real epigenomic data set from the ENCODE project. CONCLUSIONS: Our results show that DGW automatically recognises and aligns important genomic features such as transcription start sites and splicing sites from histone marks. DGW is available as an open source Python package.
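
Dynamic Time Warping, named in the abstract above, compares two profiles while allowing one to be locally stretched or compressed relative to the other. The sketch below is the textbook dynamic-programming recurrence with an absolute-difference cost. It is not the DGW package or its API; the function name `dtw_distance` and the toy signal profiles are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy profiles (assumed): the same peak shape over regions of different width.
narrow_peak = [0, 1, 2, 1, 0]
wide_peak = [0, 0, 1, 2, 2, 1, 0]
flat_region = [0, 0, 0, 0, 0]

same_shape = dtw_distance(narrow_peak, wide_peak)
different_shape = dtw_distance(narrow_peak, flat_region)
```

Because the wide peak is only a stretched copy of the narrow one, warping aligns them at zero cost, whereas the flat region stays distant. This is the property that lets regions of unequal length be grouped by shape.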


Subjects
Computer Simulation , Epigenomics/methods , Human Genome , Histone Code , Software , Chromatin Immunoprecipitation , Cluster Analysis , DNA/metabolism , DNA-Binding Proteins/metabolism , Genetic Epigenesis , Humans , Leukemia/genetics
13.
Bioinformatics ; 31(6): 809-16, 2015 Mar 15.
Article in English | MEDLINE | ID: mdl-25398611

ABSTRACT

MOTIVATION: DNA methylation is an intensely studied epigenetic mark implicated in many biological processes of direct clinical relevance. Although sequencing-based technologies are increasingly allowing high-resolution measurements of DNA methylation, statistical modelling of such data is still challenging. In particular, statistical identification of differentially methylated regions across different conditions poses unresolved challenges in accounting for spatial correlations within the statistical testing procedure. RESULTS: We propose a non-parametric, kernel-based method, M³D, to detect higher order changes in methylation profiles, such as shape, across pre-defined regions. The test statistic explicitly accounts for differences in coverage levels between samples, thus handling in a principled way a major confounder in the analysis of methylation data. Empirical tests on real and simulated datasets show an increased power compared to established methods, as well as considerable robustness with respect to coverage and replication levels.


Subjects
DNA Methylation , Embryonic Stem Cells/metabolism , Statistical Models , Animals , Computer Simulation , Epigenomics , Humans , Mice , Software
14.
BMC Genomics ; 14: 826, 2013 Nov 24.
Article in English | MEDLINE | ID: mdl-24267901

ABSTRACT

BACKGROUND: Cell-specific gene expression is controlled by epigenetic modifications and transcription factor binding. While genome-wide maps for these protein-DNA interactions have become widely available, quantitative comparison of the resulting ChIP-Seq data sets remains challenging. Current approaches to detect differentially bound or modified regions are mainly borrowed from RNA-Seq data analysis, thus focusing on total counts of fragments mapped to a region, ignoring any information encoded in the shape of the peaks. RESULTS: Here, we present MMDiff, a robust, broadly applicable method for detecting differences between sequence count data sets. Based on quantifying shape changes in signal profiles, it overcomes challenges imposed by the highly structured nature of the data and the paucity of replicates. We first use a simulated data set to compare the performance of MMDiff with results obtained by four alternative methods. We demonstrate that MMDiff excels when peak profiles change between samples. We next use MMDiff to re-analyse a recent data set of the histone modification H3K4me3 elucidating the establishment of this prominent epigenomic marker. Our empirical analysis shows that the method yields reproducible results across experiments, and is able to detect functionally important changes in histone modifications. To further explore the broader applicability of MMDiff, we apply it to two ENCODE data sets: one investigating the histone modification H3K27ac and one measuring the genome-wide binding of the transcription factor CTCF. In both cases, MMDiff proves to be complementary to count-based methods. In addition, we can show that MMDiff is capable of directly detecting changes of homotypic binding events at neighbouring binding sites. MMDiff is readily available as a Bioconductor package. CONCLUSIONS: Our results demonstrate that higher order features of ChIP-Seq peaks carry relevant and often complementary information to total counts, and hence are important in assessing differential histone modifications and transcription factor binding. We have developed a new computational method, MMDiff, that is capable of exploring these features and therefore closes an existing gap in the analysis of ChIP-Seq data sets.


Subjects
Chromatin Immunoprecipitation/methods , Computational Biology/methods , DNA Sequence Analysis/methods , Animals , Cell Line , Computer Simulation , Epigenomics , Histones/metabolism , Humans , Mice , Nonparametric Statistics
15.
Genome Res ; 19(11): 2133-43, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19564452

ABSTRACT

We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was proved in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibited the best prediction performance on nucleotide, exon, and transcript level for ab initio and multiple-genome gene-prediction tasks. The fully developed version shows superior performance in 10 out of 12 evaluation criteria compared with the other participating gene finders, including Fgenesh++ and Augustus. An in-depth analysis of mGene's genome-wide predictions revealed that approximately 2200 predicted genes were not contained in the current genome annotation. Testing a subset of 57 of these genes by RT-PCR and sequencing, we confirmed expression for 24 (42%) of them. mGene missed 300 annotated genes, out of which 205 were unconfirmed. RT-PCR testing of 24 of these genes resulted in a success rate of merely 8%. These findings suggest that even the gene catalog of a well-studied organism such as C. elegans can be substantially improved by mGene's predictions. We also provide gene predictions for the four nematodes C. briggsae, C. brenneri, C. japonica, and C. remanei. Comparing the resulting proteomes among these organisms and to the known protein universe, we identified many species-specific gene inventions. In a quality assessment of several available annotations for these genomes, we find that mGene's predictions are most accurate.


Subjects
Algorithms , Caenorhabditis elegans/genetics , Computational Biology/methods , Helminth Genome/genetics , Animals , Artificial Intelligence , Caenorhabditis/classification , Caenorhabditis/genetics , Helminth Genes/genetics , Genomics/methods , RNA Splice Sites , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction , DNA Sequence Analysis , Transcription Initiation Site
16.
Nucleic Acids Res ; 37(Web Server issue): W312-6, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19494180

ABSTRACT

We describe mGene.web, a web service for the genome-wide prediction of protein coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures including untranslated regions in an increasing number of organisms. With mGene.web, users have the additional possibility to train the system with their own data for other organisms at the push of a button, a functionality that will greatly accelerate the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, like the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene finding system mGene is based on discriminative machine learning techniques and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is available at http://www.mgene.org/web; it is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).


Subjects
Genes , Genomics , Proteins/genetics , Software , Internet , RNA Splice Sites , DNA Sequence Analysis , Transcription Initiation Site
17.
Science ; 317(5836): 338-42, 2007 Jul 20.
Article in English | MEDLINE | ID: mdl-17641193

ABSTRACT

The genomes of individuals from the same species vary in sequence as a result of different evolutionary processes. To examine the patterns of, and the forces shaping, sequence variation in Arabidopsis thaliana, we performed high-density array resequencing of 20 diverse strains (accessions). More than 1 million nonredundant single-nucleotide polymorphisms (SNPs) were identified at moderate false discovery rates (FDRs), and approximately 4% of the genome was identified as being highly dissimilar or deleted relative to the reference genome sequence. Patterns of polymorphism are highly nonrandom among gene families, with genes mediating interaction with the biotic environment having exceptional polymorphism levels. At the chromosomal scale, regional variation in polymorphism was readily apparent. A scan for recent selective sweeps revealed several candidate regions, including a notable example in which almost all variation was removed in a 500-kilobase window. Analyzing the polymorphisms we describe in larger sets of accessions will enable a detailed understanding of forces shaping population-wide sequence variation in A. thaliana.


Subjects
Arabidopsis/genetics , Genetic Variation , Plant Genome , Genetic Polymorphism , Single Nucleotide Polymorphism , Algorithms , Base Sequence , Plant Chromosomes/genetics , Computational Biology , Gene Frequency , Plant Genes , Molecular Sequence Data , Genetic Selection , DNA Sequence Analysis
18.
BMC Bioinformatics ; 8 Suppl 10: S7, 2007.
Article in English | MEDLINE | ID: mdl-18269701

ABSTRACT

BACKGROUND: For splice site recognition, one has to solve two classification problems: discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems typically rely on Markov Chains to solve these tasks. RESULTS: In this work we consider Support Vector Machines for splice site recognition. We employ the so-called weighted degree kernel, which turns out to be well suited for this task, as we illustrate in several experiments where we compare its prediction accuracy with that of recently proposed systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our performance estimates indicate that splice sites can be recognized very accurately in these genomes and that our method outperforms many other methods including Markov Chains, GeneSplicer and SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction tool ready to be used for incorporation in a gene finder. AVAILABILITY: Data, splits, additional information on the model selection, the whole genome predictions, as well as the stand-alone prediction tool are available for download at http://www.fml.mpg.de/raetsch/projects/splice.
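
The weighted degree kernel mentioned in the abstract above scores two equal-length sequences by counting substrings of length 1 to D that match at the same position, with longer matches weighted by a degree-dependent coefficient. The sketch below uses the weighting beta_d = 2(D - d + 1) / (D(D + 1)), which is a commonly cited choice; that the paper uses exactly these weights is an assumption, and the function name and the two toy sequence windows are invented for illustration.

```python
def weighted_degree_kernel(x, y, max_degree=3):
    """Weighted degree kernel between two equal-length sequences.

    Counts position-wise matching substrings of length 1..max_degree,
    weighting degree d by beta_d = 2 * (D - d + 1) / (D * (D + 1)).
    The weighting scheme is an assumed, commonly cited choice.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    length = len(x)
    total = 0.0
    for d in range(1, max_degree + 1):
        beta = 2.0 * (max_degree - d + 1) / (max_degree * (max_degree + 1))
        # Number of positions where the length-d substrings coincide.
        matches = sum(
            1 for i in range(length - d + 1) if x[i:i + d] == y[i:i + d]
        )
        total += beta * matches
    return total


# Toy sequence windows (invented); the second differs at one position.
window_a = "AGGTAAGT"
window_b = "AGGTCAGT"

k_same = weighted_degree_kernel(window_a, window_a)
k_diff = weighted_degree_kernel(window_a, window_b)
```

A single mismatch lowers the score more than a plain per-base comparison would, because it also breaks every longer substring that overlaps the mismatched position.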


Subjects
RNA Splice Sites/genetics , Algorithms , Animals , Arabidopsis/genetics , Brassicaceae/genetics , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Forecasting/methods , Genomics/methods , Humans , Markov Chains , Zebrafish/genetics
19.
Structure ; 13(3): 423-34, 2005 Mar.
Article in English | MEDLINE | ID: mdl-15766544

ABSTRACT

We obtained tomograms of isolated mammalian excitatory synapses by cryo-electron tomography. This method allows the investigation of biological material in the frozen-hydrated state, without staining, and can therefore provide reliable structural information at the molecular level. We developed an automated procedure for the segmentation of molecular complexes present in the synaptic cleft based on thresholding and connectivity, and calculated several morphological characteristics of these complexes. Extensive lateral connections along the synaptic cleft are shown to form a highly connected structure with a complex topology. Our results are essentially parameter-free, i.e., they do not depend on the choice of certain parameter values (such as threshold). In addition, the results are not sensitive to noise; the same conclusions can be drawn from the analysis of both nondenoised and denoised tomograms.
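
Segmentation "based on thresholding and connectivity", as described in the abstract above, can be pictured as two steps: keep the voxels whose value passes a threshold, then group the kept voxels that touch one another. The sketch below does this on a small 2D grid with 4-connectivity as a simplified stand-in for a 3D tomogram. It is not the authors' procedure; the function name `segment`, the grid values, and the threshold are invented for illustration.

```python
from collections import deque


def segment(image, threshold):
    """Label connected groups of pixels whose value is >= threshold.

    Returns (labels, count): labels has the same shape as image, with 0 for
    background and 1..count for each 4-connected foreground component.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or labels[r][c] != 0:
                continue
            # Found an unlabelled foreground pixel: flood-fill its component.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and image[ny][nx] >= threshold
                        and labels[ny][nx] == 0
                    ):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Toy density grid (invented values).
image = [
    [9, 8, 0, 0],
    [0, 7, 0, 6],
    [0, 0, 0, 7],
    [5, 0, 0, 0],
]

labels, count = segment(image, threshold=5)
```

Per-component measurements such as size or extent can then be computed from `labels`, which is the kind of morphological characterisation the abstract refers to.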


Subjects
Cryoelectron Microscopy , Synapses/ultrastructure , Animals , Mammals , Multiprotein Complexes/analysis , Multiprotein Complexes/ultrastructure , Protein Conformation , Synapses/chemistry