1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38340090

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remain challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights into disease genes. RESULTS: We present an overview of the design choices and pitfalls that prove crucial in the application of network propagation methods to GWAS summary statistics. We highlight general trends from the literature and present benchmark experiments to expand on these insights, selecting three diseases and five molecular networks as a case study. We verify that, if the GWAS summary statistics are of sufficient quality, the use of gene-level scores based on GWAS P-values offers advantages over the selection of a set of 'seed' disease genes not weighted by the associated P-values. Beyond that, the size and the density of the networks prove to be important factors for consideration. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Algorithms , Gene Regulatory Networks , Genetic Predisposition to Disease
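The abstract of entry 1 contrasts weighted gene-level scores with unweighted 'seed' genes as input to network propagation, without naming a specific algorithm. As a rough illustration only, the sketch below uses random walk with restart, one common diffusion scheme; the choice of algorithm, the function name `propagate`, the restart probability, the toy four-gene network, and the example scores are all assumptions and are not taken from the paper.

```python
import numpy as np


def propagate(adjacency, seed_scores, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected network (illustrative sketch)."""
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=0)
    degree[degree == 0] = 1.0          # avoid division by zero for isolated nodes
    W = A / degree                     # column-normalised transition matrix
    p0 = np.asarray(seed_scores, dtype=float)
    if p0.sum() <= 0:
        raise ValueError("seed_scores must contain positive mass")
    p0 = p0 / p0.sum()                 # starting distribution over genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical toy network: a path of four genes, 0 - 1 - 2 - 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Weighted input: made-up gene-level scores, e.g. -log10 of a gene P-value.
weighted = propagate(adjacency, [8.0, 0.5, 0.3, 0.1])

# Unweighted input: a binary 'seed' set containing only gene 0.
binary = propagate(adjacency, [1, 0, 0, 0])

print(weighted)
print(binary)
```

With the weighted input every gene contributes some starting mass, whereas the binary seed set discards the strength of association; the abstract reports that the former is advantageous when the summary statistics are of sufficient quality.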
2.
Patterns (N Y) ; 4(9): 100830, 2023 Sep 08.
Article in English | MEDLINE | ID: mdl-37720333

ABSTRACT

The black-box nature of most artificial intelligence (AI) models encourages the development of explainability methods to engender trust in the AI decision-making process. Such methods can be broadly categorized into two main types: post hoc explanations and inherently interpretable algorithms. We aimed to analyze the possible associations between COVID-19 and the push of explainable AI (XAI) to the forefront of biomedical research. We automatically extracted from the PubMed database biomedical XAI studies related to concepts of causality or explainability and manually labeled 1,603 papers with respect to XAI categories. To compare the trends pre- and post-COVID-19, we fit a change point detection model and evaluated significant changes in publication rates. We show that the advent of COVID-19 in the beginning of 2020 could be the driving factor behind an increased focus on XAI, playing a crucial role in accelerating an already evolving trend. Finally, we present a discussion of the future societal use and impact of XAI technologies and potential future directions for those who pursue fostering clinical trust with interpretable machine learning models.

3.
Clin Lung Cancer ; 24(8): e311-e322, 2023 12.
Article in English | MEDLINE | ID: mdl-37689579

ABSTRACT

PURPOSE: Non-small-cell lung cancer (NSCLC) shows a high incidence of brain metastases (BM). Early detection is crucial to improve clinical prospects. We trained and validated classifier models to identify patients with a high risk of developing BM, as they could potentially benefit from surveillance brain MRI. METHODS: Consecutive patients with an initial diagnosis of NSCLC from January 2011 to April 2019 and an in-house chest-CT scan (staging) were retrospectively recruited at a German lung cancer center. Brain imaging was performed at initial diagnosis and in case of neurological symptoms (follow-up). Subjects lost to follow-up or still alive without BM at the data cut-off point (12/2020) were excluded. Covariates included clinical and/or 3D-radiomics-features of the primary tumor from staging chest-CT. Four machine learning models for prediction (80/20 training) were compared. Gini Importance and SHAP were used as measures of importance; sensitivity, specificity, area under the precision-recall curve, and Matthews correlation coefficient as evaluation metrics. RESULTS: Three hundred and ninety-five patients comprised the clinical cohort. Predictive models based on clinical features offered the best performance (tuned to maximize recall: sensitivity∼70%, specificity∼60%). Radiomics features failed to provide sufficient information, likely due to the heterogeneity of imaging data. Adenocarcinoma histology, lymph node invasion, and histological tumor grade were positively correlated with the prediction of BM; age and squamous cell carcinoma histology were negatively correlated. A subgroup discovery analysis identified 2 candidate patient subpopulations appearing to present a higher risk of BM (female patients + adenocarcinoma histology, adenocarcinoma patients + no other distant metastases). CONCLUSION: Analysis of the importance of input features suggests that the models are learning the relevant relationships between clinical features/development of BM. A higher number of samples is to be prioritized to improve performance. Employed prospectively at initial diagnosis, such models can help select high-risk subgroups for surveillance brain MRI.


Subject(s)
Adenocarcinoma , Brain Neoplasms , Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Female , Carcinoma, Non-Small-Cell Lung/pathology , Lung Neoplasms/diagnosis , Lung Neoplasms/pathology , Retrospective Studies , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/secondary , Machine Learning
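Entry 3 reports sensitivity, specificity, and the Matthews correlation coefficient (MCC) as evaluation metrics. The following sketch shows how these three quantities follow from a binary confusion matrix. The function name and the counts are hypothetical; the counts were chosen only so that sensitivity and specificity match the approximate 70% and 60% quoted in the abstract, and they are not the study's data.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)       # recall on patients who developed BM
    specificity = tn / (tn + fp)       # recall on patients who did not
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Made-up counts for illustration, not taken from the paper.
sens, spec, mcc = confusion_metrics(tp=7, fp=4, tn=6, fn=3)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```

Unlike sensitivity or specificity alone, MCC uses all four cells of the confusion matrix, which makes it a reasonable summary when the two classes are imbalanced.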
4.
Nat Commun ; 14(1): 4750, 2023 08 07.
Article in English | MEDLINE | ID: mdl-37550323

ABSTRACT

Epigenetic modifications are dynamic mechanisms involved in the regulation of gene expression. Unlike the DNA sequence, epigenetic patterns vary not only between individuals, but also between different cell types within an individual. Environmental factors, somatic mutations and ageing contribute to epigenetic changes that may constitute early hallmarks or causal factors of disease. Epigenetic modifications are reversible and thus promising therapeutic targets for precision medicine. However, mapping efforts to determine an individual's cell-type-specific epigenome are constrained by experimental costs and tissue accessibility. To address these challenges, we developed eDICE, an attention-based deep learning model that is trained to impute missing epigenomic tracks by conditioning on observed tracks. Using a recently published set of epigenomes from four individual donors, we show that transfer learning across individuals allows eDICE to successfully predict individual-specific epigenetic variation even in tissues that are unmapped in a given donor. These results highlight the potential of machine learning-based imputation methods to advance personalized epigenomics.


Subject(s)
Epigenesis, Genetic , Epigenomics , Humans , Epigenomics/methods , Machine Learning , Epigenome , Precision Medicine/methods , DNA Methylation/genetics
5.
Genome Biol ; 24(1): 79, 2023 04 18.
Article in English | MEDLINE | ID: mdl-37072822

ABSTRACT

A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.


Subject(s)
Algorithms , Epigenomics , Genomics/methods
6.
Cell Mol Gastroenterol Hepatol ; 15(6): 1391-1419, 2023.
Article in English | MEDLINE | ID: mdl-36868311

ABSTRACT

BACKGROUND & AIMS: Patient-derived organoid cancer models are generated from epithelial tumor cells and reflect tumor characteristics. However, they lack the complexity of the tumor microenvironment, which is a key driver of tumorigenesis and therapy response. Here, we developed a colorectal cancer organoid model that incorporates matched epithelial cells and stromal fibroblasts. METHODS: Primary fibroblasts and tumor cells were isolated from colorectal cancer specimens. Fibroblasts were characterized for their proteome, secretome, and gene expression signatures. Fibroblast/organoid co-cultures were analyzed by immunohistochemistry and compared with their tissue of origin, as well as on gene expression levels compared with standard organoid models. Bioinformatics deconvolution was used to calculate cellular proportions of cell subsets in organoids based on single-cell RNA sequencing data. RESULTS: Normal primary fibroblasts, isolated from tumor adjacent tissue, and cancer associated fibroblasts retained their molecular characteristics in vitro, including higher motility of cancer associated compared with normal fibroblasts. Importantly, both cancer-associated fibroblasts and normal fibroblasts supported cancer cell proliferation in 3D co-cultures, without the addition of classical niche factors. Organoids grown together with fibroblasts displayed a larger cellular heterogeneity of tumor cells compared with mono-cultures and closely resembled the in vivo tumor morphology. Additionally, we observed a mutual crosstalk between tumor cells and fibroblasts in the co-cultures. This was manifested by considerably deregulated pathways such as cell-cell communication and extracellular matrix remodeling in the organoids. Thrombospondin-1 was identified as a critical factor for fibroblast invasiveness. CONCLUSION: We developed a physiological tumor/stroma model, which will be vital as a personalized tumor model to study disease mechanisms and therapy response in colorectal cancer.


Subject(s)
Cancer-Associated Fibroblasts , Colorectal Neoplasms , Humans , Fibroblasts/metabolism , Coculture Techniques , Organoids/metabolism , Cancer-Associated Fibroblasts/metabolism , Colorectal Neoplasms/pathology , Tumor Microenvironment
7.
Cell Rep ; 37(5): 109943, 2021 11 02.
Article in English | MEDLINE | ID: mdl-34731603

ABSTRACT

The ARID1A subunit of SWI/SNF chromatin remodeling complexes is a potent tumor suppressor. Here, a degron is applied to detect rapid loss of chromatin accessibility at thousands of loci where ARID1A acts to generate accessible minidomains of nucleosomes. Loss of ARID1A also results in the redistribution of the coactivator EP300. Co-incident EP300 dissociation and lost chromatin accessibility at enhancer elements are highly enriched adjacent to rapidly downregulated genes. In contrast, sites of gained EP300 occupancy are linked to genes that are transcriptionally upregulated. These chromatin changes are associated with a small number of genes that are differentially expressed in the first hours following loss of ARID1A. Indirect or adaptive changes dominate the transcriptome following growth for days after loss of ARID1A and result in strong engagement with cancer pathways. The identification of this hierarchy suggests sites for intervention in ARID1A-driven diseases.


Subject(s)
DNA-Binding Proteins/deficiency , Mouse Embryonic Stem Cells/metabolism , Nucleosomes/metabolism , Precancerous Conditions/metabolism , Transcription Factors/deficiency , Transcription, Genetic , Transcriptional Activation , Animals , Binding Sites , Cell Line , Chromatin Assembly and Disassembly , DNA-Binding Proteins/genetics , E1A-Associated p300 Protein/genetics , E1A-Associated p300 Protein/metabolism , Male , Mice , Mice, 129 Strain , Nucleosomes/genetics , Precancerous Conditions/genetics , Proteolysis , Time Factors , Transcription Factors/genetics
8.
Life Sci Alliance ; 4(2)2021 02.
Article in English | MEDLINE | ID: mdl-33310759

ABSTRACT

Malignant transformation depends on genetic and epigenetic events that result in a burst of deregulated gene expression and chromatin changes. To dissect the sequence of events in this process, we used a T-cell-specific lymphoma model based on the human oncogenic nucleophosmin-anaplastic lymphoma kinase (NPM-ALK) translocation. We find that transformation of T cells shifts thymic cell populations to an undifferentiated immunophenotype, which occurs only after a period of latency, accompanied by induction of the MYC-NOTCH1 axis and deregulation of key epigenetic enzymes. We discover aberrant DNA methylation patterns, overlapping with regulatory regions, plus a high degree of epigenetic heterogeneity between individual tumors. In addition, ALK-positive tumors show a loss of associated methylation patterns of neighboring CpG sites. Notably, deletion of the maintenance DNA methyltransferase DNMT1 completely abrogates lymphomagenesis in this model, despite oncogenic signaling through NPM-ALK, suggesting that faithful maintenance of tumor-specific methylation through DNMT1 is essential for sustained proliferation and tumorigenesis.


Subject(s)
Cell Transformation, Neoplastic/genetics , Cell Transformation, Neoplastic/metabolism , DNA (Cytosine-5-)-Methyltransferase 1/metabolism , Epigenesis, Genetic , Lymphoma/etiology , Lymphoma/metabolism , Protein-Tyrosine Kinases/genetics , Animals , Biomarkers, Tumor , Computational Biology/methods , DNA (Cytosine-5-)-Methyltransferase 1/genetics , DNA Methylation , Disease Models, Animal , Disease Susceptibility , Epigenomics , Gene Deletion , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Immunohistochemistry , Immunophenotyping , Lymphoma/drug therapy , Lymphoma/pathology , Mice , Mice, Knockout , Mice, Transgenic , Protein-Tyrosine Kinases/metabolism , STAT3 Transcription Factor/metabolism , Signal Transduction , Xenograft Model Antitumor Assays
9.
Cell Rep ; 23(5): 1530-1542, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29719263

ABSTRACT

mRNA cap addition occurs early during RNA Pol II-dependent transcription, facilitating pre-mRNA processing and translation. We report that the mammalian mRNA cap methyltransferase, RNMT-RAM, promotes RNA Pol II transcription independent of mRNA capping and translation. In cells, sublethal suppression of RNMT-RAM reduces RNA Pol II occupancy, net mRNA synthesis, and pre-mRNA levels. Conversely, expression of RNMT-RAM increases transcription independent of cap methyltransferase activity. In isolated nuclei, recombinant RNMT-RAM stimulates transcriptional output; this requires the RAM RNA binding domain. RNMT-RAM interacts with nascent transcripts along their entire length and with transcription-associated factors including the RNA Pol II subunits SPT4, SPT6, and PAFc. Suppression of RNMT-RAM inhibits transcriptional markers including histone H2BK120 ubiquitination, H3K4 and H3K36 methylation, RNA Pol II CTD S5 and S2 phosphorylation, and PAFc recruitment. These findings suggest that multiple interactions among RNMT-RAM, RNA Pol II factors, and RNA along the transcription unit stimulate transcription.


Subject(s)
Methyltransferases/metabolism , RNA Polymerase II/metabolism , RNA-Binding Proteins/metabolism , Transcription, Genetic/physiology , HEK293 Cells , HeLa Cells , Histones/genetics , Histones/metabolism , Humans , Methyltransferases/genetics , RNA Polymerase II/genetics , RNA-Binding Proteins/genetics , Ubiquitination/physiology
10.
PLoS Genet ; 13(5): e1006793, 2017 May.
Article in English | MEDLINE | ID: mdl-28498846

ABSTRACT

Mutations in the gene encoding the methyl-CG binding protein MeCP2 cause several neurological disorders including Rett syndrome. The di-nucleotide methyl-CG (mCG) is the classical MeCP2 DNA recognition sequence, but additional methylated sequence targets have been reported. Here we show by in vitro and in vivo analyses that MeCP2 binding to non-CG methylated sites in brain is largely confined to the tri-nucleotide sequence mCAC. MeCP2 binding to chromosomal DNA in mouse brain is proportional to mCAC + mCG density and unexpectedly defines large genomic domains within which transcription is sensitive to MeCP2 occupancy. Our results suggest that MeCP2 integrates patterns of mCAC and mCG in the brain to restrain transcription of genes critical for neuronal function.


Subject(s)
Brain/metabolism , DNA Methylation , Dinucleotide Repeats , Methyl-CpG-Binding Protein 2/metabolism , Trinucleotide Repeats , Animals , CpG Islands , Cytosine/metabolism , Epigenesis, Genetic , Male , Methyl-CpG-Binding Protein 2/genetics , Mice , Mice, Inbred C57BL , Protein Binding , Rett Syndrome/genetics
11.
Nat Commun ; 8(1): 12, 2017 04 11.
Article in English | MEDLINE | ID: mdl-28400552

ABSTRACT

RNA-binding proteins play a key role in shaping gene expression profiles during stress; however, little is known about the dynamic nature of these interactions and how this influences the kinetics of gene expression. To address this, we developed kinetic cross-linking and analysis of cDNAs (χCRAC), an ultraviolet cross-linking method that enabled us to quantitatively measure the dynamics of protein-RNA interactions in vivo on a minute time-scale. Here, using χCRAC we measure the global RNA-binding dynamics of the yeast transcription termination factor Nab3 in response to glucose starvation. These measurements reveal rapid changes in protein-RNA interactions within 1 min following stress imposition. Changes in Nab3 binding are largely independent of alterations in transcription rate during the early stages of stress response, indicating orthogonal transcriptional control mechanisms. We also uncover a function for Nab3 in dampening expression of stress-responsive genes. χCRAC has the potential to greatly enhance our understanding of in vivo dynamics of protein-RNA interactions. Protein-RNA interactions are dynamic and regulated in response to environmental changes. Here the authors describe 'kinetic CRAC', an approach that allows time-resolved analyses of protein-RNA interactions with minute time point resolution, and apply it to gain insight into the function of the RNA-binding protein Nab3.


Subject(s)
Gene Expression Regulation, Fungal , Nuclear Proteins/genetics , RNA, Fungal/genetics , RNA-Binding Proteins/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Transcriptome , Culture Media/pharmacology , DNA, Complementary/genetics , DNA, Complementary/metabolism , Gene Expression Profiling , Glucose/deficiency , Kinetics , Nuclear Proteins/metabolism , Protein Binding , RNA, Fungal/metabolism , RNA-Binding Proteins/metabolism , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/radiation effects , Saccharomyces cerevisiae Proteins/metabolism , Stress, Physiological , Time Factors , Ultraviolet Rays
12.
BMC Bioinformatics ; 17(Suppl 16): 447, 2016 Dec 13.
Article in English | MEDLINE | ID: mdl-28105912

ABSTRACT

BACKGROUND: Functional genomic and epigenomic research relies fundamentally on sequencing-based methods like ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high-dimensional data sets with visually complex structures, such as multi-modal peaks extended over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent. RESULTS: We present DGW, an open source software package for simultaneous alignment and clustering of multiple epigenomic marks. DGW uses Dynamic Time Warping to adaptively rescale and align genomic distances, which allows regions of interest with similar shapes to be grouped, thereby capturing the structure of epigenomic marks. We demonstrate the effectiveness of the approach in a simulation study and on a real epigenomic data set from the ENCODE project. CONCLUSIONS: Our results show that DGW automatically recognises and aligns important genomic features such as transcription start sites and splicing sites from histone marks. DGW is available as an open source Python package.


Subject(s)
Computer Simulation , Epigenomics/methods , Genome, Human , Histone Code , Software , Chromatin Immunoprecipitation , Cluster Analysis , DNA/metabolism , DNA-Binding Proteins/metabolism , Epigenesis, Genetic , Humans , Leukemia/genetics
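Entry 12 states that DGW uses Dynamic Time Warping (DTW) to rescale and align signal profiles so that regions with similar shapes can be grouped. The sketch below is the textbook dynamic-programming form of DTW on two one-dimensional profiles. It is not DGW's code or API; the function name, the absolute-difference local cost, and the toy profiles are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again (b stretched)
                cost[i][j - 1],      # b[j-1] is matched again (a stretched)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical coverage profiles: the same peak shape over regions of
# different length, and a flat profile for contrast.
narrow_peak = [0, 1, 3, 1, 0]
wide_peak = [0, 1, 1, 3, 3, 1, 0]
flat = [0, 0, 0, 0, 0]

print(dtw_distance(narrow_peak, wide_peak))
print(dtw_distance(narrow_peak, flat))
```

Because warping lets one position align to several positions in the other profile, a stretched copy of a peak stays close to the original, which is the property that makes a shape-based clustering of regions of unequal length possible.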
13.
Bioinformatics ; 31(6): 809-16, 2015 Mar 15.
Article in English | MEDLINE | ID: mdl-25398611

ABSTRACT

MOTIVATION: DNA methylation is an intensely studied epigenetic mark implicated in many biological processes of direct clinical relevance. Although sequencing-based technologies are increasingly allowing high-resolution measurements of DNA methylation, statistical modelling of such data is still challenging. In particular, statistical identification of differentially methylated regions across different conditions poses unresolved challenges in accounting for spatial correlations within the statistical testing procedure. RESULTS: We propose a non-parametric, kernel-based method, M(3)D, to detect higher order changes in methylation profiles, such as shape, across pre-defined regions. The test statistic explicitly accounts for differences in coverage levels between samples, thus handling in a principled way a major confounder in the analysis of methylation data. Empirical tests on real and simulated datasets show an increased power compared to established methods, as well as considerable robustness with respect to coverage and replication levels.


Subject(s)
DNA Methylation , Embryonic Stem Cells/metabolism , Models, Statistical , Animals , Computer Simulation , Epigenomics , Humans , Mice , Software
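Entry 13 describes a non-parametric, kernel-based test for changes in the shape of methylation profiles but does not spell out the statistic in the abstract. One widely used kernel two-sample statistic is the maximum mean discrepancy (MMD); the sketch below computes its simple biased estimate with a Gaussian kernel over genomic positions. Using MMD here, the kernel choice, the bandwidth, and the toy positions are all assumptions for illustration, and the coverage adjustment mentioned in the abstract is not modelled.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between two 1-D arrays of positions."""
    diff = a[:, None] - b[None, :]
    return np.exp(-(diff ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=50.0):
    """Biased (V-statistic) estimate of the squared maximum mean discrepancy."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )


# Hypothetical positions (in bp, within one region) of methylated reads.
sample_a = [100, 105, 110, 115, 120]
sample_b = [102, 104, 111, 116, 119]   # similar profile
sample_c = [300, 305, 310, 315, 320]   # shifted profile

print(mmd_squared(sample_a, sample_b))
print(mmd_squared(sample_a, sample_c))
```

The statistic compares whole distributions of positions rather than totals, so two samples with the same number of reads but different profile shapes can still be told apart.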
14.
BMC Genomics ; 14: 826, 2013 Nov 24.
Article in English | MEDLINE | ID: mdl-24267901

ABSTRACT

BACKGROUND: Cell-specific gene expression is controlled by epigenetic modifications and transcription factor binding. While genome-wide maps for these protein-DNA interactions have become widely available, quantitative comparison of the resulting ChIP-Seq data sets remains challenging. Current approaches to detect differentially bound or modified regions are mainly borrowed from RNA-Seq data analysis, thus focusing on total counts of fragments mapped to a region, ignoring any information encoded in the shape of the peaks. RESULTS: Here, we present MMDiff, a robust, broadly applicable method for detecting differences between sequence count data sets. Based on quantifying shape changes in signal profiles, it overcomes challenges imposed by the highly structured nature of the data and the paucity of replicates. We first use a simulated data set to compare the performance of MMDiff with results obtained by four alternative methods. We demonstrate that MMDiff excels when peak profiles change between samples. We next use MMDiff to re-analyse a recent data set of the histone modification H3K4me3 elucidating the establishment of this prominent epigenomic marker. Our empirical analysis shows that the method yields reproducible results across experiments, and is able to detect functionally important changes in histone modifications. To further explore the broader applicability of MMDiff, we apply it to two ENCODE data sets: one investigating the histone modification H3K27ac and one measuring the genome-wide binding of the transcription factor CTCF. In both cases, MMDiff proves to be complementary to count-based methods. In addition, we can show that MMDiff is capable of directly detecting changes of homotypic binding events at neighbouring binding sites. MMDiff is readily available as a Bioconductor package. CONCLUSIONS: Our results demonstrate that higher order features of ChIP-Seq peaks carry relevant and often complementary information to total counts, and hence are important in assessing differential histone modifications and transcription factor binding. We have developed a new computational method, MMDiff, that is capable of exploring these features and therefore closes an existing gap in the analysis of ChIP-Seq data sets.


Subject(s)
Chromatin Immunoprecipitation/methods , Computational Biology/methods , Sequence Analysis, DNA/methods , Animals , Cell Line , Computer Simulation , Epigenomics , Histones/metabolism , Humans , Mice , Statistics, Nonparametric
15.
Genome Res ; 19(11): 2133-43, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19564452

ABSTRACT

We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was proved in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibited the best prediction performance on nucleotide, exon, and transcript level for ab initio and multiple-genome gene-prediction tasks. The fully developed version shows superior performance in 10 out of 12 evaluation criteria compared with the other participating gene finders, including Fgenesh++ and Augustus. An in-depth analysis of mGene's genome-wide predictions revealed that approximately 2200 predicted genes were not contained in the current genome annotation. Testing a subset of 57 of these genes by RT-PCR and sequencing, we confirmed expression for 24 (42%) of them. mGene missed 300 annotated genes, out of which 205 were unconfirmed. RT-PCR testing of 24 of these genes resulted in a success rate of merely 8%. These findings suggest that even the gene catalog of a well-studied organism such as C. elegans can be substantially improved by mGene's predictions. We also provide gene predictions for the four nematodes C. briggsae, C. brenneri, C. japonica, and C. remanei. Comparing the resulting proteomes among these organisms and to the known protein universe, we identified many species-specific gene inventions. In a quality assessment of several available annotations for these genomes, we find that mGene's predictions are most accurate.


Subject(s)
Algorithms , Caenorhabditis elegans/genetics , Computational Biology/methods , Genome, Helminth/genetics , Animals , Artificial Intelligence , Caenorhabditis/classification , Caenorhabditis/genetics , Genes, Helminth/genetics , Genomics/methods , RNA Splice Sites , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction , Sequence Analysis, DNA , Transcription Initiation Site
16.
Nucleic Acids Res ; 37(Web Server issue): W312-6, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19494180

ABSTRACT

We describe mGene.web, a web service for the genome-wide prediction of protein coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures including untranslated regions in an increasing number of organisms. With mGene.web, users have the additional possibility to train the system with their own data for other organisms at the push of a button, a functionality that will greatly accelerate the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, like the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene finding system mGene is based on discriminative machine learning techniques and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is available at http://www.mgene.org/web; it is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).


Subject(s)
Genes , Genomics , Proteins/genetics , Software , Internet , RNA Splice Sites , Sequence Analysis, DNA , Transcription Initiation Site
17.
Science ; 317(5836): 338-42, 2007 Jul 20.
Article in English | MEDLINE | ID: mdl-17641193

ABSTRACT

The genomes of individuals from the same species vary in sequence as a result of different evolutionary processes. To examine the patterns of, and the forces shaping, sequence variation in Arabidopsis thaliana, we performed high-density array resequencing of 20 diverse strains (accessions). More than 1 million nonredundant single-nucleotide polymorphisms (SNPs) were identified at moderate false discovery rates (FDRs), and approximately 4% of the genome was identified as being highly dissimilar or deleted relative to the reference genome sequence. Patterns of polymorphism are highly nonrandom among gene families, with genes mediating interaction with the biotic environment having exceptional polymorphism levels. At the chromosomal scale, regional variation in polymorphism was readily apparent. A scan for recent selective sweeps revealed several candidate regions, including a notable example in which almost all variation was removed in a 500-kilobase window. Analyzing the polymorphisms we describe in larger sets of accessions will enable a detailed understanding of forces shaping population-wide sequence variation in A. thaliana.


Subject(s)
Arabidopsis/genetics , Genetic Variation , Genome, Plant , Polymorphism, Genetic , Polymorphism, Single Nucleotide , Algorithms , Base Sequence , Chromosomes, Plant/genetics , Computational Biology , Gene Frequency , Genes, Plant , Molecular Sequence Data , Selection, Genetic , Sequence Analysis, DNA
18.
BMC Bioinformatics ; 8 Suppl 10: S7, 2007.
Article in English | MEDLINE | ID: mdl-18269701

ABSTRACT

BACKGROUND: For splice site recognition, one has to solve two classification problems: discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems typically rely on Markov Chains to solve these tasks. RESULTS: In this work we consider Support Vector Machines for splice site recognition. We employ the so-called weighted degree kernel, which turns out to be well suited for this task, as we will illustrate in several experiments where we compare its prediction accuracy with that of recently proposed systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our performance estimates indicate that splice sites can be recognized very accurately in these genomes and that our method outperforms many other methods including Markov Chains, GeneSplicer and SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction tool ready to be used for incorporation in a gene finder. AVAILABILITY: Data, splits, additional information on the model selection, the whole genome predictions, as well as the stand-alone prediction tool are available for download at http://www.fml.mpg.de/raetsch/projects/splice.


Subject(s)
RNA Splice Sites/genetics , Algorithms , Animals , Arabidopsis/genetics , Brassicaceae/genetics , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Forecasting/methods , Genomics/methods , Humans , Markov Chains , Zebrafish/genetics
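Entry 18 names the weighted degree kernel as the similarity measure fed to the Support Vector Machine, without defining it in the abstract. In its commonly cited form the kernel counts substrings of length 1 to D that match at the same position in two equal-length sequences, weighting length d by 2(D - d + 1) / (D(D + 1)). The sketch below implements that form; the weighting scheme, the function name, and the toy sequences are assumptions and are not taken from the abstract.

```python
def weighted_degree_kernel(x, y, max_degree=3):
    """Position-wise k-mer match kernel for two sequences of equal length."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for d in range(1, max_degree + 1):
        # Shorter substrings receive larger weights; the weights sum to 1.
        beta = 2.0 * (max_degree - d + 1) / (max_degree * (max_degree + 1))
        matches = sum(
            x[i:i + d] == y[i:i + d] for i in range(len(x) - d + 1)
        )
        total += beta * matches
    return total


# Hypothetical sequence windows around candidate splice sites.
window = "ACGT"
variant = "ACCT"

print(weighted_degree_kernel(window, window, max_degree=2))
print(weighted_degree_kernel(window, variant, max_degree=2))
```

Because matches are only counted at identical positions, the kernel rewards sequences that share motifs at the same offset from the candidate site, which suits signals such as splice sites that sit at a fixed location in the window.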
19.
Structure ; 13(3): 423-34, 2005 Mar.
Article in English | MEDLINE | ID: mdl-15766544

ABSTRACT

We obtained tomograms of isolated mammalian excitatory synapses by cryo-electron tomography. This method allows the investigation of biological material in the frozen-hydrated state, without staining, and can therefore provide reliable structural information at the molecular level. We developed an automated procedure for the segmentation of molecular complexes present in the synaptic cleft based on thresholding and connectivity, and calculated several morphological characteristics of these complexes. Extensive lateral connections along the synaptic cleft are shown to form a highly connected structure with a complex topology. Our results are essentially parameter-free, i.e., they do not depend on the choice of certain parameter values (such as threshold). In addition, the results are not sensitive to noise; the same conclusions can be drawn from the analysis of both nondenoised and denoised tomograms.


Subject(s)
Cryoelectron Microscopy , Synapses/ultrastructure , Animals , Mammals , Multiprotein Complexes/analysis , Multiprotein Complexes/ultrastructure , Protein Conformation , Synapses/chemistry
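Entry 19 describes segmenting molecular complexes by thresholding and connectivity. As a minimal sketch of that general idea, the code below thresholds a small 2-D grid and labels 4-connected components with a breadth-first flood fill. Working in 2-D, treating values at or above the threshold as foreground, the 4-neighbour connectivity, and the toy grid are all assumptions; the paper's procedure operates on tomograms and its details are not given in the abstract.

```python
from collections import deque


def segment(image, threshold):
    """Label 4-connected regions of pixels with value >= threshold."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]   # 0 means background
    current = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or labels[r][c]:
                continue
            current += 1                          # start a new component
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and image[ny][nx] >= threshold
                        and not labels[ny][nx]
                    ):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels, current


# Hypothetical density grid with two separate bright regions.
grid = [
    [9, 9, 0, 0],
    [0, 0, 0, 7],
    [0, 0, 0, 8],
]
labels, count = segment(grid, threshold=5)
print(count)
for row in labels:
    print(row)
```

Once components are labelled, per-component measurements such as size or the number of neighbours can be computed, which is the kind of morphological characterisation the abstract mentions.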