Results 1 - 12 of 12
1.
Sci Data ; 10(1): 610, 2023 09 11.
Article in English | MEDLINE | ID: mdl-37696882

ABSTRACT

An in-depth insight into the chemistry and nature of the individual chemical bonds is essential for understanding materials. Bonding analysis is thus expected to provide important features for large-scale data analysis and machine learning of material properties. Such chemical bonding information can be computed using the LOBSTER software package, which post-processes modern density functional theory data by projecting the plane wave-based wave functions onto an atomic orbital basis. With the help of a fully automatic workflow, the VASP and LOBSTER software packages are used to generate the data. We then perform bonding analyses on 1520 compounds (insulators and semiconductors) and provide the results as a database. The projected densities of states and bonding indicators are benchmarked on standard density-functional theory computations and available heuristics, respectively. Lastly, we illustrate the predictive power of bonding descriptors by constructing a machine learning model for phononic properties, which improves prediction accuracy by 27% (in terms of mean absolute error) compared with a benchmark model that differs only in not using any quantum-chemical bonding features.
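The 27% figure compares the mean absolute errors (MAE) of two models. A minimal sketch of that comparison, assuming the percentage is the relative MAE reduction of the bonding-feature model over the baseline; the function names and the numbers below are hypothetical and not taken from the study.

```python
def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute deviation between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_mae_reduction(mae_baseline: float, mae_model: float) -> float:
    """Percentage by which mae_model improves on mae_baseline (positive = better)."""
    return 100.0 * (mae_baseline - mae_model) / mae_baseline


# Hypothetical targets and predictions for a baseline and a bonding-feature model.
targets = [1.0, 2.0, 3.0, 4.0]
baseline = mean_absolute_error(targets, [1.5, 2.5, 2.0, 4.0])
with_bonding = mean_absolute_error(targets, [1.2, 2.3, 2.6, 4.1])
print(f"{relative_mae_reduction(baseline, with_bonding):.1f}% lower MAE")
```

Reporting a relative reduction makes the result independent of the units of the predicted property.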

2.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37294786

ABSTRACT

MOTIVATION: Deep learning has moved to the forefront of tandem mass spectrometry-driven proteomics and authentic prediction for peptide fragmentation is more feasible than ever. Still, at this point spectral prediction is mainly used to validate database search results or for confined search spaces. Fully predicted spectral libraries have not yet been efficiently adapted to large search space problems that often occur in metaproteomics or proteogenomics. RESULTS: In this study, we showcase a workflow that uses Prosit for spectral library predictions on two common metaproteomes and implement an indexing and search algorithm, Mistle, to efficiently identify experimental mass spectra within the library. Hence, the workflow emulates a classic protein sequence database search with protein digestion but builds a searchable index from spectral predictions as an in-between step. We compare Mistle to popular search engines, both on a spectral and database search level, and provide evidence that this approach is more accurate than a database search using MSFragger. Mistle outperforms other spectral library search engines in terms of run time and proves to be extremely memory efficient with a 4- to 22-fold decrease in RAM usage. This makes Mistle universally applicable to large search spaces, e.g. covering comprehensive sequence databases of diverse microbiomes. AVAILABILITY AND IMPLEMENTATION: Mistle is freely available on GitHub at https://github.com/BAMeScience/Mistle.


Subject(s)
Peptides , Software , Peptides/metabolism , Search Engine/methods , Proteomics/methods , Algorithms , Tandem Mass Spectrometry/methods , Databases, Protein , Peptide Library
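The search step described above (matching experimental spectra against a library of predicted spectra) can be illustrated with a generic spectral library search: keep the library sorted by precursor m/z, restrict candidates to a precursor tolerance window by binary search, and score candidates by cosine similarity of binned peaks. This is a minimal sketch under those assumptions, not Mistle's actual indexing or scoring; all class and function names, the bin width and the tolerance are hypothetical.

```python
import bisect
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spectrum:
    precursor_mz: float
    peaks: list[tuple[float, float]]  # (m/z, intensity) pairs
    label: str = ""


def binned_unit_vector(peaks: list[tuple[float, float]], bin_width: float) -> dict[int, float]:
    """Sum intensities into fixed-width m/z bins and scale to unit length."""
    vec: dict[int, float] = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm > 0 else vec


class SpectralLibraryIndex:
    """Library sorted by precursor m/z so candidates are found by binary search."""

    def __init__(self, library: list[Spectrum], bin_width: float = 1.0):
        self.bin_width = bin_width
        self.entries = sorted(library, key=lambda s: s.precursor_mz)
        self.precursors = [s.precursor_mz for s in self.entries]
        self.vectors = [binned_unit_vector(s.peaks, bin_width) for s in self.entries]

    def search(self, query: Spectrum, tolerance: float = 0.01) -> Optional[tuple[str, float]]:
        """Return (label, cosine score) of the best candidate in the precursor window."""
        lo = bisect.bisect_left(self.precursors, query.precursor_mz - tolerance)
        hi = bisect.bisect_right(self.precursors, query.precursor_mz + tolerance)
        q = binned_unit_vector(query.peaks, self.bin_width)
        best = None
        for i in range(lo, hi):
            lib = self.vectors[i]
            # Cosine similarity of two unit vectors is their dot product.
            score = sum(w * lib.get(b, 0.0) for b, w in q.items())
            if best is None or score > best[1]:
                best = (self.entries[i].label, score)
        return best


# Hypothetical library of predicted spectra and one experimental query.
library = [
    Spectrum(500.25, [(200.1, 10.0), (300.2, 50.0), (400.3, 20.0)], "PEPTIDEA"),
    Spectrum(500.26, [(210.1, 40.0), (310.2, 5.0)], "PEPTIDEB"),
    Spectrum(800.40, [(200.1, 10.0), (300.2, 50.0)], "FARAWAY"),
]
index = SpectralLibraryIndex(library)
hit = index.search(Spectrum(500.255, [(200.1, 9.0), (300.2, 55.0), (400.3, 18.0)]))
print(hit)
```

Sorting by precursor mass keeps memory at one entry per spectrum and makes candidate lookup logarithmic in library size; storing only non-empty bins keeps the vectors sparse.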
3.
Bioprocess Biosyst Eng ; 45(5): 791-813, 2022 May.
Article in English | MEDLINE | ID: mdl-35303143

ABSTRACT

Phototrophic microorganisms that convert carbon dioxide are being explored for their capacity to solve different environmental issues and produce bioactive compounds for human therapeutics and as food additives. Full-scale phototrophic cultivation of microalgae and cyanobacteria can be done in open ponds or closed photobioreactor systems, which have a broad range of volumes. This review focuses on laboratory-scale photobioreactors and their different designs. Illuminated microtiter plates and microfluidic devices offer an option for automated high-throughput studies with microalgae. Illuminated shake flasks are used for simple uncontrolled batch studies. The application of illuminated bubble column reactors strongly emphasizes homogeneous gas distribution, while illuminated flat plate bioreactors offer high and uniform light input. Illuminated stirred-tank bioreactors facilitate the application of very well-defined reaction conditions. Closed tubular photobioreactors as well as open photobioreactors like small-scale raceway ponds and thin-layer cascades are applied as scale-down models of the respective large-scale bioreactors. A few other less common designs such as illuminated plastic bags or aquarium tanks are also used, mainly because of their relatively low cost, but up-scaling of these designs is challenging with additional light-driven issues. Finally, this review covers recommendations on the criteria for photobioreactor selection and operation when up-scaling phototrophic bioprocesses with microalgae or cyanobacteria.


Subject(s)
Cyanobacteria , Microalgae , Biomass , Carbon Dioxide , Humans , Photobioreactors/microbiology
4.
NAR Genom Bioinform ; 3(4): lqab095, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34729474

ABSTRACT

Recent efforts to measure epigenetic marks across a wide variety of different cell types and tissues provide insights into the cell type-specific regulatory landscape. We use these data to study whether there exists a correlate of epigenetic signals in the DNA sequence of enhancers and use computational methods to explore to what degree such sequence patterns can be used to predict cell type-specific regulatory activity. By constructing classifiers that predict in which tissues enhancers are active, we are able to identify sequence features that might be recognized by the cell in order to regulate gene expression. While classification performance varies greatly between tissues, we show examples where our classifiers correctly predict tissue-specific regulation from sequence alone. We also show that many of the informative patterns indeed harbor transcription factor footprints.
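The abstract does not state which sequence features or classifiers were used. As a hedged illustration of predicting activity "from sequence alone", the sketch below assumes a common baseline: k-mer frequency features fed to a logistic regression trained by full-batch gradient descent. The toy sequences, labels, and function names are hypothetical.

```python
import math
from itertools import product


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k DNA k-mers (k-mers containing N etc. are skipped)."""
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(alphabet)}
    counts = [0.0] * len(alphabet)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Logistic model: probability that the sequence is active."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))


def train_logistic(X: list[list[float]], y: list[int], lr: float = 1.0, epochs: int = 500):
    """Full-batch gradient ascent on the logistic log-likelihood."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = yi - predict(w, b, xi)
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b


pos = ["GCGCGCGCGC", "GGCCGGCCGG"]  # hypothetical "active" enhancers
neg = ["ATATATATAT", "AATTAATTAA"]  # hypothetical "inactive" enhancers
X = [kmer_frequencies(s, 2) for s in pos + neg]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
print([round(predict(w, b, x), 3) for x in X])
```

The learned weight of each k-mer indicates how strongly it pushes a sequence toward "active", which is one simple way to read off candidate sequence patterns.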

5.
J Comput Biol ; 28(6): 560-569, 2021 06.
Article in English | MEDLINE | ID: mdl-33739865

ABSTRACT

High-dimensional statistics deals with statistical inference when the number of parameters or features p exceeds the number of observations n (i.e., p ≫ n). In this case, the parameter space must be constrained either by regularization or by selecting a small subset of m ≤ n features. Feature selection through ℓ1-regularization combines the benefits of both approaches and has proven to yield good results in practice. However, the functional relation between the regularization strength λ and the number of selected features m is difficult to determine. Hence, parameters are typically estimated for all possible regularization strengths λ. These so-called regularization paths can be expensive to compute, and most solutions may not even be of interest to the problem at hand. As an alternative, an algorithm is proposed that determines the ℓ1-regularization strength λ iteratively for a fixed m. The algorithm can be used to compute leapfrog regularization paths by successively increasing m.


Subject(s)
Computational Biology/methods , Logistic Models , Software
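The idea of fixing m and searching for λ can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the squared-error lasso objective (1/2n)||y − Xβ||² + λ||β||₁ solved by coordinate descent (the paper's setting may be logistic regression instead), and it finds λ by plain bisection on the number of nonzero coefficients, a simple stand-in rather than the paper's iterative update. All function names are hypothetical.

```python
def soft_threshold(x: float, lam: float) -> float:
    """Proximal operator of lam * |x|."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Coordinate descent for (1/2n)||y - X beta||^2 + lam * ||beta||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def lambda_for_m(X, y, m, max_iter=60):
    """Bisection on lam until exactly m coefficients are nonzero (if possible)."""
    n, p = len(X), len(X[0])
    # Smallest lam at which every coefficient is zero: max_j |x_j . y| / n.
    hi = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))
    lo, beta_hi = 0.0, [0.0] * p
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        beta = lasso_cd(X, y, mid)
        k = sum(1 for b in beta if b != 0.0)
        if k == m:
            return mid, beta
        if k > m:
            lo = mid  # too many features selected: regularize more
        else:
            hi, beta_hi = mid, beta  # too few features selected: regularize less
    # No exact hit (the count need not be monotone in lam):
    # fall back to the closest solution with at most m features.
    return hi, beta_hi


# Toy data with orthogonal columns; y = 3*x0 + 2*x1 + 1*x2.
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [6, 2, 4, 0]
for m in (1, 2):
    lam, beta = lambda_for_m(X, y, m)
    print(m, lam, beta)
```

A leapfrog path in this spirit would call `lambda_for_m` for m = 1, 2, 3, ... and keep only those solutions, instead of solving the problem on a dense grid of λ values.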
6.
J Comput Biol ; 27(4): 442-457, 2020 04.
Article in English | MEDLINE | ID: mdl-31891534

ABSTRACT

Genome segmentation methods are powerful tools to obtain cell type or tissue-specific genome-wide annotations and are frequently used to discover regulatory elements. However, traditional segmentation methods show low predictive accuracy and their data-driven annotations have some undesirable properties. As an alternative, we developed ModHMM, a highly modular genome segmentation method. Inspired by the supra-Bayesian approach, it incorporates predictions from a set of classifiers. This makes it possible to compute genome segmentations using state-of-the-art methodology. We demonstrate the method on ENCODE data and show that it outperforms traditional segmentation methods not only in terms of predictive performance, but also in qualitative aspects. Therefore, ModHMM is a valuable alternative to study the epigenetic and regulatory landscape across and within cell types or tissues.


Subject(s)
Computational Biology/methods , Genome/genetics , Molecular Sequence Annotation/methods , Software , Algorithms , Humans , Regulatory Sequences, Nucleic Acid/genetics
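One way to picture a segmentation that "incorporates predictions from a set of classifiers" is an HMM whose per-position evidence for each state is a classifier output, decoded with the Viterbi algorithm. The sketch below assumes exactly that simplification; it does not reproduce ModHMM's supra-Bayesian combination, and the two states, scores, and transition probabilities are hypothetical.

```python
import math


def safe_log(x: float) -> float:
    """Log with a floor so zero probabilities do not raise."""
    return math.log(max(x, 1e-300))


def viterbi(emission_scores, trans, init):
    """Most probable state path given per-position state scores.

    emission_scores[t][s] stands in for the evidence for state s at genomic
    bin t; here it is assumed to come from a per-state classifier.
    """
    n_states = len(init)
    logv = [safe_log(init[s]) + safe_log(emission_scores[0][s]) for s in range(n_states)]
    backpointers = []
    for t in range(1, len(emission_scores)):
        new_logv, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for being in state s at bin t.
            prev = max(range(n_states), key=lambda r: logv[r] + safe_log(trans[r][s]))
            new_logv.append(logv[prev] + safe_log(trans[prev][s]) + safe_log(emission_scores[t][s]))
            ptr.append(prev)
        logv = new_logv
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: logv[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state example: 0 = background, 1 = enhancer.
scores = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]
trans = [[0.8, 0.2], [0.2, 0.8]]
print(viterbi(scores, trans, init=[0.5, 0.5]))
```

Because the classifiers only supply the scores, any of them can be swapped out without touching the decoding step, which is the kind of modularity the abstract emphasizes.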
7.
Genome Biol ; 20(1): 227, 2019 11 08.
Article in English | MEDLINE | ID: mdl-31699133

ABSTRACT

We present the software Condition-specific Regulatory Units Prediction (CRUP) to infer from epigenetic marks a list of regulatory units consisting of dynamically changing enhancers with their target genes. The workflow consists of a novel pre-trained enhancer predictor that can be reliably applied across cell types and species, solely based on histone modification ChIP-seq data. Enhancers are subsequently assigned to different conditions and correlated with gene expression to derive regulatory units. We thoroughly test and then apply CRUP to a rheumatoid arthritis model, identifying enhancer-gene pairs comprising known disease genes as well as new candidate genes.


Subject(s)
Enhancer Elements, Genetic , Software , Animals , Arthritis, Experimental/genetics , Arthritis, Rheumatoid/genetics , Chromatin Immunoprecipitation Sequencing , Histone Code , Mice
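The step in which enhancers "are correlated with gene expression to derive regulatory units" can be sketched as a Pearson correlation between an enhancer's signal and a gene's expression across conditions, keeping pairs above a threshold. This is a hedged sketch: how candidate enhancer-gene pairs are chosen (for example by genomic proximity), the threshold of 0.8, and all names and values are assumptions, not CRUP's actual procedure.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treat as uninformative
    return sxy / math.sqrt(sxx * syy)


def regulatory_units(enhancer_signal, gene_expression, candidate_pairs, min_r=0.8):
    """Keep (enhancer, gene, r) for candidate pairs whose profiles correlate strongly."""
    units = []
    for enh, gene in candidate_pairs:
        r = pearson(enhancer_signal[enh], gene_expression[gene])
        if r >= min_r:
            units.append((enh, gene, r))
    return units


# Hypothetical enhancer activity and gene expression across three conditions.
enhancer_signal = {"enh1": [0.1, 0.5, 0.9], "enh2": [0.9, 0.5, 0.1]}
gene_expression = {"geneA": [2.0, 6.0, 10.0]}
pairs = [("enh1", "geneA"), ("enh2", "geneA")]
print(regulatory_units(enhancer_signal, gene_expression, pairs))
```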
8.
BMC Bioinformatics ; 20(1): 157, 2019 Mar 27.
Article in English | MEDLINE | ID: mdl-30917778

ABSTRACT

BACKGROUND: Eukaryotic gene regulation is a complex process comprising the dynamic interaction of enhancers and promoters in order to activate gene expression. In recent years, research in regulatory genomics has contributed to a better understanding of the characteristics of promoter elements and for most sequenced model organism genomes there exist comprehensive and reliable promoter annotations. For enhancers, however, a reliable description of their characteristics and location has so far proven to be elusive. With the development of high-throughput methods such as ChIP-seq, large amounts of data about epigenetic conditions have become available, and many existing methods use the information on chromatin accessibility or histone modifications to train classifiers in order to segment the genome into functional groups such as enhancers and promoters. However, these methods often do not consider prior biological knowledge about enhancers such as their diverse lengths or molecular structure. RESULTS: We developed enhancer HMM (eHMM), a supervised hidden Markov model designed to learn the molecular structure of promoters and enhancers. Both consist of a central stretch of accessible DNA flanked by nucleosomes with distinct histone modification patterns. We evaluated the performance of eHMM within and across cell types and developmental stages and found that eHMM successfully predicts enhancers with high precision and recall comparable to state-of-the-art methods, and consistently outperforms those in terms of accuracy and resolution. CONCLUSIONS: eHMM predicts active enhancers based on data from chromatin accessibility assays and a minimal set of histone modification ChIP-seq experiments. In comparison to other 'black box' methods, its parameters are easy to interpret. eHMM can be used as a stand-alone tool for enhancer prediction without the need for additional training or parameter tuning. The high spatial precision of enhancer predictions provides valuable targets for potential knockout experiments or downstream analyses such as motif search.


Subject(s)
Enhancer Elements, Genetic , Genome , Genomics/methods , Mammals/genetics , Animals , Base Sequence , DNA Methylation/genetics , Embryonic Stem Cells/metabolism , Histones/metabolism , Humans , Markov Chains , Mice , Models, Genetic , Promoter Regions, Genetic , Reproducibility of Results
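"Supervised" here means that the HMM is trained on sequences whose hidden states are known. A minimal sketch of that idea, assuming labelled state sequences and a single discrete observation per position: the maximum-likelihood parameters are normalized counts (optionally smoothed by pseudocounts). eHMM's actual state architecture and multivariate emissions are not reproduced; the states "N" (nucleosome) and "A" (accessible DNA) and the symbols "h"/"l" (mark high/low) are hypothetical.

```python
def estimate_supervised_hmm(state_seqs, obs_seqs, states, symbols, pseudocount=1.0):
    """Maximum-likelihood HMM parameters from labelled sequences (plus pseudocounts)."""
    init = {s: pseudocount for s in states}
    trans = {s: {t: pseudocount for t in states} for s in states}
    emit = {s: {o: pseudocount for o in symbols} for s in states}
    for ss, obs in zip(state_seqs, obs_seqs):
        init[ss[0]] += 1
        for a, b in zip(ss, ss[1:]):  # count observed state-to-state transitions
            trans[a][b] += 1
        for s, o in zip(ss, obs):  # count which symbol each state emitted
            emit[s][o] += 1

    def normalize(counts):
        total = sum(counts.values())
        if total == 0:  # state never seen: fall back to a uniform row
            return {k: 1.0 / len(counts) for k in counts}
        return {k: v / total for k, v in counts.items()}

    return (
        normalize(init),
        {s: normalize(row) for s, row in trans.items()},
        {s: normalize(row) for s, row in emit.items()},
    )


# Two hypothetical labelled regions: nucleosome, accessible centre, nucleosome.
states_labelled = ["NNAAN", "NAAAN"]
observations = ["hhllh", "hlllh"]
init, trans, emit = estimate_supervised_hmm(
    states_labelled, observations, states="NA", symbols="hl", pseudocount=0.0
)
print(trans)
print(emit)
```

Because every parameter is a frequency of a labelled event, the fitted model stays easy to inspect, in line with the interpretability the abstract highlights.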
9.
PLoS Genet ; 14(11): e1007793, 2018 11.
Article in English | MEDLINE | ID: mdl-30427832

ABSTRACT

The binding of transcription factors to short recognition sequences plays a pivotal role in controlling the expression of genes. The sequence and shape characteristics of binding sites influence DNA binding specificity and have also been implicated in modulating the activity of transcription factors downstream of binding. To quantitatively assess the transcriptional activity of tens of thousands of designed synthetic sites in parallel, we developed a synthetic version of STARR-seq (synSTARR-seq). We used the approach to systematically analyze how variations in the recognition sequence of the glucocorticoid receptor (GR) affect transcriptional regulation. Our approach resulted in the identification of a novel highly active functional GR binding sequence and revealed that sequence variation both within and flanking GR's core binding site can modulate GR activity without apparent changes in DNA binding affinity. Notably, we found that the sequence composition of variants with similar activity profiles was highly diverse. In contrast, groups of variants with similar activity profiles showed specific DNA shape characteristics, indicating that DNA shape may be a better predictor of activity than DNA sequence. Finally, using single cell experiments with individual enhancer variants, we obtained clues indicating that the architecture of the response element can independently tune expression mean and cell-to-cell variability in gene expression (noise). Together, our studies establish synSTARR as a powerful method to systematically study how DNA sequence and shape modulate transcriptional output and noise.


Subject(s)
DNA/genetics , Sequence Analysis, DNA/methods , Transcription, Genetic , Binding Sites/genetics , DNA/chemistry , DNA/metabolism , Enhancer Elements, Genetic , Gene Expression Regulation , Genes, Reporter , Genes, Synthetic , Genetic Variation , Humans , Nucleic Acid Conformation , Protein Conformation , Receptors, Glucocorticoid/chemistry , Receptors, Glucocorticoid/genetics , Receptors, Glucocorticoid/metabolism , Response Elements , Sequence Analysis, DNA/statistics & numerical data , Transcription Factors/genetics , Transcription Factors/metabolism
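The two quantities discussed above, transcriptional activity per variant and cell-to-cell noise, can be illustrated with common definitions. This sketch assumes activity is the log2 ratio of library-size-normalized RNA to DNA read counts per variant (a typical STARR-seq readout) and noise is the squared coefficient of variation of single-cell expression; the study's exact normalization and noise metric are not stated in the abstract, and the variant names and counts are hypothetical.

```python
import math
from statistics import mean, pvariance


def activity_log2(rna_counts: dict[str, int], dna_counts: dict[str, int],
                  pseudocount: float = 1.0) -> dict[str, float]:
    """log2 of the library-size-normalized RNA/DNA read ratio per variant."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for variant, dna in dna_counts.items():
        rna_frac = (rna_counts.get(variant, 0) + pseudocount) / rna_total
        dna_frac = (dna + pseudocount) / dna_total
        activity[variant] = math.log2(rna_frac / dna_frac)
    return activity


def noise_cv2(single_cell_values: list[float]) -> float:
    """Squared coefficient of variation (variance / mean**2) across cells."""
    m = mean(single_cell_values)
    return pvariance(single_cell_values, mu=m) / (m * m)


# Hypothetical barcode counts for two designed binding-site variants.
rna = {"variant_a": 300, "variant_b": 100}
dna = {"variant_a": 100, "variant_b": 100}
print(activity_log2(rna, dna, pseudocount=0.0))
print(noise_cv2([1.0, 3.0]), noise_cv2([2.0, 2.0, 2.0]))
```

Separating the mean readout from CV² is what allows two variants with the same average activity to be distinguished by their noise.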
10.
Bioinformatics ; 30(17): i534-40, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-25161244

ABSTRACT

MOTIVATION: The construction of statistics for summarizing posterior samples returned by a Bayesian phylogenetic study has so far been hindered by the poor geometric insights available into the space of phylogenetic trees, and ad hoc methods such as the derivation of a consensus tree make up for the ill-definition of the usual concepts of posterior mean, while bootstrap methods mitigate the absence of a sound concept of variance. Yielding satisfactory results with sufficiently concentrated posterior distributions, such methods fall short of providing a faithful summary of posterior distributions if the data do not offer compelling evidence for a single topology. RESULTS: Building upon previous work of Billera et al., summary statistics such as sample mean, median and variance are defined as the Fréchet mean, geometric median and variance, respectively. Their computation builds on recently published work and embeds an algorithm for computing shortest paths in the space of trees. In a study of the phylogeny of a set of plants, where several tree topologies occur in the posterior sample, the posterior mean correctly balances the contributions from the different topologies, whereas a consensus tree would be biased. Comparisons of the posterior mean, median and consensus trees with the ground truth using simulated data also reveal the benefits of a sound averaging method when reconstructing phylogenetic trees. AVAILABILITY AND IMPLEMENTATION: We provide two independent implementations of the algorithm for computing Fréchet means, geometric medians and variances in the space of phylogenetic trees. TFBayes: https://github.com/pbenner/tfbayes, TrAP: https://github.com/bacak/TrAP.


Subject(s)
Models, Statistical , Phylogeny , Algorithms , Bayes Theorem
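The summary statistics above only need a distance and shortest paths (geodesics). The sketch below shows how they can be computed from those primitives alone, using Euclidean points as a stand-in for trees; it does not implement geodesics in tree space, and it is not the API of TFBayes or TrAP. It assumes a Sturm-style inductive mean (step a fraction 1/k along the geodesic toward the k-th sample) for the Fréchet mean and the Weiszfeld iteration for the geometric median. In Euclidean space one pass of the inductive mean gives the arithmetic mean exactly, whereas in curved spaces such recursions are typically run over many random draws and converge only asymptotically. All names are hypothetical.

```python
import math

Point = tuple  # a point in R^d, standing in for a tree in tree space


def geodesic(x: Point, y: Point, t: float) -> Point:
    """Point a fraction t of the way along the shortest path from x to y."""
    return tuple(a + t * (b - a) for a, b in zip(x, y))


def inductive_mean(points: list) -> Point:
    """Sturm-style recursion x_k = geodesic(x_{k-1}, p_k, 1/k)."""
    x = points[0]
    for k, p in enumerate(points[1:], start=2):
        x = geodesic(x, p, 1.0 / k)
    return x


def geometric_median(points: list, n_iter: int = 500, eps: float = 1e-12) -> Point:
    """Weiszfeld iteration: repeatedly average the points with weights 1/distance."""
    dim = len(points[0])
    x = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    for _ in range(n_iter):
        weights = []
        for p in points:
            d = math.dist(x, p)
            if d < eps:
                return p  # simplification: the update is undefined at a data point
            weights.append(1.0 / d)
        total = sum(weights)
        x = tuple(sum(w * p[i] for w, p in zip(weights, points)) / total for i in range(dim))
    return x


def frechet_variance(points: list, center: Point) -> float:
    """Mean squared distance from the points to a center."""
    return sum(math.dist(center, p) ** 2 for p in points) / len(points)


square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = inductive_mean(square)
print(center, geometric_median(square), frechet_variance(square, center))

# With one outlying sample the median stays put while the mean moves toward it.
skewed = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (10.0, 0.0)]
print(inductive_mean(skewed), geometric_median(skewed))
```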
11.
J Math Psychol ; 56(3): 179-195, 2012 Jun 01.
Article in English | MEDLINE | ID: mdl-22822269

ABSTRACT

We present a predictive account of adaptive sequential sampling of stimulus-response relations in psychophysical experiments. Our discussion applies to experimental situations with ordinal stimuli when only weak structural knowledge is available, such that parametric modeling is not an option. By introducing a certain form of partial exchangeability, we successively develop a hierarchical Bayesian model based on a mixture of Pólya urn processes. Suitable utility measures permit us to optimize the overall experimental sampling process. We provide several measures that are either based on simple count statistics or more elaborate information-theoretic quantities. The actual computation of information-theoretic utilities often turns out to be infeasible. This is not the case with our sampling method, which relies on an efficient algorithm to compute exact solutions of our posterior predictions and utility measures. Finally, we demonstrate the advantages of our framework on a hypothetical sampling problem.
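A simplified picture of utility-driven sequential sampling: treat each stimulus as its own two-colour Pólya urn (equivalently a Beta-Bernoulli model with prior weight α per colour), compute a count-based utility for one more trial at each stimulus, and sample where the utility is largest. This sketch assumes independent stimuli and uses the expected drop in posterior variance as the utility; it ignores the hierarchical mixture that couples ordinal stimuli in the paper and is not one of the paper's information-theoretic measures. All names and counts are hypothetical.

```python
def beta_variance(a: float, b: float) -> float:
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def predictive_success(successes: int, failures: int, alpha: float = 1.0) -> float:
    """Two-colour Pólya urn: probability that the next draw is a 'success'."""
    return (successes + alpha) / (successes + failures + 2 * alpha)


def expected_variance_reduction(successes: int, failures: int, alpha: float = 1.0) -> float:
    """Count-based utility: expected drop in posterior variance after one more trial."""
    a, b = successes + alpha, failures + alpha
    p = predictive_success(successes, failures, alpha)
    expected_after = p * beta_variance(a + 1, b) + (1 - p) * beta_variance(a, b + 1)
    return beta_variance(a, b) - expected_after


def next_stimulus(counts: list[tuple[int, int]], alpha: float = 1.0) -> int:
    """Index of the stimulus whose next trial is expected to be most informative."""
    return max(range(len(counts)), key=lambda i: expected_variance_reduction(*counts[i], alpha))


# Hypothetical (successes, failures) per stimulus level after some trials.
counts = [(10, 10), (0, 0), (5, 5)]
print(next_stimulus(counts))
```

Unsampled stimuli have the widest posterior, so this utility sends the next trial there; richer utilities would additionally account for how responses at one stimulus inform its neighbours.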

12.
Phys Rev Lett ; 105(7): 074102, 2010 Aug 13.
Article in English | MEDLINE | ID: mdl-20868048

ABSTRACT

In two-dimensional parameter spaces, nonlinear systems producing solutions of a fixed periodicity form islands of a characteristic shape, called "shrimp"-shaped domains (SSDs). In simulations of electronic circuits, SSDs of different periodicities were recently found to be connected along spirals. By means of a hardware realization of the simulations, we provide a first direct proof of the real-world existence of this phenomenon. An improved description establishes a close experiment-simulation correspondence, and a simplified circuit family demonstrates the homoclinic saddle-focus origin of the phenomenon.
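Diagrams of shrimp-shaped domains are periodicity maps: for every point of a two-dimensional parameter grid, the long-term orbit is classified by its period. The sketch below shows that classification for a discrete-time map; the abstract concerns electronic circuits (and their simulations), so the choice of the Hénon map as a two-parameter example, the grid, the transient length and the tolerance are all illustrative assumptions.

```python
import math


def detect_period(step, state, transient=2000, max_period=64, tol=1e-7):
    """Period of the attractor reached from `state`.

    Returns p >= 1 for a p-cycle, 0 if no period <= max_period is found
    (chaos or a long cycle), and -1 if the orbit diverges.
    """
    for _ in range(transient):  # discard the transient
        state = step(state)
        if not all(math.isfinite(v) for v in state):
            return -1
    ref = state
    for p in range(1, max_period + 1):
        state = step(state)
        if not all(math.isfinite(v) for v in state):
            return -1
        if max(abs(u - v) for u, v in zip(state, ref)) < tol:
            return p
    return 0


def henon(a: float, b: float):
    """Hénon map, a classic two-parameter system used here only as an illustration."""
    def step(s):
        x, y = s
        return (1.0 - a * x * x + y, b * x)
    return step


def scan_parameter_plane(make_step, a_values, b_values, state0):
    """Period for each (a, b) on a grid; rows follow b, columns follow a."""
    return [[detect_period(make_step(a, b), state0) for a in a_values] for b in b_values]


# Coarse Hénon scan; plotting the integers as colours gives a periodicity diagram.
grid = scan_parameter_plane(henon, [0.2 + 0.1 * i for i in range(12)],
                            [0.1 * j for j in range(4)], (0.0, 0.0))
for row in grid:
    print(row)
```

Refining the grid around regions where the period changes is how the characteristic shrimp shapes become visible.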
