Results 1 - 16 of 16
1.
Bioinformatics ; 38(21): 4927-4933, 2022 10 31.
Article in English | MEDLINE | ID: mdl-36094347

ABSTRACT

MOTIVATION: A common experimental output in biomedical science is a list of genes implicated in a given biological process or disease. The gene lists resulting from a group of studies answering the same, or similar, questions can be combined by ranking aggregation methods to find a consensus or a more reliable answer. Evaluating a ranking aggregation method on a specific type of data before using it is necessary to establish its reliability, since the properties of a dataset can influence the performance of an algorithm. Such evaluation on gene lists is usually based on a simulated database because of the lack of a known truth for real data. However, simulated datasets tend to be too small compared to experimental data and neglect key features, including heterogeneity of quality, relevance and the inclusion of unranked lists.

RESULTS: In this study, a group of existing methods and their variations that are suitable for meta-analysis of gene lists are compared using simulated and real data. Simulated data were used to explore the performance of the aggregation methods while emulating common scenarios of real genomic data, with varying heterogeneity of quality, noise level and a mix of unranked and ranked data, using 20 000 possible entities. In addition to the evaluation with simulated data, a comparison using real genomic data on the SARS-CoV-2 virus, cancer (non-small cell lung cancer) and bacteria (macrophage apoptosis) was performed. We summarize the results of our evaluation in a simple flowchart for selecting a ranking aggregation method, and in an automated implementation that uses the meta-analysis by information content (MAIC) algorithm to infer heterogeneity of data quality across input datasets.

AVAILABILITY AND IMPLEMENTATION: The code for simulated data generation and for running the edited versions of the algorithms is available at https://github.com/baillielab/comparison_of_RA_methods. Code to perform an optimal selection of methods based on the results of this review, using the MAIC algorithm to infer the characteristics of an input dataset, can be downloaded from https://github.com/baillielab/maic. An online service for running MAIC is available at https://baillielab.net/maic.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
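
To make the rank aggregation setting concrete, the sketch below implements a simple Borda-count aggregator in Python. This is only a minimal illustration of the general idea, not the MAIC algorithm or any of the methods benchmarked in the paper, and the gene lists are invented for the example.

```python
from collections import defaultdict

def borda_aggregate(ranked_lists):
    """Aggregate ranked gene lists with a simple Borda count.

    Each list is ordered from most to least significant; genes absent
    from a list contribute no score for that list.
    """
    scores = defaultdict(float)
    for genes in ranked_lists:
        n = len(genes)
        for position, gene in enumerate(genes):
            scores[gene] += n - position  # top-ranked gene gets the largest score
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative input: three studies reporting partially overlapping hits.
lists = [["ACE2", "TMPRSS2", "IFITM3"],
         ["IFITM3", "ACE2"],
         ["TMPRSS2", "IFITM3", "DDX58"]]
print(borda_aggregate(lists))
```

Each study contributes a score proportional to a gene's position in its list, so genes supported by several studies rise to the top of the consensus ranking.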


Subjects
COVID-19, Non-Small Cell Lung Carcinoma, Lung Neoplasms, Humans, Algorithms, Non-Small Cell Lung Carcinoma/genetics, COVID-19/genetics, Lung Neoplasms/genetics, Reproducibility of Results, SARS-CoV-2, Meta-Analysis as Topic
2.
Ann Surg ; 275(2): e453-e462, 2022 02 01.
Article in English | MEDLINE | ID: mdl-32487804

ABSTRACT

OBJECTIVE: Acute pancreatitis (AP) is sudden-onset pancreatic inflammation that causes systemic injury with a wide and markedly heterogeneous range of clinical consequences. Here, we hypothesized that this observed clinical diversity corresponds to diversity in molecular subtypes that can be identified in clinical and multiomics data.

SUMMARY BACKGROUND DATA: Observational cohort study; n = 57 for the discovery cohort (clinical, transcriptomics, proteomics, and metabolomics data) and n = 312 for the validation cohort (clinical and metabolomics data).

METHODS: We integrated coincident transcriptomics, proteomics, and metabolomics data at serial time points between admission to hospital and up to 48 hours after recruitment from a cohort of patients presenting with acute pancreatitis. We systematically evaluated 4 different metrics for patient similarity using unbiased mathematical, biological, and clinical measures of internal and external validity. We next compared the AP molecular endotypes with previous descriptions of endotypes in a critically ill population with acute respiratory distress syndrome (ARDS).

RESULTS: Our results identify 4 distinct and stable AP molecular endotypes. We validated our findings in a second independent cohort of patients with AP. We observed that 2 endotypes in AP recapitulate disease endotypes previously reported in ARDS.

CONCLUSIONS: Our results show that molecular endotypes exist in AP and reflect biological patterns that are also present in ARDS, suggesting that generalizable patterns exist in diverse presentations of critical illness.


Subjects
Pancreatitis/classification, Pancreatitis/diagnosis, Cohort Studies, Humans, Metabolomics, Proteomics
3.
Syst Biol ; 66(1): e66-e82, 2017 01 01.
Article in English | MEDLINE | ID: mdl-28175922

ABSTRACT

Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.]
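
As a concrete illustration of the basic ABC idea reviewed here, a minimal rejection sampler is sketched below; the prior, simulator, distance and tolerance are toy placeholders, not anything from the paper.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, distance, eps, n_draws=10000):
    """Basic rejection ABC: keep parameters whose simulated data fall
    within tolerance eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        x = simulate(theta)
        if distance(x, observed) <= eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy example: infer the mean of a Gaussian with known variance.
rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=50)
posterior = abc_rejection(
    observed,
    simulate=lambda th: rng.normal(th, 1.0, size=50),
    prior_sample=lambda: rng.uniform(-5, 5),
    distance=lambda x, y: abs(x.mean() - y.mean()),
    eps=0.1,
)
print(posterior.mean(), posterior.std())
```

Parameters whose simulated data land within the tolerance of the observed data are kept, and the accepted draws approximate the posterior; smaller tolerances improve the approximation at the cost of more rejected simulations.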


Subjects
Classification, Biological Models, Phylogeny, Algorithms, Bayes Theorem
4.
Neural Comput ; 29(11): 2887-2924, 2017 11.
Article in English | MEDLINE | ID: mdl-28777730

ABSTRACT

The statistical dependencies that independent component analysis (ICA) cannot remove often provide rich information beyond the linear independent components. It would thus be very useful to estimate the dependency structure from data. While such models have been proposed, they have usually concentrated on higher-order correlations such as energy (square) correlations. Yet linear correlations are a fundamental and informative form of dependency in many real data sets. Linear correlations are usually completely removed by ICA and related methods so they can only be analyzed by developing new methods that explicitly allow for linearly correlated components. In this article, we propose a probabilistic model of linear nongaussian components that are allowed to have both linear and energy correlations. The precision matrix of the linear components is assumed to be randomly generated by a higher-order process and explicitly parameterized by a parameter matrix. The estimation of the parameter matrix is shown to be particularly simple because using score matching (Hyvärinen, 2005), the objective function is a quadratic form. Using simulations with artificial data, we demonstrate that the proposed method improves the identifiability of nongaussian components by simultaneously learning their correlation structure. Applications on simulated complex cells with natural image input, as well as spectrograms of natural audio data, show that the method finds new kinds of dependencies between the components.
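
For reference, the score-matching objective of Hyvärinen (2005) that the estimation relies on is, for a model density \(p(\mathbf{x};\boldsymbol{\theta})\) known only up to a normalizing constant,

```latex
J(\boldsymbol{\theta}) \;=\; \mathbb{E}_{\mathbf{x}}\!\left[\tfrac{1}{2}\,\bigl\lVert \nabla_{\mathbf{x}} \log p(\mathbf{x};\boldsymbol{\theta}) \bigr\rVert^{2} \;+\; \Delta_{\mathbf{x}} \log p(\mathbf{x};\boldsymbol{\theta})\right],
```

where the expectation is over the data distribution and \(\Delta_{\mathbf{x}}\) denotes the Laplacian with respect to \(\mathbf{x}\). The claim that this objective reduces to a quadratic form in the parameter matrix is specific to the model proposed in the article and is not derived here.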

5.
Neural Comput ; 26(6): 1169-97, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24684449

ABSTRACT

We propose a new method for detecting changes in Markov network structure between two sets of samples. Instead of naively fitting two Markov network models separately to the two data sets and figuring out their difference, we directly learn the network structure change by estimating the ratio of Markov network models. This density-ratio formulation naturally allows us to introduce sparsity in the network structure change, which highly contributes to enhancing interpretability. Furthermore, computation of the normalization term, a critical bottleneck of the naive approach, can be remarkably mitigated. We also give the dual formulation of the optimization problem, which further reduces the computation cost for large-scale Markov networks. Through experiments, we demonstrate the usefulness of our method.
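
A sketch of the density-ratio idea, assuming pairwise Markov networks \(p(\mathbf{x}) \propto \exp\bigl(\sum_{i \le j} \theta^{P}_{ij} x_i x_j\bigr)\) and \(q(\mathbf{x}) \propto \exp\bigl(\sum_{i \le j} \theta^{Q}_{ij} x_i x_j\bigr)\); the paper's feature functions and optimization details may differ:

```latex
r(\mathbf{x}) \;=\; \frac{p(\mathbf{x})}{q(\mathbf{x})} \;\propto\; \exp\!\Bigl(\sum_{i \le j} \bigl(\theta^{P}_{ij} - \theta^{Q}_{ij}\bigr)\, x_i x_j\Bigr).
```

Because the ratio depends only on the parameter differences, estimating it directly with a sparsity penalty on \(\theta^{P}_{ij} - \theta^{Q}_{ij}\) targets exactly the edges whose strength has changed, without having to normalize either network on its own.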


Subjects
Algorithms, Computer Simulation, Learning/physiology, Markov Chains, Neural Networks (Computer), Humans
6.
Elife ; 13, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38261382

ABSTRACT

Computational models are powerful tools for understanding human cognition and behavior. They let us express our theories clearly and precisely and offer predictions that can be subtle and often counter-intuitive. However, this same richness and ability to surprise means our scientific intuitions and traditional tools are ill-suited to designing experiments to test and compare these models. To avoid these pitfalls and realize the full potential of computational modeling, we require tools to design experiments that provide clear answers about which models explain human behavior and the auxiliary assumptions those models must make. Bayesian optimal experimental design (BOED) formalizes the search for optimal experimental designs by identifying experiments that are expected to yield informative data. In this work, we provide a tutorial on leveraging recent advances in BOED and machine learning to find optimal experiments for any kind of model that we can simulate data from, and show how by-products of this procedure allow for quick and straightforward evaluation of models and their parameters against real experimental data. As a case study, we consider theories of how people balance exploration and exploitation in multi-armed bandit decision-making tasks. We validate the presented approach using simulations and a real-world experiment. As compared to experimental designs commonly used in the literature, we show that our optimal designs more efficiently determine which of a set of models best accounts for individual human behavior, and more efficiently characterize behavior given a preferred model. At the same time, formalizing a scientific question such that it can be adequately addressed with BOED can be challenging, and we discuss several potential caveats and pitfalls that practitioners should be aware of. We provide code to replicate all analyses as well as tutorial notebooks and pointers to adapt the methodology to different experimental settings.
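
The quantity at the heart of BOED is the expected information gain (EIG) of a candidate design. The sketch below estimates it by nested Monte Carlo for a toy binomial design problem (how many trials to run, under a uniform prior); it is not the bandit task or the machine-learning estimators discussed in the paper, and scipy is assumed to be available.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)

def log_lik(y, theta, n_trials):
    """Toy likelihood: y successes out of n_trials Bernoulli trials with rate theta."""
    return binom.logpmf(y, n_trials, theta)

def expected_information_gain(n_trials, n_outer=300, n_inner=300):
    """Nested Monte Carlo estimate of EIG(d) = E_{theta, y|d}[log p(y|theta,d) - log p(y|d)],
    here for the design 'how many trials to run' under a uniform Beta(1, 1) prior."""
    total = 0.0
    for _ in range(n_outer):
        theta = rng.beta(1, 1)                 # draw a parameter from the prior
        y = rng.binomial(n_trials, theta)      # simulate an outcome for this design
        inner = rng.beta(1, 1, size=n_inner)   # fresh prior draws for the marginal likelihood
        log_marginal = np.log(np.mean(np.exp(log_lik(y, inner, n_trials))))
        total += log_lik(y, theta, n_trials) - log_marginal
    return total / n_outer

# Larger experiments should be (and are) estimated as more informative.
for d in (1, 5, 20):
    print(d, round(expected_information_gain(d), 3))
```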


Subjects
Cognition, Machine Learning, Humans, Bayes Theorem, Awareness, Computer Simulation
7.
J R Soc Interface ; 19(192): 20220153, 2022 07.
Article in English | MEDLINE | ID: mdl-35858045

ABSTRACT

Estimating uncertainty in model predictions is a central task in quantitative biology. Biological models at the single-cell level are intrinsically stochastic and nonlinear, creating formidable challenges for their statistical estimation which inevitably has to rely on approximations that trade accuracy for tractability. Despite intensive interest, a sweet spot in this trade-off has not been found yet. We propose a flexible procedure for uncertainty quantification in a wide class of reaction networks describing stochastic gene expression including those with feedback. The method is based on creating a tractable coarse-graining of the model that is learned from simulations, a synthetic model, to approximate the likelihood function. We demonstrate that synthetic models can substantially outperform state-of-the-art approaches on a number of non-trivial systems and datasets, yielding an accurate and computationally viable solution to uncertainty quantification in stochastic models of gene expression.
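
One generic way to obtain a tractable stand-in for an intractable likelihood is a Gaussian synthetic likelihood fitted to simulated summary statistics. The sketch below illustrates that general strategy only; the 'synthetic models' of the paper are a learned coarse-graining of the reaction network, not this Gaussian approximation, and the negative-binomial toy simulator is invented for the example.

```python
import numpy as np

def gaussian_synthetic_loglik(theta, observed_summary, simulate, n_sims=200):
    """Approximate log-likelihood of theta: simulate summaries, fit a
    multivariate Gaussian to them, evaluate the observed summary under it."""
    sims = np.array([simulate(theta) for _ in range(n_sims)])
    mu = sims.mean(axis=0)
    cov = np.cov(sims, rowvar=False) + 1e-6 * np.eye(sims.shape[1])
    diff = observed_summary - mu
    sign, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet
                   + len(diff) * np.log(2 * np.pi))

# Toy simulator: summary statistics (mean, variance) of a bursty count model.
rng = np.random.default_rng(1)
def simulate(theta):
    counts = rng.negative_binomial(n=theta, p=0.3, size=100)
    return np.array([counts.mean(), counts.var()])

obs = simulate(5.0)
for theta in (2.0, 5.0, 10.0):
    print(theta, round(gaussian_synthetic_loglik(theta, obs, simulate), 2))
```

The approximate log-likelihood peaks near the parameter that generated the observed summaries, so it can be plugged into standard inference machinery in place of the intractable likelihood.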


Subjects
Algorithms, Biological Models, Gene Expression, Stochastic Processes, Uncertainty
8.
Nat Commun ; 11(1): 164, 2020 01 09.
Article in English | MEDLINE | ID: mdl-31919360

ABSTRACT

Host dependency factors that are required for influenza A virus infection may serve as therapeutic targets as the virus is less likely to bypass them under drug-mediated selection pressure. Previous attempts to identify host factors have produced largely divergent results, with few overlapping hits across different studies. Here, we perform a genome-wide CRISPR/Cas9 screen and devise a new approach, meta-analysis by information content (MAIC), to systematically combine our results with prior evidence for influenza host factors. MAIC outperforms other meta-analysis methods when using our CRISPR screen as validation data. We validate the host factors WDR7, CCDC115 and TMEM199, demonstrating that these genes are essential for viral entry and regulation of V-type ATPase assembly. We also find that CMTR1, a human mRNA cap methyltransferase, is required for efficient viral cap snatching and regulation of a cell autonomous immune response, and provides synergistic protection with the influenza endonuclease inhibitor Xofluza.


Subjects
Genetic Predisposition to Disease/genetics, Host-Pathogen Interactions/genetics, Influenza A Virus/pathogenicity, Human Influenza/genetics, Human Influenza/pathology, A549 Cells, Signal Transducing Adaptor Proteins/genetics, Antiviral Agents/pharmacology, CRISPR-Cas Systems, Cell Line, Clustered Regularly Interspaced Short Palindromic Repeats/genetics, Dibenzothiepins, Genome-Wide Association Study, Humans, Membrane Proteins/genetics, Methyltransferases/metabolism, Morpholines, Nerve Tissue Proteins/genetics, Oxazines/pharmacology, Pyridines/pharmacology, Pyridones, Thiepins/pharmacology, Triazines/pharmacology, Vacuolar Proton-Translocating ATPases/metabolism, Virus Internalization
9.
Wellcome Open Res ; 4: 14, 2019.
Article in English | MEDLINE | ID: mdl-37744419

ABSTRACT

Earlier research has suggested that approximate Bayesian computation (ABC) makes it possible to fit simulator-based intractable birth-death models to investigate communicable disease outbreak dynamics with accuracy comparable to that of exact Bayesian methods. However, recent findings have indicated that key parameters, such as the reproductive number R, may remain poorly identifiable with these models. Here we show that this identifiability issue can be resolved by taking into account disease-specific characteristics of the transmission process in closer detail. Using tuberculosis (TB) in the San Francisco Bay area as a case study, we consider a model that generates genotype data from a mixture of three stochastic processes, each with its own distinct dynamics and clear epidemiological interpretation. We show that our model allows for accurate posterior inferences about outbreak dynamics from aggregated annual case data with genotype information. As a byproduct of the inference, the model provides an estimate of the infectious population size at the time the data were collected. The acquired estimate is approximately two orders of magnitude smaller than assumed in earlier related studies, and it is much better aligned with epidemiological knowledge about active TB prevalence. Similarly, the reproductive number R related to the primary underlying transmission process is estimated to be nearly three times larger than previous estimates, which has a substantial impact on the interpretation of the fitted outbreak model.

10.
Stat Comput ; 28(2): 411-425, 2018.
Article in English | MEDLINE | ID: mdl-31997856

ABSTRACT

Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost to evaluate the likelihood function and thus to perform likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding values that yield simulated data resembling the observed data. While widely applicable, a major difficulty in this framework is how to measure the discrepancy between the simulated and observed data. Transforming the original problem into a problem of classifying the data into simulated versus observed, we find that classification accuracy can be used to assess the discrepancy. The complete arsenal of classification methods becomes thereby available for inference of intractable generative models. We validate our approach using theory and simulations for both point estimation and Bayesian inference, and demonstrate its use on real data by inferring an individual-based epidemiological model for bacterial infections in child care centers.
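
A minimal sketch of the classifier-based discrepancy described here, using scikit-learn's logistic regression and toy Gaussian data in place of the epidemiological model: cross-validated accuracy near 0.5 indicates the simulated data are indistinguishable from the observed data, while accuracy near 1.0 signals a poor parameter value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def classification_discrepancy(observed, simulated, cv=5):
    """Discrepancy = how well a classifier separates observed from simulated
    samples; accuracy ~0.5 means the two data sets are indistinguishable."""
    X = np.vstack([observed, simulated])
    y = np.concatenate([np.zeros(len(observed)), np.ones(len(simulated))])
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv).mean()

rng = np.random.default_rng(0)
observed = rng.normal(0.0, 1.0, size=(200, 2))
close = rng.normal(0.1, 1.0, size=(200, 2))   # simulated under a nearby parameter
far = rng.normal(2.0, 1.0, size=(200, 2))     # simulated under a poor parameter
print(classification_discrepancy(observed, close))  # near 0.5
print(classification_discrepancy(observed, far))    # near 1.0
```

Any classifier could stand in for the logistic regression, which is what makes "the complete arsenal of classification methods" available for likelihood-free inference.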

11.
Genetics ; 208(3): 1247-1260, 2018 03.
Article in English | MEDLINE | ID: mdl-29330348

ABSTRACT

The impact of epistasis on the evolution of multi-locus traits depends on recombination. While sexually reproducing eukaryotes recombine so frequently that epistasis between polymorphisms is not considered to play a large role in short-term adaptation, many bacteria also recombine, some to the degree that their populations are described as "panmictic" or "freely recombining." However, whether this recombination is sufficient to limit the ability of selection to act on epistatic contributions to fitness is unknown. We quantify homologous recombination in five bacterial pathogens and use these parameter estimates in a multilocus model of bacterial evolution with additive and epistatic effects. We find that even for highly recombining species (e.g., Streptococcus pneumoniae or Helicobacter pylori), selection on weak interactions between distant mutations is nearly as efficient as for an asexual species, likely because homologous recombination typically transfers only short segments. However, for strong epistasis, bacterial recombination accelerates selection, with the dynamics dependent on the amount of recombination and the number of loci. Epistasis may thus play an important role in both the short- and long-term adaptive evolution of bacteria, and, unlike in eukaryotes, is not limited to strong effect sizes, closely linked loci, or other conditions that limit the impact of recombination.
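
The multilocus fitness in models of this kind is commonly written with additive and pairwise epistatic terms; the exact parameterization used in the paper may differ, so the following is only a generic sketch:

```latex
\log w(\mathbf{x}) \;=\; \sum_{i} a_i x_i \;+\; \sum_{i<j} e_{ij}\, x_i x_j, \qquad x_i \in \{0,1\},
```

where \(a_i\) are additive allelic effects and \(e_{ij}\) pairwise epistatic interactions. Recombination reshuffles the \(x_i\) across genomes, which weakens selection on the \(e_{ij}\) terms unless the interactions are strong or the interacting loci are rarely separated by the transferred segments.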


Subjects
Biological Adaptation/genetics, Bacteria/genetics, Genetic Epistasis, Genetic Recombination, Linkage Disequilibrium, Single Nucleotide Polymorphism, Quantitative Trait Loci, Genetic Selection
12.
Nat Ecol Evol ; 1(12): 1950-1960, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29038424

ABSTRACT

Many bacterial species are composed of multiple lineages distinguished by extensive variation in gene content. These often cocirculate in the same habitat, but the evolutionary and ecological processes that shape these complex populations are poorly understood. Addressing these questions is particularly important for Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen, because the changes in population structure associated with the recent introduction of partial-coverage vaccines have substantially reduced pneumococcal disease. Here we show that pneumococcal lineages from multiple populations each have a distinct combination of intermediate-frequency genes. Functional analysis suggested that these loci may be subject to negative frequency-dependent selection (NFDS) through interactions with other bacteria, hosts or mobile elements. Correspondingly, these genes had similar frequencies in four populations with dissimilar lineage compositions. These frequencies were maintained following substantial alterations in lineage prevalences once vaccination programmes began. Fitting a multilocus NFDS model of post-vaccine population dynamics to three genomic datasets using Approximate Bayesian Computation generated reproducible estimates of the influence of NFDS on pneumococcal evolution, the strength of which varied between loci. Simulations replicated the stable frequency of lineages unperturbed by vaccination, patterns of serotype switching and clonal replacement. This framework highlights how bacterial ecology affects the impact of clinical interventions.
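
Negative frequency-dependent selection on accessory loci is often modeled by letting a strain's fitness depend on how far the population frequencies of the accessory genes it carries deviate from their equilibrium values. The form below is a generic sketch under that assumption; the paper's multilocus model may weight and combine these terms differently:

```latex
\pi_k(t) \;=\; \sum_{l} \delta_{k,l}\,\bigl(e_l - f_l(t)\bigr),
```

where \(\delta_{k,l}\) indicates whether strain \(k\) carries accessory locus \(l\), \(f_l(t)\) is the current population frequency of that locus and \(e_l\) its equilibrium frequency. Strains carrying genes that are currently rarer than their equilibrium frequency gain a reproductive advantage, which pulls gene frequencies back toward equilibrium even as lineage composition changes.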


Subjects
Bacterial Proteins/genetics, Genetic Selection, Streptococcus pneumoniae/genetics, Bacterial Proteins/metabolism, Pneumococcal Vaccines/immunology, Population Dynamics, Streptococcus pneumoniae/immunology
13.
Genetics ; 202(3): 911-8, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26739450

ABSTRACT

Understanding the transmission dynamics of infectious diseases is important for both biological research and public health applications. It has been widely demonstrated that statistical modeling provides a firm basis for inferring relevant epidemiological quantities from incidence and molecular data. However, the complexity of transmission dynamic models presents two challenges: (1) the likelihood function of the models is generally not computable, and computationally intensive simulation-based inference methods need to be employed, and (2) the model may not be fully identifiable from the available data. While the first difficulty can be tackled by computational and algorithmic advances, the second obstacle is more fundamental. Identifiability issues may lead to inferences that are driven more by prior assumptions than by the data themselves. We consider a popular and relatively simple yet analytically intractable model for the spread of tuberculosis based on classical IS6110 fingerprinting data. We report on the identifiability of the model, also presenting some methodological advances regarding the inference. Using likelihood approximations, we show that the reproductive value cannot be identified from the data available and that the posterior distributions obtained in previous work have likely been substantially dominated by the assumed prior distribution. Further, we show that the inferences are influenced by the assumed infectious population size, which generally has been kept fixed in previous work. We demonstrate that the infectious population size can be inferred if the remaining epidemiological parameters are already known with sufficient precision.


Subjects
Communicable Diseases/transmission, Statistical Models, Algorithms, Alleles, Bayes Theorem, Computer Simulation, Humans, Likelihood Functions, Mutation Rate, Mycobacterium tuberculosis/genetics, Mycobacterium tuberculosis/pathogenicity, Population Density, Tuberculosis/transmission
14.
Microb Genom ; 1(5): e000038, 2015 Nov.
Article in English | MEDLINE | ID: mdl-28348822

ABSTRACT

BACKGROUND: Population samples show bacterial genomes can be divided into a core of ubiquitous genes and accessory genes that are present in a fraction of isolates. The ecological significance of this variation in gene content remains unclear. However, microbiologists agree that a bacterial species should be 'genomically coherent', even though there is no consensus on how this should be determined.

RESULTS: We use a parsimonious model combining diversification in both the core and accessory genome, including mutation, homologous recombination (HR) and horizontal gene transfer (HGT) introducing new loci, to produce a population of interacting clusters of strains with varying genome content. New loci introduced by HGT may then be transferred on by HR. The model fits well to a systematic population sample of 616 pneumococcal genomes, capturing the major features of the population structure with parameter values that agree well with empirical estimates.

CONCLUSIONS: The model does not include explicit selection on individual genes, suggesting that crude comparisons of gene content may be a poor predictor of ecological function. We identify a clearly divergent subpopulation of pneumococci that are inconsistent with the model and may be considered genomically incoherent with the rest of the population. These strains have a distinct disease tropism and may be rationally defined as a separate species. We also find deviations from the model that may be explained by recent population bottlenecks or spatial structure.

15.
PLoS One ; 9(2): e86481, 2014.
Article in English | MEDLINE | ID: mdl-24533049

ABSTRACT

Independent component and canonical correlation analysis are two general-purpose statistical methods with wide applicability. In neuroscience, independent component analysis of chromatic natural images explains the spatio-chromatic structure of primary cortical receptive fields in terms of properties of the visual environment. Canonical correlation analysis similarly explains chromatic adaptation to different illuminations. But, as we show in this paper, neither of the two methods generalizes well to explain both spatio-chromatic processing and adaptation at the same time. We propose a statistical method which combines the desirable properties of independent component and canonical correlation analysis: It finds independent components in each data set which, across the two data sets, are related to each other via linear or higher-order correlations. The new method is as widely applicable as canonical correlation analysis, and it extends to more than two data sets. We call it higher-order canonical correlation analysis. When applied to chromatic natural images, we found that it provides a single (unified) statistical framework which accounts for both spatio-chromatic processing and adaptation. Filters with spatio-chromatic tuning properties as in the primary visual cortex emerged and corresponding-colors psychophysics was reproduced reasonably well. We used the new method to make a theory-driven testable prediction on how the neural response to colored patterns should change when the illumination changes. We predict shifts in the responses which are comparable to the shifts reported for chromatic contrast habituation.
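
For orientation, classical canonical correlation analysis finds projection vectors that maximize the linear correlation between two data sets,

```latex
(\mathbf{a}^{*}, \mathbf{b}^{*}) \;=\; \arg\max_{\mathbf{a},\,\mathbf{b}} \; \operatorname{corr}\!\bigl(\mathbf{a}^{\top}\mathbf{x},\; \mathbf{b}^{\top}\mathbf{y}\bigr).
```

The higher-order variant proposed in the paper instead seeks independent components within each data set whose dependencies across data sets may be linear or higher-order; the formula above covers only the classical linear starting point.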


Subjects
Color, Computer-Assisted Image Processing, Statistical Models, Algorithms, Artificial Intelligence, Color Perception/physiology, Computer Simulation, Humans, Light, Neurosciences/methods, Photic Stimulation/methods, Probability, Psychophysics, Visual Cortex/physiology
16.
J Physiol Paris ; 107(5): 369-98, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23369823

ABSTRACT

An important property of visual systems is to be simultaneously both selective to specific patterns found in the sensory input and invariant to possible variations. Selectivity and invariance (tolerance) are opposing requirements. It has been suggested that they could be joined by iterating a sequence of elementary selectivity and tolerance computations. It is, however, unknown what should be selected or tolerated at each level of the hierarchy. We approach this issue by learning the computations from natural images. We propose and estimate a probabilistic model of natural images that consists of three processing layers. Two natural image data sets are considered: image patches, and complete visual scenes downsampled to the size of small patches. For both data sets, we find that in the first two layers, simple and complex cell-like computations are performed. In the third layer, we mainly find selectivity to longer contours; for patch data, we further find some selectivity to texture, while for the downsampled complete scenes, some selectivity to curvature is observed.


Subjects
Statistical Models, Visual Pattern Recognition, Photic Stimulation, Factual Databases, Humans, Visual Pattern Recognition/physiology, Visual Perception/physiology