Results 1 - 20 of 65,954
1.
Annu Rev Cell Dev Biol ; 30: 23-37, 2014.
Article in English | MEDLINE | ID: mdl-25000992

ABSTRACT

The physicist Ernest Rutherford said, "If your experiment needs statistics, you ought to have done a better experiment." Although this aphorism remains true for much of today's research in cell biology, a basic understanding of statistics can be useful to cell biologists to help in monitoring the conduct of their experiments, in interpreting the results, in presenting them in publications, and in critically evaluating research by others. However, training in statistics is often focused on the sophisticated needs of clinical researchers, psychologists, and epidemiologists, whose conclusions depend wholly on statistics, rather than the practical needs of cell biologists, whose experiments often provide evidence that is not statistical in nature. This review describes some of the basic statistical principles that may be of use to experimental biologists, but it does not cover the sophisticated statistics needed for papers that contain evidence of no other kind.


Subjects
Cell Biology; Statistics as Topic; Causality; Data Interpretation, Statistical; Probability; Reproducibility of Results; Research Design; Statistical Distributions
2.
Nat Rev Genet ; 22(7): 459-476, 2021 07.
Article in English | MEDLINE | ID: mdl-33875884

ABSTRACT

Single-cell omics is transforming our understanding of cell biology and disease, yet the systems-level analysis and interpretation of single-cell data face many challenges. In this Perspective, we describe the impact that fundamental concepts from statistical mechanics, notably entropy, stochastic processes and critical phenomena, are having on single-cell data analysis. We further advocate the need for more bottom-up modelling of single-cell data and to embrace a statistical mechanics analysis paradigm to help attain a deeper understanding of single-cell systems biology.


Assuntos
Biologia Celular , Interpretação Estatística de Dados , Análise de Célula Única , Animais , Biologia Computacional , Entropia , Humanos , Modelos Estatísticos , RNA-Seq , Processos Estocásticos
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38557674

ABSTRACT

Quality control in quantitative proteomics is a persistent challenge, particularly in identifying and managing outliers. Unsupervised learning models, which rely on data structure rather than predefined labels, offer potential solutions. However, without clear labels, their effectiveness can be compromised. Single models are susceptible to randomness in parameters and initialization, which can result in a high rate of false positives. Ensemble models, on the other hand, can effectively mitigate the impact of such randomness and help detect true outliers accurately. We therefore introduce SEAOP, a Python toolbox that uses an ensemble mechanism, integrating multi-round data management and a statistics-based decision pipeline across multiple models. Specifically, SEAOP uses multi-round resampling to create diverse sub-data spaces and employs outlier detection methods to identify candidate outliers in each space. Candidates are then aggregated into confirmed outliers via a chi-square test at a 95% confidence level, ensuring the precision of the unsupervised approaches. Additionally, SEAOP introduces a visualization strategy designed to display the distribution of outlier and non-outlier samples intuitively and effectively. Optimal hyperparameters for SEAOP's outlier-detection models were identified using a gradient-simulated standard dataset and the Mann-Kendall trend test. The performance of the SEAOP toolbox was evaluated on three experimental datasets, confirming its reliability and accuracy in handling quantitative proteomics data.
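The multi-round resampling and chi-square aggregation described above can be sketched in a few lines (a minimal illustration, not the SEAOP implementation; the robust-distance detector, the 3.5 threshold, and the 5% background flag rate are simplifying assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))           # 50 samples, 5 features
X[7] += 6.0                            # plant one outlier sample

n, n_rounds, alpha = len(X), 30, 0.05
flags = np.zeros(n)                    # times each sample was called an outlier
seen = np.zeros(n)                     # times each sample entered a sub-space

for _ in range(n_rounds):
    idx = rng.choice(n, size=40, replace=False)   # one resampled sub-data space
    sub = X[idx]
    # a deliberately simple detector: robust z-score of distance to the median
    d = np.linalg.norm(sub - np.median(sub, axis=0), axis=1)
    z = (d - np.median(d)) / stats.median_abs_deviation(d, scale="normal")
    seen[idx] += 1
    flags[idx[z > 3.5]] += 1

# aggregate candidates: chi-square test against the background flag rate
crit = stats.chi2.ppf(0.95, df=1)
confirmed = []
for i in range(n):
    if seen[i] == 0:
        continue
    obs = np.array([flags[i], seen[i] - flags[i]])
    exp = np.array([alpha * seen[i], (1 - alpha) * seen[i]])
    if ((obs - exp) ** 2 / exp).sum() > crit and flags[i] > alpha * seen[i]:
        confirmed.append(i)

print(confirmed)
```

A sample is confirmed only when it is flagged far more often across resampling rounds than chance would allow, which is the intuition behind ensemble aggregation.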


Subjects
Data Management; Proteomics; Reproducibility of Results; Quality Control; Data Interpretation, Statistical
4.
Nature ; 577(7791): 526-530, 2020 01.
Article in English | MEDLINE | ID: mdl-31915383

ABSTRACT

Changes in behaviour resulting from environmental influences, development and learning [1-5] are commonly quantified on the basis of a few hand-picked features [2-4,6,7] (for example, the average pitch of acoustic vocalizations [3]), assuming discrete classes of behaviours (such as distinct vocal syllables) [2,3,8-10]. However, such methods generalize poorly across different behaviours and model systems and may miss important components of change. Here we present a more-general account of behavioural change that is based on nearest-neighbour statistics [11-13], and apply it to song development in a songbird, the zebra finch [3]. First, we introduce the concept of 'repertoire dating', whereby each rendition of a behaviour (for example, each vocalization) is assigned a repertoire time, reflecting when similar renditions were typical in the behavioural repertoire. Repertoire time isolates the components of vocal variability that are congruent with long-term changes due to vocal learning and development, and stratifies the behavioural repertoire into 'regressions', 'anticipations' and 'typical renditions'. Second, we obtain a holistic, yet low-dimensional, description of vocal change in terms of a stratified 'behavioural trajectory', revealing numerous previously unrecognized components of behavioural change on fast and slow timescales, as well as distinct patterns of overnight consolidation [1,2,4,14,15] across the behavioural repertoire. We find that diurnal changes in regressions undergo only weak consolidation, whereas anticipations and typical renditions consolidate fully. Because of its generality, our nonparametric description of how behaviour evolves relative to itself, rather than to a potentially arbitrary, experimenter-defined goal [2,3,14,16], appears well suited for comparing learning and change across behaviours and species [17,18], as well as biological and artificial systems [5].
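The idea of 'repertoire dating' can be illustrated with a toy nearest-neighbour sketch (all names and parameters here are hypothetical; the paper's actual feature space and estimators are far richer than this one-dimensional example):

```python
import numpy as np

rng = np.random.default_rng(1)
days = np.linspace(0, 60, 300)                  # age at each rendition
# a hypothetical 1-D acoustic feature that drifts slowly as the song develops
feature = 0.05 * days + rng.normal(0, 0.3, days.size)

k = 25
repertoire_time = np.empty_like(days)
for i, f in enumerate(feature):
    # the k renditions most similar to this one, regardless of when they occurred
    nn = np.argsort(np.abs(feature - f))[:k]
    repertoire_time[i] = np.median(days[nn])    # when such renditions were typical

# renditions dated much earlier than they occurred resemble past song: 'regressions';
# renditions dated later resemble future song: 'anticipations'
regressions = repertoire_time < days - 10
anticipations = repertoire_time > days + 10
```

Because each rendition is dated by its neighbours in feature space rather than by its timestamp, repertoire time tracks where a rendition sits within the developing repertoire.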


Subjects
Finches/physiology; Learning/physiology; Models, Neurological; Psychomotor Performance/physiology; Vocalization, Animal/physiology; Acoustics; Animals; Computer Simulation; Data Interpretation, Statistical; Male; Time Factors
5.
Nucleic Acids Res ; 52(D1): D203-D212, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37811871

ABSTRACT

With recent progress in mapping N7-methylguanosine (m7G) RNA methylation sites, tens of thousands of experimentally validated m7G sites have been discovered in various species, shedding light on the significant role of m7G modification in regulating numerous biological processes including disease pathogenesis. An integrated resource that enables the sharing, annotation and customized analysis of m7G data will greatly facilitate m7G studies under various physiological contexts. We previously developed the m7GHub database to host mRNA m7G sites identified in the human transcriptome. Here, we present m7GHub v.2.0, an updated resource for a comprehensive collection of m7G modifications in various types of RNA across multiple species: an m7GDB database containing 430 898 putative m7G sites identified in 23 species, collected from both widely applied next-generation sequencing (NGS) and the emerging Oxford Nanopore direct RNA sequencing (ONT) techniques; an m7GDiseaseDB hosting 156 206 m7G-associated variants (involving addition or removal of an m7G site), including 3238 disease-relevant m7G-SNPs that may function through epitranscriptome disturbance; and two enhanced analysis modules to perform interactive analyses on the collections of m7G sites (m7GFinder) and functional variants (m7GSNPer). We expect that m7GHub v.2.0 will serve as a valuable centralized resource for studying m7G modification. It is freely accessible at: www.rnamd.org/m7GHub2.


Subjects
Databases, Nucleic Acid; High-Throughput Nucleotide Sequencing; RNA Processing, Post-Transcriptional; Humans; Data Interpretation, Statistical; Guanosine/genetics
6.
Biostatistics ; 25(3): 736-753, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38123487

ABSTRACT

Weighting is a general and often-used method for statistical adjustment. Weighting has two objectives: first, to balance covariate distributions, and second, to ensure that the weights have minimal dispersion and thus produce a more stable estimator. A recent, increasingly common approach directly optimizes the weights toward these two objectives. However, this approach has not yet been feasible in large-scale datasets when investigators wish to flexibly balance general basis functions in an extended feature space. To address this practical problem, we describe a scalable and flexible approach to weighting that integrates a basis expansion in a reproducing kernel Hilbert space with state-of-the-art convex optimization techniques. Specifically, we use the rank-restricted Nyström method to efficiently compute a kernel basis for balancing in nearly linear time and space, and then use the specialized first-order alternating direction method of multipliers to rapidly find the optimal weights. In an extensive simulation study, we provide new insights into the performance of weighting estimators in large datasets, showing that the proposed approach substantially outperforms others in terms of accuracy and speed. Finally, we use this weighting approach to conduct a national study of the relationship between hospital profit status and heart attack outcomes in a comprehensive dataset of 1.27 million patients. We find that for-profit hospitals use interventional cardiology to treat heart attacks at similar rates as other hospitals but have higher mortality and readmission rates.
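The two objectives — exact covariate balance with minimal weight dispersion — can be written as a small quadratic program. Below is a minimal dense sketch on toy data; the paper's contribution is making this scale via Nyström kernel bases and ADMM, which this sketch does not attempt, and real estimators typically also enforce nonnegative weights:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))               # covariates of the group to be reweighted
target = np.array([0.5, -0.2, 0.1])         # covariate means to balance toward

n, p = X.shape
# constraints: weights sum to 1 and the reweighted covariate means hit the target
A = np.vstack([np.ones(n), X.T])            # (p + 1) x n constraint matrix
b = np.concatenate([[1.0], target])

# minimize dispersion ||w||^2 subject to A w = b by solving the KKT linear system
kkt = np.block([[np.eye(n), A.T],
                [A, np.zeros((p + 1, p + 1))]])
w = np.linalg.solve(kkt, np.concatenate([np.zeros(n), b]))[:n]
```

Minimizing the squared norm of the weights subject to linear balance constraints is exactly the "minimal dispersion" objective; the resulting estimator reweights the group so its covariate means match the target while keeping weights as uniform as the constraints allow.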


Subjects
Myocardial Infarction; Humans; Data Interpretation, Statistical; Observational Studies as Topic/methods; Models, Statistical
7.
Biostatistics ; 25(3): 666-680, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38141227

ABSTRACT

With rapid development of techniques to measure brain activity and structure, statistical methods for analyzing modern brain-imaging data play an important role in the advancement of science. Imaging data that measure brain function are usually multivariate high-density longitudinal data and are heterogeneous across both imaging sources and subjects, which lead to various statistical and computational challenges. In this article, we propose a group-based method to cluster a collection of multivariate high-density longitudinal data via a Bayesian mixture of smoothing splines. Our method assumes each multivariate high-density longitudinal trajectory is a mixture of multiple components with different mixing weights. Time-independent covariates are assumed to be associated with the mixture components and are incorporated via logistic weights of a mixture-of-experts model. We formulate this approach under a fully Bayesian framework using Gibbs sampling where the number of components is selected based on a deviance information criterion. The proposed method is compared to existing methods via simulation studies and is applied to a study on functional near-infrared spectroscopy, which aims to understand infant emotional reactivity and recovery from stress. The results reveal distinct patterns of brain activity, as well as associations between these patterns and selected covariates.


Subjects
Bayes Theorem; Humans; Longitudinal Studies; Brain/physiology; Brain/diagnostic imaging; Spectroscopy, Near-Infrared/methods; Data Interpretation, Statistical; Models, Statistical; Infant; Multivariate Analysis; Biostatistics/methods
8.
PLoS Comput Biol ; 20(6): e1012184, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38885265

ABSTRACT

Amortized simulation-based neural posterior estimation provides a novel machine-learning-based approach to parameter estimation problems. It has been shown to be computationally efficient and able to handle complex models and data sets. Yet, the available approach cannot handle the case of missing data, which is ubiquitous in experimental studies, and might provide incorrect posterior estimates. In this work, we discuss various ways of encoding missing data and integrate them into the training and inference process. We implement the approaches in the BayesFlow methodology, an amortized estimation framework based on invertible neural networks, and evaluate their performance on multiple test problems. We find that an approach in which the data vector is augmented with binary indicators of the presence or absence of values performs most robustly. Indeed, it also improved performance on the simpler problem of data sets with variable length. Accordingly, we demonstrate that amortized simulation-based inference approaches are applicable even with missing data, and we provide a guideline for their handling, which is relevant for a broad spectrum of applications.
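The best-performing encoding — augmenting the data vector with binary presence/absence indicators — reduces to a few lines (a generic sketch, not the BayesFlow API; the fill value and function name are illustrative):

```python
import numpy as np

def encode_with_indicators(x, fill_value=0.0):
    """Replace missing entries with a neutral value and append a binary mask."""
    x = np.asarray(x, dtype=float)
    present = (~np.isnan(x)).astype(float)      # 1 = observed, 0 = missing
    filled = np.where(np.isnan(x), fill_value, x)
    return np.concatenate([filled, present])    # network input has fixed length

print(encode_with_indicators([1.2, np.nan, 0.7, np.nan]))
```

The mask lets the network distinguish a genuinely observed fill value from a placeholder, which is why this encoding outperforms simply imputing a constant.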


Subjects
Computational Biology; Computer Simulation; Neural Networks, Computer; Computational Biology/methods; Humans; Machine Learning; Bayes Theorem; Algorithms; Data Interpretation, Statistical
9.
BMC Bioinformatics ; 25(1): 67, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38347472

ABSTRACT

BACKGROUND: Recording and analyzing microbial growth is a routine task in the life sciences. Microplate readers that record dozens to hundreds of growth curves simultaneously are increasingly used for this task, raising the demand for their rapid and reliable analysis. RESULTS: Here, we present Dashing Growth Curves, an interactive web application ( http://dashing-growth-curves.ethz.ch/ ) that enables researchers to quickly visualize and analyze growth curves without requiring coding knowledge, independent of operating system. Growth curves can be fitted with parametric and non-parametric models or manually. The application extracts maximum growth rates as well as other features such as lag time, length of the exponential growth phase and maximum population size. Furthermore, Dashing Growth Curves automatically groups replicate samples and generates downloadable summary plots for all growth parameters. CONCLUSIONS: Dashing Growth Curves is an open-source web application that reduces the time required to analyze microbial growth curves from hours to minutes.
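Parametric fitting of a growth curve — the kind of analysis the application automates — can be sketched with a logistic model (a generic illustration, not Dashing Growth Curves' code; the lag-time formula is one common tangent-line heuristic):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    # K: maximum population size, r: maximum specific growth rate, t0: inflection time
    return K / (1.0 + np.exp(-r * (t - t0)))

t = np.linspace(0, 24, 49)                                      # hours
rng = np.random.default_rng(3)
od = logistic(t, 1.5, 0.6, 8.0) + rng.normal(0, 0.01, t.size)   # synthetic OD readings

(K, r, t0), _ = curve_fit(logistic, t, od, p0=[1.0, 0.5, 10.0])
lag = t0 - 2.0 / r   # where the tangent at the inflection point crosses zero
```

From one fit you recover the maximum growth rate, maximum population size and an estimate of lag time — the same features the application reports per well.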


Subjects
Software; Data Interpretation, Statistical
10.
BMC Bioinformatics ; 25(1): 210, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38867185

ABSTRACT

BACKGROUND: In biomedical research, the growing volume and diversity of data have escalated the demand for statistical analysis, which is indispensable for synthesizing, interpreting, and publishing data. Hence the need for accessible analysis tools has drastically increased. StatiCAL emerges as a user-friendly solution, enabling researchers to conduct basic analyses without extensive programming expertise. RESULTS: StatiCAL includes diverse functionalities: data management, visualization of variables, and statistical analysis. Data management functionalities allow users to freely add or remove variables, select sub-populations, and visualize the selected data to better perform the analysis. With this tool, users can freely perform statistical analyses such as descriptive, graphical, univariate, and multivariate analysis. All of this can be done without learning to code in R, as the software is a graphical user interface where every action is performed by clicking a button. CONCLUSIONS: StatiCAL represents a valuable contribution to the field of biomedical research. By being open access and providing an intuitive interface with robust features, StatiCAL allows researchers to gain autonomy in conducting their projects.


Subjects
Biomedical Research; Software; User-Computer Interface; Computational Biology/methods; Data Management/methods; Data Interpretation, Statistical
11.
Am J Epidemiol ; 193(7): 1019-1030, 2024 07 08.
Article in English | MEDLINE | ID: mdl-38400653

ABSTRACT

Targeted maximum likelihood estimation (TMLE) is increasingly used for doubly robust causal inference, but how missing data should be handled when using TMLE with data-adaptive approaches is unclear. Based on data (1992-1998) from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate 8 missing-data methods in this context: complete-case analysis, extended TMLE incorporating an outcome-missingness model, the missing covariate missing indicator method, and 5 multiple imputation (MI) approaches using parametric or machine-learning models. We considered 6 scenarios that varied in terms of exposure/outcome generation models (presence of confounder-confounder interactions) and missingness mechanisms (whether outcome influenced missingness in other variables and presence of interaction/nonlinear terms in missingness models). Complete-case analysis and extended TMLE had small biases when outcome did not influence missingness in other variables. Parametric MI without interactions had large bias when exposure/outcome generation models included interactions. Parametric MI including interactions performed best in bias and variance reduction across all settings, except when missingness models included a nonlinear term. When choosing a method for handling missing data in the context of TMLE, researchers must consider the missingness mechanism and, for MI, compatibility with the analysis method. In many settings, a parametric MI approach that incorporates interactions and nonlinearities is expected to perform well.


Subjects
Causality; Humans; Likelihood Functions; Adolescent; Data Interpretation, Statistical; Bias; Models, Statistical; Computer Simulation
12.
Am J Hum Genet ; 108(7): 1270-1282, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34157305

ABSTRACT

Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using the continental reference ancestries African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), and South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. With an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
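The core estimation problem — finding mixing proportions of reference ancestries that best reproduce the observed allele frequencies — can be sketched as a constrained least-squares fit (toy frequencies, not Summix's implementation; Summix adds efficiency and block-bootstrap confidence intervals):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
ref = rng.uniform(0.05, 0.95, size=(500, 3))   # reference AFs: 500 SNPs x 3 ancestries
true_pi = np.array([0.6, 0.3, 0.1])            # hypothetical ancestry proportions
observed = ref @ true_pi                        # AFs observed in the mixed summary data

def sse(pi):
    # squared distance between observed AFs and the mixture of reference AFs
    return np.sum((observed - ref @ pi) ** 2)

res = minimize(sse, x0=np.full(3, 1 / 3),
               bounds=[(0.0, 1.0)] * 3,
               constraints=[{"type": "eq", "fun": lambda pi: pi.sum() - 1.0}])
pi_hat = res.x
```

The proportions are constrained to the simplex (nonnegative, summing to one); ancestry-adjusted AFs then follow by subtracting the unwanted reference components.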


Assuntos
Interpretação Estatística de Dados , Metagenômica/métodos , Linhagem , Grupos Raciais/genética , Alelos , Simulação por Computador , Frequência do Gene , Humanos , Padrões de Herança , Software
13.
Am J Hum Genet ; 108(10): 1880-1890, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34478634

ABSTRACT

Haplotype phasing is the estimation of haplotypes from genotype data. We present a fast, accurate, and memory-efficient haplotype phasing method that scales to large-scale SNP array and sequence data. The method uses marker windowing and composite reference haplotypes to reduce memory usage and computation time. It incorporates a progressive phasing algorithm that identifies confidently phased heterozygotes in each iteration and fixes the phase of these heterozygotes in subsequent iterations. For data with many low-frequency variants, such as whole-genome sequence data, the method employs a two-stage phasing algorithm that phases high-frequency markers via progressive phasing in the first stage and phases low-frequency markers via genotype imputation in the second stage. This haplotype phasing method is implemented in the open-source Beagle 5.2 software package. We compare Beagle 5.2 and SHAPEIT 4.2.1 by using expanding subsets of 485,301 UK Biobank samples and 38,387 TOPMed samples. Both methods have very similar accuracy and computation time for UK Biobank SNP array data. However, for TOPMed sequence data, Beagle is more than 20 times faster than SHAPEIT, achieves similar accuracy, and scales to larger sample sizes.


Subjects
Asthma/genetics; Atrial Fibrillation/genetics; Data Interpretation, Statistical; Genome, Human; Haplotypes; Polymorphism, Single Nucleotide; Software; Algorithms; Female; Genome-Wide Association Study; Genotype; Humans; Male
14.
Am J Hum Genet ; 108(4): 669-681, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33730541

ABSTRACT

Tests of association between a phenotype and a set of genes in a biological pathway can provide insights into the genetic architecture of complex phenotypes beyond those obtained from single-variant or single-gene association analysis. However, most existing gene set tests have limited power to detect gene set-phenotype association when a small fraction of the genes are associated with the phenotype and cannot identify the potentially "active" genes that might drive a gene set-based association. To address these issues, we have developed Gene set analysis Association Using Sparse Signals (GAUSS), a method for gene set association analysis that requires only GWAS summary statistics. For each significantly associated gene set, GAUSS identifies the subset of genes that have the maximal evidence of association and can best account for the gene set association. Using pre-computed correlation structure among test statistics from a reference panel, our p value calculation is substantially faster than other permutation- or simulation-based approaches. In simulations with varying proportions of causal genes, we find that GAUSS effectively controls type 1 error rate and has greater power than several existing methods, particularly when a small proportion of genes account for the gene set signal. Using GAUSS, we analyzed UK Biobank GWAS summary statistics for 10,679 gene sets and 1,403 binary phenotypes. We found that GAUSS is scalable and identified 13,466 phenotype and gene set association pairs. Within these gene sets, we identify an average of 17.2 (max = 405) genes that underlie these gene set associations.
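The idea of searching for the gene subset with maximal association evidence can be illustrated with a brute-force sketch over hypothetical gene-level z-scores (GAUSS additionally accounts for correlation between test statistics via a reference panel and computes p values efficiently, which this toy omits):

```python
import numpy as np
from itertools import combinations

z = np.array([0.3, 2.5, 0.1, 3.1, 0.4])    # hypothetical gene-level z-scores in a set

best_subset, best_score = None, -np.inf
for k in range(1, z.size + 1):
    for subset in combinations(range(z.size), k):
        score = z[list(subset)].sum() / np.sqrt(k)   # standardized subset signal
        if score > best_score:
            best_subset, best_score = subset, score

print(best_subset)   # the 'active' genes that best account for the set-level signal
```

Only the two genes with strong signals survive the standardization penalty, mirroring how a sparse-signal statistic gains power when few genes drive the association. Exhaustive search is exponential, which is why practical methods restrict or order the search.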


Subjects
Biological Specimen Banks; Data Interpretation, Statistical; Databases, Genetic; Datasets as Topic; Genome-Wide Association Study/methods; Phenotype; ATP-Binding Cassette Transporters/genetics; Computer Simulation; Gene Expression/genetics; Humans; Research Design; Time Factors; United Kingdom; Web Browser
15.
Ann Surg ; 279(6): 907-912, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38390761

ABSTRACT

OBJECTIVE: To determine the prevalence of clinical significance reporting in contemporary comparative effectiveness research (CER). BACKGROUND: In CER, a statistically significant difference between study groups may or may not be clinically significant. Misinterpreting statistically significant results could lead to inappropriate recommendations that increase health care costs and treatment toxicity. METHODS: CER studies from 2022 issues of the Annals of Surgery, Journal of the American Medical Association, Journal of Clinical Oncology, Journal of Surgical Research, and Journal of the American College of Surgeons were systematically reviewed by 2 different investigators. The primary outcome of interest was whether the authors specified what they considered to be a clinically significant difference in the "Methods." RESULTS: Of 307 reviewed studies, 162 were clinical trials and 145 were observational studies. Authors specified what they considered to be a clinically significant difference in 26 studies (8.5%). Clinical significance was defined using clinically validated standards in 25 studies and subjectively in 1 study. Seven studies (2.3%) recommended a change in clinical decision-making, all with primary outcomes achieving statistical significance. Five (71.4%) of these studies did not have clinical significance defined in their methods. In randomized controlled trials with statistically significant results, sample size was inversely correlated with effect size ( r = -0.30, P = 0.038). CONCLUSIONS: In contemporary CER, most authors do not specify what they consider to be a clinically significant difference in study outcome. Most studies recommending a change in clinical decision-making did so on the basis of statistical significance alone; when clinical significance was defined, it was usually defined using clinically validated standards.


Subjects
Comparative Effectiveness Research; Humans; Data Interpretation, Statistical; Research Design; Clinical Trials as Topic
16.
Am J Physiol Heart Circ Physiol ; 326(6): H1420-H1423, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38700473

ABSTRACT

The use of both sexes or genders should be considered in experimental design, analysis, and reporting. Since there is no requirement to double the sample size or to have sufficient power to study sex differences, challenges for the statistical analysis can arise. In this article, we focus on the topics of statistical power and ways to increase this power. We also discuss the choice of an appropriate design and statistical method and include a separate section on equivalence tests needed to show the absence of a relevant difference.
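The power considerations discussed here can be made concrete with a normal-approximation power function for a two-sample comparison (a textbook formula, not taken from the article; note that testing a sex-by-treatment interaction of the same magnitude requires roughly four times as many animals as the corresponding main-effect test):

```python
from scipy import stats

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for effect size d."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5          # noncentrality of the test statistic
    return stats.norm.cdf(ncp - z_crit) + stats.norm.cdf(-ncp - z_crit)

# the classic benchmark: a medium effect (d = 0.5) needs ~64 per group for 80% power
print(round(power_two_sample(0.5, 64), 2))
```

Running the function across candidate sample sizes shows how quickly power erodes when groups are split by sex without enlarging the study, which motivates the design and analysis choices discussed in the article.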


Subjects
Research Design; Animals; Female; Humans; Male; Data Interpretation, Statistical; Models, Statistical; Sample Size; Sex Factors
17.
Biostatistics ; 24(2): 406-424, 2023 04 14.
Article in English | MEDLINE | ID: mdl-34269371

ABSTRACT

It is becoming increasingly common for researchers to consider incorporating external information from large studies to improve the accuracy of statistical inference instead of relying on a modestly sized data set collected internally. With some new predictors only available internally, we aim to build improved regression models based on individual-level data from an "internal" study while incorporating summary-level information from "external" models. We propose a meta-analysis framework along with two weighted estimators as the composite of empirical Bayes estimators, which combines the estimates from different external models. The proposed framework is flexible and robust in the ways that (i) it is capable of incorporating external models that use a slightly different set of covariates; (ii) it is able to identify the most relevant external information and diminish the influence of information that is less compatible with the internal data; and (iii) it nicely balances the bias-variance trade-off while preserving the most efficiency gain. The proposed estimators are more efficient than the naïve analysis of the internal data and other naïve combinations of external estimators.
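As a baseline for intuition, the simplest way to combine external estimates is precision weighting (classical fixed-effects meta-analysis; the paper's empirical-Bayes estimators refine this by down-weighting external models that are incompatible with the internal data, which this sketch does not do):

```python
import numpy as np

# hypothetical external estimates of the same coefficient, with their variances
estimates = np.array([0.80, 1.10, 0.95])
variances = np.array([0.04, 0.09, 0.01])

precision = 1.0 / variances
weights = precision / precision.sum()       # inverse-variance weights
combined = weights @ estimates              # pooled estimate
combined_var = 1.0 / precision.sum()        # smaller than any single input variance
```

Each estimate contributes in proportion to its precision, so the pooled estimate leans toward the most reliable external model while always reducing variance.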


Subjects
Models, Statistical; Humans; Bayes Theorem; Data Interpretation, Statistical; Bias
18.
Am Heart J ; 274: 23-31, 2024 08.
Article in English | MEDLINE | ID: mdl-38701962

ABSTRACT

Clinicians often suspect that a treatment effect can vary across individuals. However, they usually lack "evidence-based" guidance regarding potential heterogeneity of treatment effects (HTE). Potentially actionable HTE is rarely discovered in clinical trials and is widely believed (or rationalized) by researchers to be rare. Conventional statistical methods to test for possible HTE are extremely conservative and tend to reinforce this belief. In truth, though, there is no realistic way to know whether a common, or average, effect estimated from a clinical trial is relevant for all, or even most, patients. This absence of evidence, misinterpreted as evidence of absence, may be resulting in sub-optimal treatment for many individuals. We first summarize the historical context in which current statistical methods for randomized controlled trials (RCTs) were developed, focusing on the conceptual and technical limitations that shaped, and restricted, these methods. In particular, we explain how the common-effect assumption came to be virtually unchallenged. Second, we propose a simple graphical method for exploratory data analysis that can provide useful visual evidence of possible HTE. The basic approach is to display the complete distribution of outcome data rather than relying uncritically on simple summary statistics. Modern graphical methods, unavailable when statistical methods were initially formulated a century ago, now render fine-grained interrogation of the data feasible. We propose comparing observed treatment-group data to "pseudo data" engineered to mimic that which would be expected under a particular HTE model, such as the common-effect model. A clear discrepancy between the distributions of the common-effect pseudo data and the actual treatment-effect data provides prima facie evidence of HTE to motivate additional confirmatory investigation. Artificial data are used to illustrate implications of ignoring heterogeneity in practice and how the graphical method can be useful.
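The proposed graphical check can be sketched in a few lines: engineer pseudo data under the common-effect model and compare its distribution with the actual treated-arm data (all numbers below are hypothetical artificial data in the spirit of the article, not its examples):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 2000
control = rng.normal(0.0, 1.0, n)
# heterogeneous truth: half of treated patients respond (+2), half do not respond
effect = np.where(rng.random(n) < 0.5, 2.0, 0.0)
treated = rng.normal(0.0, 1.0, n) + effect

# pseudo data under the common-effect model: controls shifted by the average effect
pseudo = control + (treated.mean() - control.mean())

# the treated arm is visibly wider than the common-effect pseudo data
spread_gap = treated.std() - pseudo.std()
ks = stats.ks_2samp(treated, pseudo).statistic
```

Plotting `treated` against `pseudo` (histograms or Q-Q plots) would make the discrepancy visible at a glance; the widened spread of the treated arm is exactly the prima facie evidence of HTE the article describes.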


Subjects
Randomized Controlled Trials as Topic; Humans; Randomized Controlled Trials as Topic/methods; Evidence-Based Medicine/methods; Treatment Outcome; Data Interpretation, Statistical; Treatment Effect Heterogeneity
19.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38039147

ABSTRACT

MOTIVATION: Summary statistics from genome-wide association studies enable many valuable downstream analyses that are more efficient than individual-level data analysis while also reducing privacy concerns. As growing sample sizes enable better-powered analysis of gene-environment interactions, there is a need for gene-environment interaction-specific methods that manipulate and use summary statistics. RESULTS: We introduce two tools to facilitate such analysis, with a focus on statistical models containing multiple gene-exposure and/or gene-covariate interaction terms. REGEM (RE-analysis of GEM summary statistics) uses summary statistics from a single, multi-exposure genome-wide interaction study to derive analogous sets of summary statistics with arbitrary sets of exposures and interaction covariate adjustments. METAGEM (META-analysis of GEM summary statistics) extends current fixed-effects meta-analysis models to incorporate multiple exposures from multiple studies. We demonstrate the value and efficiency of these tools by exploring alternative methods of accounting for ancestry-related population stratification in a genome-wide interaction study in the UK Biobank, as well as by conducting a multi-exposure genome-wide interaction study meta-analysis in cohorts from the diabetes-focused ProDiGY consortium. These programs help to maximize the value of summary statistics from diverse and complex gene-environment interaction studies. AVAILABILITY AND IMPLEMENTATION: REGEM and METAGEM are open-source projects freely available at https://github.com/large-scale-gxe-methods/REGEM and https://github.com/large-scale-gxe-methods/METAGEM.


Subjects
Gene-Environment Interaction; Genome-Wide Association Study; Models, Statistical; Sample Size; Data Interpretation, Statistical; Polymorphism, Single Nucleotide; Phenotype
20.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37522889

ABSTRACT

SUMMARY: In any population under selective pressure, a central challenge is to distinguish the genes that drive adaptation from others which, subject to population variation, harbor many neutral mutations de novo. We recently showed that such genes could be identified by supplementing information on mutational frequency with an evolutionary analysis of the likely functional impact of coding variants. This approach improved the discovery of driver genes in both lab-evolved and environmental Escherichia coli strains. To facilitate general adoption, we have now developed ShinyBioHEAT, an R Shiny web-based application that enables identification of phenotype-driving genes in two commonly used model bacteria, E. coli and Bacillus subtilis, with no specific computational skill requirements. ShinyBioHEAT not only supports transparent and interactive analysis of lab evolution data in E. coli and B. subtilis, but it also creates dynamic visualizations of mutational impact on protein structures, which add orthogonal checks on predicted drivers. AVAILABILITY AND IMPLEMENTATION: Code for ShinyBioHEAT is available at https://github.com/LichtargeLab/ShinyBioHEAT. The Shiny application is additionally hosted at http://bioheat.lichtargelab.org/.


Subjects
Escherichia coli; Mobile Applications; Escherichia coli/genetics; Software; Mutation; Data Interpretation, Statistical; Mutation Rate