1.
Bioinformatics ; 39(9), 2023 09 02.
Article in English | MEDLINE | ID: mdl-37672022

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) present several computational and statistical challenges for their data analysis, including knowledge discovery, interpretability, and translation to clinical practice. RESULTS: We develop, apply, and comparatively evaluate an automated machine learning (AutoML) approach, customized for genomic data that delivers reliable predictive and diagnostic models, the set of genetic variants that are important for predictions (called a biosignature), and an estimate of the out-of-sample predictive power. This AutoML approach discovers variants with higher predictive performance compared to standard GWAS methods, computes an individual risk prediction score, generalizes to new, unseen data, is shown to better differentiate causal variants from other highly correlated variants, and enhances knowledge discovery and interpretability by reporting multiple equivalent biosignatures. AVAILABILITY AND IMPLEMENTATION: Code for this study is available at: https://github.com/mensxmachina/autoML-GWAS. JADBio offers a free version at: https://jadbio.com/sign-up/. SNP data can be downloaded from the EGA repository (https://ega-archive.org/). PRS data are found at: https://www.aicrowd.com/challenges/opensnp-height-prediction. Simulation data to study population structure can be found at: https://easygwas.ethz.ch/data/public/dataset/view/1/.


Subject(s)
Genome-Wide Association Study , Single Nucleotide Polymorphism , Humans , Phenotype , Computer Simulation , Machine Learning
2.
Eur J Oral Sci ; 132(1): e12962, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38030576

ABSTRACT

Meta-analyses may provide imprecise estimates when important meta-analysis parameters are not considered during the synthesis. The aim of this case study was to highlight the influence of meta-analysis parameters that can affect reported estimates using as an example pre-existing meta-analyses on the association between implant survival and sinus membrane perforation. PubMed was searched on 7 July 2021 for meta-analyses comparing implant failure in perforated and non-perforated sinus membranes. Primary studies identified in these meta-analyses were combined in a new random-effects model with odds ratios (ORs), confidence intervals (CIs), and prediction intervals reported. Using this new meta-analysis, further meta-analyses were then undertaken considering the clinical, methodological, and statistical heterogeneity of the primary studies, publication bias, and clustering effects. The meta-analyses with the greatest number and more homogeneous studies provided lower odds of implant failure in non-perforated sites (OR 0.49, 95 % CI = [0.26, 0.92]). However, when considering heterogeneity, publication bias, and clustering (number of implants), the confidence in these results was reduced. Interpretation of estimates reported in systematic reviews can vary depending on the assumptions made in the meta-analysis. Users of these analyses need to carefully consider the impact of heterogeneity, publication bias, and clustering, which can affect the size, direction, and interpretation of the reported estimates.
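
The random-effects pooling of odds ratios described in this abstract can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator of between-study variance and a 0.5 continuity correction on every cell; the abstract does not say which estimator or correction the authors used, and the function name and input layout are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_odds_ratio(tables, level=0.95):
    """DerSimonian-Laird random-effects pooling of 2x2 tables.

    Each table is (failures_a, total_a, failures_b, total_b).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_ors, variances = [], []
    for fa, na, fb, nb in tables:
        # Assumption: add 0.5 to every cell so zero counts do not break the log.
        a, b = fa + 0.5, na - fa + 0.5
        c, d = fb + 0.5, nb - fb + 0.5
        log_ors.append(math.log((a * d) / (b * c)))
        variances.append(1 / a + 1 / b + 1 / c + 1 / d)

    k = len(tables)
    w = [1 / v for v in variances]
    # Fixed-effect estimate, needed for Cochran's Q.
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / scale) if k > 1 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "or": math.exp(mu),
        "ci": (math.exp(mu - z * se), math.exp(mu + z * se)),
        "tau2": tau2,
        "q": q,
    }
```

A larger `tau2` widens the interval around the pooled odds ratio, which is one way the heterogeneity discussed in the abstract reduces confidence in a pooled estimate.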


Subject(s)
Dentistry , Publication Bias , Systematic Reviews as Topic
3.
Eur Respir J ; 58(5), 2021 11.
Article in English | MEDLINE | ID: mdl-33888521

ABSTRACT

INTRODUCTION: Understanding the psychometric properties of health-related quality of life (HRQoL) questionnaires can help inform selection in clinical trials. Our objective was to assess the psychometric properties of HRQoL questionnaires in bronchiectasis using a systematic review and meta-analysis of the literature. METHODS: A literature search was conducted. HRQoL questionnaires were assessed for psychometric properties (reliability, validity, minimal clinically important difference (MCID) and floor/ceiling effects). Meta-analyses assessed the associations of HRQoL with clinical measures and responsiveness of HRQoL in clinical trials. RESULTS: 166 studies and 12 HRQoL questionnaires were included. The Bronchiectasis Health Questionnaire (BHQ), Leicester Cough Questionnaire (LCQ), Chronic Obstructive Pulmonary Disease (COPD) Assessment Test (CAT) and Medical Outcomes Study 36-item Short-Form Health Survey (SF-36) had good internal consistency in all domains reported (Cronbach's α≥0.7) across all studies, and the Quality of Life-Bronchiectasis (QOL-B), St George's Respiratory Questionnaire (SGRQ), Chronic Respiratory Disease Questionnaire (CRDQ) and Seattle Obstructive Lung Disease Questionnaire (SOLQ) had good internal consistency in all domains in the majority of (but not all) studies. BHQ, SGRQ, LCQ and CAT had good test-retest reliability in all domains reported (intraclass correlation coefficient ≥0.7) across all studies, and QOL-B, CRDQ and SOLQ had good test-retest reliability in all domains in the majority of (but not all) studies. HRQoL questionnaires were able to discriminate between demographics, important markers of clinical status, disease severity, exacerbations and bacteriology. For HRQoL responsiveness, there was a difference between the treatment and placebo effect. 
CONCLUSIONS: SGRQ was the most widely used HRQoL questionnaire in bronchiectasis studies and it had good psychometric properties; however, good psychometric data are emerging on the bronchiectasis-specific HRQoL questionnaires QOL-B and BHQ. Future studies should focus on the medium- to long-term test-retest reliability, responsiveness and MCID in these HRQoL questionnaires which show potential in bronchiectasis.
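
The internal-consistency threshold quoted above (Cronbach's α≥0.7) refers to a statistic that can be computed directly from item scores. The sketch below uses the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores); the function name and the input layout are illustrative assumptions, not taken from the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one questionnaire domain.

    items: one list of scores per item, with respondents in the same order.
    """
    k = len(items)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))
```

For example, `cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])` gives 0.75, which would meet the 0.7 threshold.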


Subject(s)
Bronchiectasis , Chronic Obstructive Pulmonary Disease , Humans , Psychometrics , Quality of Life , Reproducibility of Results , Surveys and Questionnaires
4.
Eur J Orthod ; 43(5): 583-587, 2021 10 04.
Article in English | MEDLINE | ID: mdl-33991101

ABSTRACT

BACKGROUND: At the clinical trial design stage, assumptions regarding the treatment effects to be detected should be appropriate so that the required sample size can be calculated. There is evidence in the medical literature that sample size assumptions can be overoptimistic. The aim of this study was to compare the distribution of the assumed effects versus that of the observed effects as a proxy for overoptimistic treatment effect assumptions at the study design stage. MATERIALS AND METHOD: Systematic reviews (SRs) published between 1 January 2010 and 31 December 2019 containing at least one meta-analysis on continuous outcomes were identified electronically. SR and primary study level characteristics were extracted from the SRs and the individual trials. Details on the sample size calculation process and assumptions and the observed treatment effects were extracted. RESULTS: Eighty-five SRs with meta-analysis containing 347 primary trials were included. The median number of SR authors was 5 (interquartile range: 4-7). At the primary study level, the majority were single centre (78.1%), utilized a parallel design (52%), and were rated as an unclear/moderate level of risk of bias (34.3%). A sample size calculation was described in only 31.7% (110/347) of studies. Of these 110 studies, only 37 reported the assumed clinical difference that the study was designed to detect (37/110). The assumed treatment effect was recalculated for the remaining 73 studies (73/110). The one-sided exact signed rank test showed a significant difference between the assumed and observed treatment effects (P < 0.001), suggesting greater values for the assumed effect sizes. CONCLUSIONS: Careful consideration of the assumptions at the design stage of orthodontic studies is necessary in order to reduce the unreliability of clinical study results and research waste.
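
To see why an overoptimistic assumed effect matters, consider a minimal sketch of the usual normal-approximation sample size formula for comparing two means, n per group = 2 · ((z₁₋α/₂ + z₁₋β) · σ / δ)². The formula is standard, but the function name, default values, and the choice of a two-sided test are assumptions for illustration; the abstract does not state how the included trials computed their sample sizes.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Participants per arm for a two-arm trial with a continuous outcome.

    delta: assumed difference in means; sd: assumed common standard deviation.
    Uses the normal approximation with a two-sided significance level.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)
```

Under these assumptions, `n_per_group(0.5, 1.0)` requires 63 participants per arm, whereas assuming a difference twice as large, `n_per_group(1.0, 1.0)`, requires only 16. A trial planned around the larger assumed effect would be badly underpowered if the true effect were the smaller one.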


Subject(s)
Research Design , Humans
5.
BMC Bioinformatics ; 19(1): 17, 2018 01 23.
Article in English | MEDLINE | ID: mdl-29357817

ABSTRACT

BACKGROUND: Feature selection is commonly employed for identifying collectively predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work we extend established constraint-based feature-selection methods to high-dimensional "omics" temporal data, where the number of measurements is orders of magnitude larger than the sample size. The extension required the development of conditional independence tests for temporal and/or static variables conditioned on a set of temporal variables. RESULTS: The algorithm is able to return multiple, equivalent solution subsets of variables, scale to tens of thousands of features, and outperform or be on par with existing methods depending on the analysis task specifics. CONCLUSIONS: The use of this algorithm is suggested for variable selection with high-dimensional temporal data.


Subject(s)
Algorithms , Genomics , Linear Models
6.
Plant Cell ; 27(4): 1018-33, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25901089

ABSTRACT

Diverse molecular networks underlying plant growth and development are rapidly being uncovered. Integrating these data into the spatial and temporal context of dynamic organ growth remains a technical challenge. We developed 3DCellAtlas, an integrative computational pipeline that semiautomatically identifies cell types and quantifies both 3D cellular anisotropy and reporter abundance at single-cell resolution across whole plant organs. Cell identification is no less than 97.8% accurate and does not require transgenic lineage markers or reference atlases. Cell positions within organs are defined using an internal indexing system generating cellular level organ atlases where data from multiple samples can be integrated. Using this approach, we quantified the organ-wide cell-type-specific 3D cellular anisotropy driving Arabidopsis thaliana hypocotyl elongation. The impact ethylene has on hypocotyl 3D cell anisotropy identified the preferential growth of endodermis in response to this hormone. The spatiotemporal dynamics of the endogenous DELLA protein RGA, expansin gene EXPA3, and cell expansion was quantified within distinct cell types of Arabidopsis roots. A significant regulatory relationship between RGA, EXPA3, and growth was present in the epidermis and endodermis. The use of single-cell analyses of plant development enables the dynamics of diverse regulatory networks to be integrated with 3D organ growth.


Subject(s)
Computational Biology/methods , Single-Cell Analysis/methods , Arabidopsis/growth & development , Arabidopsis/metabolism , Hypocotyl/growth & development , Hypocotyl/metabolism , Plant Organogenesis/genetics , Plant Organogenesis/physiology , Plant Roots/growth & development , Plant Roots/metabolism
8.
Am J Orthod Dentofacial Orthop ; 159(5): 695-696, 2021 05.
Article in English | MEDLINE | ID: mdl-33931224
9.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1214-1224, 2022.
Article in English | MEDLINE | ID: mdl-33035156

ABSTRACT

Feature selection for predictive analytics is the problem of identifying a minimal-size subset of features that is maximally predictive of an outcome of interest. To apply to molecular data, feature selection algorithms need to be scalable to tens of thousands of features. In this paper, we propose γ-OMP, a generalisation of the highly scalable Orthogonal Matching Pursuit feature selection algorithm. γ-OMP can handle (a) various types of outcomes, such as continuous, binary, nominal, time-to-event, (b) discrete (categorical) features, (c) different statistical-based stopping criteria, (d) several predictive models (e.g., linear or logistic regression), (e) various types of residuals, and (f) different types of association. We compare γ-OMP against LASSO, a prototypical, widely used algorithm for high-dimensional data. On both simulated data and several real gene expression datasets, γ-OMP is on par with, or outperforms, LASSO in binary classification (case-control data), regression (quantified outcomes), and time-to-event data (censored survival times). γ-OMP is based on simple statistical ideas, it is easy to implement and to extend, and our extensive evaluation shows that it is also effective in bioinformatics analysis settings.
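
For orientation, the classical Orthogonal Matching Pursuit that γ-OMP generalises can be sketched in a few lines for a continuous outcome and a linear model. This is not γ-OMP itself: it handles none of the outcome types, residual types, or statistical stopping criteria listed above, and it stops on a plain correlation threshold (`tol`), which is an assumption made here for brevity. All names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def omp_select(columns, y, max_features, tol=1e-9):
    """Greedy forward selection; returns column indices in selection order."""
    residual = list(y)
    basis = []     # orthonormal vectors spanning the selected columns
    selected = []
    while len(selected) < max_features:
        # Pick the unselected column most correlated with the residual.
        best_j, best_score = None, tol
        for j, col in enumerate(columns):
            norm = math.sqrt(dot(col, col))
            if j in selected or norm == 0:
                continue
            score = abs(dot(col, residual)) / norm
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break  # nothing left that is associated with the residual
        # Gram-Schmidt: keep only the part of the column outside the basis.
        q = list(columns[best_j])
        for b in basis:
            proj = dot(q, b)
            q = [qi - proj * bi for qi, bi in zip(q, b)]
        q_norm = math.sqrt(dot(q, q))
        if q_norm < tol:
            break  # numerically collinear with already selected columns
        q = [qi / q_norm for qi in q]
        basis.append(q)
        # Projecting out q equals refitting least squares on all selected columns.
        coef = dot(residual, q)
        residual = [ri - coef * qi for ri, qi in zip(residual, q)]
        selected.append(best_j)
    return selected
```

Maintaining an orthonormal basis keeps the residual orthogonal to every selected column, which is what distinguishes orthogonal matching pursuit from plain matching pursuit.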


Subject(s)
Algorithms , Computational Biology , Case-Control Studies , Gene Expression , Logistic Models
10.
F1000Res ; 7: 1505, 2018.
Article in English | MEDLINE | ID: mdl-31656581

ABSTRACT

Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented in R as a package. The R package MXM is one such example, which not only offers a variety of feature selection algorithms, but has unique features that make it advantageous over its competitors: a) it contains feature selection algorithms that can treat numerous types of target variables, including continuous, percentages, time to event (survival), binary, nominal, ordinal, clustered, counts, left censored, etc.; b) it contains a variety of regression models to plug into the feature selection algorithms; c) it includes an algorithm for detecting multiple solutions (many sets of equivalent features); and d) it includes memory-efficient algorithms for high-volume data, that is, data that cannot be loaded into R. In this paper we qualitatively compare MXM with other relevant packages and discuss its advantages and disadvantages. We also provide a demonstration of its algorithms using real high-dimensional data from various applications.


Subject(s)
Algorithms
11.
Int J Data Sci Anal ; 6(1): 19-30, 2018.
Article in English | MEDLINE | ID: mdl-30957008

ABSTRACT

We address the problem of constraint-based causal discovery with mixed data types, such as (but not limited to) continuous, binary, multinomial, and ordinal variables. We use likelihood-ratio tests based on appropriate regression models and show how to derive symmetric conditional independence tests. Such tests can then be directly used by existing constraint-based methods with mixed data, such as the PC and FCI algorithms for learning Bayesian networks and maximal ancestral graphs, respectively. In experiments on simulated Bayesian networks, we employ the PC algorithm with different conditional independence tests for mixed data and show that the proposed approach outperforms alternatives in terms of learning accuracy.

12.
Database (Oxford) ; 2018, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29688366

ABSTRACT

The biotechnology revolution generates a plethora of omics data at an exponentially growing pace. Therefore, biological data mining demands automatic, 'high quality' curation efforts to organize biomedical knowledge into online databases. BioDataome is a database of uniformly preprocessed and disease-annotated omics data with the aim to promote and accelerate the reuse of public data. We followed the same preprocessing pipeline for each biological mart (microarray gene expression, RNA-Seq gene expression and DNA methylation) to produce datasets that are ready for downstream analysis, and automatically annotated them with disease-ontology terms. We also designate datasets that share common samples and automatically discover control samples in case-control studies. Currently, BioDataome includes ∼5600 datasets and ∼260 000 samples spanning ∼500 diseases, and can be easily used in large-scale experiments and meta-analysis. All datasets are publicly available for querying and downloading via the BioDataome web application. We demonstrate BioDataome's utility by presenting exploratory data analysis examples. We have also developed the BioDataome R package, found at: https://github.com/mensxmachina/BioDataome/. Database URL: http://dataome.mensxmachina.org/.


Subject(s)
Data Curation/methods , Genetic Databases , Electronic Data Processing/methods , Gene Expression Profiling , Gene Expression Regulation , Oligonucleotide Array Sequence Analysis , Meta-Analysis as Topic
13.
Int Sch Res Notices ; 2014: 825383, 2014.
Article in English | MEDLINE | ID: mdl-27437470

ABSTRACT

The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is widely used by researchers, its statistical properties are largely unexplored. First, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was done by distinguishing whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator.
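
As background for the estimator discussed above, Rosenthal's fail-safe number is commonly written as N = (ΣZ)² / z_α² − k, where the Z values are the standard normal deviates of the k included studies and z_α is the one-tailed critical value (about 1.645 for α = 0.05). The sketch below computes only this point estimate; the variance estimators and confidence intervals developed in the paper are not reproduced, and the function name is hypothetical.

```python
from statistics import NormalDist


def fail_safe_n(z_scores, alpha=0.05):
    """Rosenthal's fail-safe number (point estimate only).

    z_scores: standard normal deviates of the k studies in the meta-analysis.
    Estimates how many unpublished null studies would be needed to raise the
    combined one-tailed p-value above alpha.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k
```

With four illustrative studies, `fail_safe_n([2.0, 2.5, 1.8, 3.0])` is just under 28, so roughly 28 unpublished null results would be needed to overturn the combined finding.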
