1.
Bioinformatics ; 39(9), 2023 09 02.
Article in English | MEDLINE | ID: mdl-37672022

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) present several computational and statistical challenges for their data analysis, including knowledge discovery, interpretability, and translation to clinical practice. RESULTS: We develop, apply, and comparatively evaluate an automated machine learning (AutoML) approach, customized for genomic data, that delivers reliable predictive and diagnostic models, the set of genetic variants that are important for predictions (called a biosignature), and an estimate of the out-of-sample predictive power. This AutoML approach discovers variants with higher predictive performance compared to standard GWAS methods, computes an individual risk prediction score, generalizes to new, unseen data, is shown to better differentiate causal variants from other highly correlated variants, and enhances knowledge discovery and interpretability by reporting multiple equivalent biosignatures. AVAILABILITY AND IMPLEMENTATION: Code for this study is available at: https://github.com/mensxmachina/autoML-GWAS. JADBio offers a free version at: https://jadbio.com/sign-up/. SNP data can be downloaded from the EGA repository (https://ega-archive.org/). PRS data are found at: https://www.aicrowd.com/challenges/opensnp-height-prediction. Simulation data to study population structure can be found at: https://easygwas.ethz.ch/data/public/dataset/view/1/.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Phenotype , Computer Simulation , Machine Learning
2.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1214-1224, 2022.
Article in English | MEDLINE | ID: mdl-33035156

ABSTRACT

Feature selection for predictive analytics is the problem of identifying a minimal-size subset of features that is maximally predictive of an outcome of interest. To apply to molecular data, feature selection algorithms need to be scalable to tens of thousands of features. In this paper, we propose γ-OMP, a generalisation of the highly scalable Orthogonal Matching Pursuit feature selection algorithm. γ-OMP can handle (a) various types of outcomes, such as continuous, binary, nominal, and time-to-event, (b) discrete (categorical) features, (c) different statistical-based stopping criteria, (d) several predictive models (e.g., linear or logistic regression), (e) various types of residuals, and (f) different types of association. We compare γ-OMP against LASSO, a prototypical, widely used algorithm for high-dimensional data. On both simulated data and several real gene expression datasets, γ-OMP is on par with, or outperforms, LASSO in binary classification (case-control data), regression (quantified outcomes), and time-to-event data (censored survival times). γ-OMP is based on simple statistical ideas, is easy to implement and to extend, and our extensive evaluation shows that it is also effective in bioinformatics analysis settings.


Subject(s)
Algorithms , Computational Biology , Case-Control Studies , Gene Expression , Logistic Models
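
The abstract above describes γ-OMP as a generalisation of Orthogonal Matching Pursuit. As a rough illustration of the classical algorithm only (not the authors' γ-OMP), the following sketch shows greedy OMP feature selection for a continuous outcome with a linear model. The function names, the no-intercept model, and the stopping rule (a minimum relative drop in the residual sum of squares, `min_rss_drop`) are assumptions made for this example; the paper's statistical stopping criteria are not reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(columns, y):
    """Fit y on the given feature columns through the normal equations."""
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve_linear_system(gram, rhs)


def omp_select(columns, y, max_features, min_rss_drop=0.01):
    """Greedy OMP: pick the column most correlated with the residual, refit, repeat.

    The stopping rule (relative drop in residual sum of squares) is an
    illustrative assumption, not the criterion used by gamma-OMP.
    """
    selected = []
    residual = list(y)
    rss = sum(r * r for r in residual)

    def score(j):
        # Absolute normalized inner product between column j and the residual.
        col = columns[j]
        norm = sum(v * v for v in col) ** 0.5
        return abs(sum(u * v for u, v in zip(col, residual))) / norm if norm else 0.0

    while len(selected) < max_features:
        candidates = [j for j in range(len(columns)) if j not in selected]
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) < 1e-9:  # nothing left that explains the residual
            break
        trial = selected + [best]
        coef = least_squares([columns[j] for j in trial], y)
        fitted = [sum(c * columns[j][i] for c, j in zip(coef, trial))
                  for i in range(len(y))]
        new_residual = [yi - fi for yi, fi in zip(y, fitted)]
        new_rss = sum(r * r for r in new_residual)
        if rss - new_rss <= min_rss_drop * rss:
            break
        selected, residual, rss = trial, new_residual, new_rss
    return selected


# Toy data: the outcome depends on features 0 and 1 only; feature 2 is a distractor.
features = [[1, 2, 3, 4, 5, 6], [1, 0, 1, 0, 1, 0], [2, 1, 0, 1, 2, 1]]
outcome = [3 * a - 2 * b for a, b in zip(features[0], features[1])]
print(omp_select(features, outcome, max_features=3))
```

Refitting all selected coefficients at every step is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to the chosen features, so a redundant feature scores close to zero and is not picked again.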
3.
NPJ Precis Oncol ; 6(1): 38, 2022 Jun 16.
Article in English | MEDLINE | ID: mdl-35710826

ABSTRACT

Fully automated machine learning (AutoML) for predictive modeling is becoming a reality, giving rise to a whole new field. We present the basic ideas and principles of Just Add Data Bio (JADBio), an AutoML platform applicable to the low-sample, high-dimensional omics data that arise in translational medicine and bioinformatics applications. In addition to predictive and diagnostic models ready for clinical use, JADBio focuses on knowledge discovery by performing feature selection and identifying the corresponding biosignatures, i.e., minimal-size subsets of biomarkers that are jointly predictive of the outcome or phenotype of interest. It also returns a palette of useful information for interpretation, clinical use of the models, and decision making. JADBio is qualitatively and quantitatively compared against Hyper-Parameter Optimization Machine Learning libraries. Results show that in typical omics dataset analysis, JADBio manages to identify signatures comprising just a handful of features while maintaining competitive predictive performance and accurate out-of-sample performance estimation.

4.
J Steroid Biochem Mol Biol ; 197: 105505, 2020 03.
Article in English | MEDLINE | ID: mdl-31669573

ABSTRACT

Vitamin D (VitD) continues to trigger intense scientific controversy, regarding both its biological targets and its supplementation doses and regimens. In an effort to resolve this dispute, we mapped VitD transcriptome-wide events in humans, in order to unveil shared patterns or mechanisms with diverse pathologies/tissue profiles and reveal causal effects between VitD actions and specific human diseases, using a recently developed bioinformatics methodology. Using the similarities in analyzed transcriptome data (c-SKL method), we validated our methodology with osteoporosis as an example and further analyzed two other strong hits, specifically chronic obstructive pulmonary disease (COPD) and asthma. The latter revealed no impact of VitD on known molecular pathways. In accordance with this finding, review and meta-analysis of published data, based on an objective measure (Forced Expiratory Volume at one second, FEV1%), did not further reveal any significant effect of VitD on the objective amelioration of either condition. This study may, therefore, be regarded as the first one to explore, in an objective, unbiased and unsupervised manner, the impact of VitD levels and/or interventions in a number of human pathologies.


Subject(s)
Asthma/blood , Computational Biology/methods , Pulmonary Disease, Chronic Obstructive/blood , Transcriptome , Vitamin D Deficiency/blood , Vitamin D/blood , Vitamins/blood , Asthma/complications , Asthma/genetics , Dietary Supplements , Humans , Pulmonary Disease, Chronic Obstructive/complications , Pulmonary Disease, Chronic Obstructive/genetics , Vitamin D/genetics , Vitamin D Deficiency/complications , Vitamin D Deficiency/genetics , Vitamins/genetics
5.
NPJ Syst Biol Appl ; 5: 39, 2019.
Article in English | MEDLINE | ID: mdl-31666984

ABSTRACT

Could there be unexpected similarities between different studies, diseases, or treatments on a molecular level, due to common biological mechanisms involved? To answer this question, we develop a method for computing similarities between empirical, statistical distributions of high-dimensional, low-sample datasets, and apply it to hundreds of -omics studies. The similarities lead to dataset-to-dataset networks visualizing the landscape of a large portion of biological data. Potentially interesting similarities connecting studies of different diseases are assembled in a disease-to-disease network. Exploring it, we discover numerous non-trivial connections between Alzheimer's disease and schizophrenia, asthma and psoriasis, or liver cancer and obesity, to name a few. We then present a method that identifies the molecular quantities and pathways that contribute the most to the identified similarities and could point to novel drug targets or provide biological insights. The proposed method acts as a "statistical telescope" providing a global view of the constellation of biological data; readers can peek through it at: http://datascope.csd.uoc.gr:25000/.


Subject(s)
Computational Biology/methods , Epidemiologic Methods , Algorithms , Data Analysis , Databases, Factual , Databases, Genetic , Disease/genetics , Epidemiology , Humans , Models, Statistical , Systems Analysis
6.
Database (Oxford) ; 2018, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29688366

ABSTRACT

The biotechnology revolution generates a plethora of omics data at an exponentially growing pace. Therefore, biological data mining demands automatic, 'high quality' curation efforts to organize biomedical knowledge into online databases. BioDataome is a database of uniformly preprocessed and disease-annotated omics data with the aim to promote and accelerate the reuse of public data. We followed the same preprocessing pipeline for each biological mart (microarray gene expression, RNA-Seq gene expression and DNA methylation) to produce datasets ready for downstream analysis and automatically annotated them with disease-ontology terms. We also designate datasets that share common samples and automatically discover control samples in case-control studies. Currently, BioDataome includes ∼5600 datasets and ∼260 000 samples spanning ∼500 diseases, and can be easily used in large-scale massive experiments and meta-analysis. All datasets are publicly available for querying and downloading via the BioDataome web application. We demonstrate BioDataome's utility by presenting exploratory data analysis examples. We have also developed the BioDataome R package, found at: https://github.com/mensxmachina/BioDataome/. Database URL: http://dataome.mensxmachina.org/.


Subject(s)
Data Curation/methods , Databases, Genetic , Electronic Data Processing/methods , Gene Expression Profiling , Gene Expression Regulation , Oligonucleotide Array Sequence Analysis , Meta-Analysis as Topic
7.
PLoS One ; 12(8): e0182138, 2017.
Article in English | MEDLINE | ID: mdl-28771511

ABSTRACT

Racial and ethnic differences in drug responses are now well studied and documented. Pharmacogenomics research seeks to unravel the genetic underpinnings of inter-individual variability with the aim of tailor-made theranostics and therapeutics. Taking into account the differential expression of pharmacogenes coding for key metabolic enzymes and transporters that affect drug pharmacokinetics and pharmacodynamics, we advise that data interpretation and analysis need to occur in light of geographical ancestry, if implications for drug development and global health are to be considered. Herein, we exploit ePGA, a web-based electronic Pharmacogenomics Assistant, and publicly available genetic data from the 1000 Genomes Project to explore genotype-to-phenotype associations among the 1000 Genomes Project populations.


Subject(s)
Genome, Human , Metagenomics , Population Groups/genetics , Cytochrome P-450 Enzyme System/genetics , Databases, Factual , Gene Frequency , Genetic Association Studies , Genotype , Haplotypes , Humans , Phenotype , User-Computer Interface
8.
PLoS One ; 11(9): e0162801, 2016.
Article in English | MEDLINE | ID: mdl-27631363

ABSTRACT

One of the challenges that arise from the advent of personal genomics services is to efficiently couple individual data with state-of-the-art Pharmacogenomics (PGx) knowledge. Existing services are limited to either providing static views of PGx variants or applying a simplistic match between individual genotypes and existing PGx variants. Moreover, there is a considerable amount of haplotype variation associated with drug metabolism that is currently insufficiently addressed. Here, we present a web-based electronic Pharmacogenomics Assistant (ePGA; http://www.epga.gr/) that provides personalized genotype-to-phenotype translation, linked to state-of-the-art clinical guidelines. ePGA's translation service matches individual genotype profiles with PGx gene haplotypes and infers the corresponding diplotype and phenotype profiles, accompanied by summary statistics. Additional features include i) the ability to customize translation based on subsets of variants of clinical interest, and ii) the ability to update the knowledge base with novel PGx findings. We demonstrate ePGA's functionality on genetic variation data from the 1000 Genomes Project.


Subject(s)
Information Systems , Internet , Pharmacogenetics , Models, Theoretical
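
The genotype-to-diplotype matching step mentioned in the abstract above can be illustrated with a small sketch. This is not ePGA's implementation: the haplotype table, the star-allele labels, the variant identifiers (`v1`, `v2`) and the function name are all hypothetical, and the sketch assumes unphased genotypes encoded as alternate-allele counts (0, 1 or 2) per variant.

```python
from itertools import combinations_with_replacement

# Hypothetical haplotype definitions for one gene:
# 1 = alternate allele present at the variant, 0 = reference allele.
HAPLOTYPES = {
    "*1": {"v1": 0, "v2": 0},
    "*2": {"v1": 1, "v2": 0},
    "*3": {"v1": 1, "v2": 1},
    "*4": {"v1": 0, "v2": 1},
}


def infer_diplotypes(genotype):
    """Return every haplotype pair whose summed allele counts equal the genotype.

    `genotype` maps a variant ID to its unphased alternate-allele count (0, 1, 2).
    """
    matches = []
    for a, b in combinations_with_replacement(sorted(HAPLOTYPES), 2):
        if all(HAPLOTYPES[a][v] + HAPLOTYPES[b][v] == n for v, n in genotype.items()):
            matches.append((a, b))
    return matches


# Heterozygous at v1 only: a single consistent diplotype.
print(infer_diplotypes({"v1": 1, "v2": 0}))
# Heterozygous at both variants: the unphased data fit two diplotypes.
print(infer_diplotypes({"v1": 1, "v2": 1}))
```

The second call shows why unphased data can be ambiguous: more than one haplotype pair can explain the same genotype, so a translation service has to report or rank the alternatives.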
9.
Open Biol ; 4(7), 2014 Jul.
Article in English | MEDLINE | ID: mdl-25030607

ABSTRACT

In the post-genomic era, the rapid evolution of high-throughput genotyping technologies and the increased pace of production of genetic research data are continually prompting the development of appropriate informatics tools, systems and databases as we attempt to cope with the flood of incoming genetic information. Alongside new technologies that serve to enhance data connectivity, emerging information systems should contribute to the creation of a powerful knowledge environment for genotype-to-phenotype information in the context of translational medicine. In the area of pharmacogenomics and personalized medicine, it has become evident that database applications providing important information on the occurrence and consequences of gene variants involved in pharmacokinetics, pharmacodynamics, drug efficacy and drug toxicity will become an integral tool for researchers and medical practitioners alike. At the same time, two fundamental issues are inextricably linked to current developments, namely data sharing and data protection. Here, we discuss high-throughput and next-generation sequencing technology and its impact on pharmacogenomics research. In addition, we present advances and challenges in the field of pharmacogenomics information systems which have in turn triggered the development of an integrated electronic 'pharmacogenomics assistant'. The system is designed to provide personalized drug recommendations based on linked genotype-to-phenotype pharmacogenomics data, as well as to support biomedical researchers in the identification of pharmacogenomics-related gene variants. The provisioned services are tuned in the framework of a single-access pharmacogenomics portal.


Subject(s)
Genomics/methods , Pharmacogenetics/methods , Genome , High-Throughput Nucleotide Sequencing/methods , Humans , Precision Medicine/methods
10.
Langmuir ; 22(5): 2329-33, 2006 Feb 28.
Article in English | MEDLINE | ID: mdl-16489825

ABSTRACT

The wetting characteristics of surfaces of polymers doped with photochromic spiropyran molecules can be tuned when irradiated with laser beams of properly chosen photon energy. The hydrophilicity is enhanced upon UV laser irradiation since the embedded nonpolar spiropyran molecules convert to their polar merocyanine isomers. The process is reversed upon green laser irradiation. Structuring of the photochromic polymeric surfaces with soft lithography enhances significantly the hydrophobicity of the system, indicating that the water droplets on the patterned features interact with air that is trapped in the microcavities, thus creating superhydrophobic air-water contact areas. Furthermore, the light-induced wettability variations of the structured surfaces are enhanced by a factor of 3 compared to those on the flat surfaces. This significant enhancement is attributed to the photoinduced reversible volume changes to the imprinted gratings, which additionally contribute to the wettability changes due to the light-induced photochromic interconversions.
