1.
Nucleic Acids Res ; 50(W1): W165-W174, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35610037

ABSTRACT

The CFM-ID 4.0 web server (https://cfmid.wishartlab.com) is an online tool for predicting, annotating and interpreting tandem mass (MS/MS) spectra of small molecules. It is specifically designed to assist researchers pursuing studies in metabolomics, exposomics and analytical chemistry. More specifically, CFM-ID 4.0 supports: (1) prediction of electrospray ionization quadrupole time-of-flight tandem mass spectra (ESI-QTOF-MS/MS) for small molecules at multiple collision energies (10 eV, 20 eV, and 40 eV); (2) annotation of ESI-QTOF-MS/MS spectra given the structure of the compound; and (3) identification of the small molecule that generated a given ESI-QTOF-MS/MS spectrum at one or more collision energies. The CFM-ID 4.0 web server makes use of a substantially improved MS fragmentation algorithm, a much larger database of experimental and in silico predicted MS/MS spectra and improved scoring methods to offer more accurate MS/MS spectral prediction and MS/MS-based compound identification. Compared to earlier versions of CFM-ID, this new version has an MS/MS spectral prediction performance that is ∼22% better and a compound identification accuracy that is ∼35% better on a standard (CASMI 2016) testing dataset. CFM-ID 4.0 also features a neutral loss function that allows users to identify similar or substituent compounds when no match can be found using CFM-ID's regular MS/MS-to-compound identification utility. Finally, the CFM-ID 4.0 web server now offers a much more refined user interface that is easier to use, supports molecular formula identification (from MS/MS data), provides more interactively viewable data (including proposed fragment ion structures) and displays MS mirror plots for comparing predicted with observed MS/MS spectra. These improvements should make CFM-ID 4.0 much more useful to the community and should make small molecule identification much easier, faster, and more accurate.
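The abstract does not describe how CFM-ID's neutral loss function works internally. As a generic illustration only: a neutral loss is the mass difference between a precursor ion and one of its fragment ions. The minimal sketch below assumes singly charged ions (so m/z differences equal masses in daltons) and uses a small hypothetical table of common losses; `COMMON_LOSSES` and `annotate_losses` are illustrative names, not part of CFM-ID.

```python
# Hypothetical sketch, not CFM-ID code. Assumes singly charged ions, so the
# precursor-minus-fragment m/z difference equals the neutral-loss mass (Da).
COMMON_LOSSES = {
    "H2O": 18.010565,  # monoisotopic mass of water
    "NH3": 17.026549,  # monoisotopic mass of ammonia
    "CO2": 43.989830,  # monoisotopic mass of carbon dioxide
}


def annotate_losses(precursor_mz: float, fragment_mzs: list[float],
                    tol: float = 0.005) -> list[tuple[float, str]]:
    """Return (fragment m/z, loss name) pairs whose mass difference from the
    precursor matches a known neutral loss within `tol` Da."""
    hits = []
    for mz in fragment_mzs:
        delta = precursor_mz - mz
        for name, mass in COMMON_LOSSES.items():
            if abs(delta - mass) <= tol:
                hits.append((mz, name))
    return hits


if __name__ == "__main__":
    print(annotate_losses(100.0, [81.9894, 56.0102]))
```

Because neutral losses depend only on mass differences, two spectra can share them even when their precursor masses differ, which is one plausible way a loss-based comparison could relate compounds that differ by a substituent.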


Subject(s)
Algorithms , Metabolomics , Software , Tandem Mass Spectrometry , Computers , Metabolomics/methods , Electrospray Ionization Mass Spectrometry , Tandem Mass Spectrometry/methods , Internet
2.
Nucleic Acids Res ; 50(W1): W115-W123, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35536252

ABSTRACT

BioTransformer 3.0 (https://biotransformer.ca) is a freely available web server that supports accurate, rapid and comprehensive in silico metabolism prediction. It combines machine learning approaches with a rule-based system to predict small-molecule metabolism in human tissues, the human gut as well as the external environment (soil and water microbiota). Simply stated, BioTransformer takes a molecular structure as input (SMILES or SDF) and outputs an interactively sortable table of the predicted metabolites or transformation products (SMILES, PNG images) along with the enzymes that are predicted to be responsible for those reactions and richly annotated downloadable files (CSV and JSON). The entire process typically takes less than a minute. Previous versions of BioTransformer focused exclusively on predicting the metabolism of xenobiotics (such as plant natural products, drugs, cosmetics and other synthetic compounds) using a limited number of pre-defined steps and somewhat limited rule-based methods. BioTransformer 3.0 uses much more sophisticated methods and incorporates new databases, new constraints and new prediction modules to not only more accurately predict the metabolic transformation products of exogenous xenobiotics but also the transformation products of endogenous metabolites, such as amino acids, peptides, carbohydrates, organic acids, and lipids. BioTransformer 3.0 can also support customized sequential combinations of these transformations along with multiple iterations to simulate multi-step human biotransformation events. Performance tests indicate that BioTransformer 3.0 is 40-50% more accurate, far less prone to combinatorial 'explosions' and much more comprehensive in terms of metabolite coverage/capabilities than previous versions of BioTransformer.


Subject(s)
Computational Biology , Xenobiotics , Humans , Computational Biology/methods , Biotransformation , Factual Databases , Molecular Structure , Xenobiotics/metabolism
3.
Nucleic Acids Res ; 50(D1): D622-D631, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34986597

ABSTRACT

The Human Metabolome Database or HMDB (https://hmdb.ca) has been providing comprehensive reference information about human metabolites and their associated biological, physiological and chemical properties since 2007. Over the past 15 years, the HMDB has grown and evolved significantly to meet the needs of the metabolomics community and respond to continuing changes in internet and computing technology. This year's update, HMDB 5.0, brings a number of important improvements and upgrades to the database. These should make the HMDB more useful and more appealing to a larger cross-section of users. In particular, these improvements include: (i) a significant increase in the number of metabolite entries (from 114 100 to 217 920 compounds); (ii) enhancements to the quality and depth of metabolite descriptions; (iii) the addition of new structure, spectral and pathway visualization tools; (iv) the inclusion of many new and much more accurately predicted spectral data sets, including predicted NMR spectra, more accurately predicted MS spectra, predicted retention indices and predicted collision cross section data and (v) enhancements to the HMDB's search functions to facilitate better compound identification. Many other minor improvements and updates to the content, the interface, and general performance of the HMDB website have also been made. Overall, we believe these upgrades and updates should greatly enhance the HMDB's ease of use and its potential applications not only in human metabolomics but also in exposomics, lipidomics, nutritional science, biochemistry and clinical chemistry.


Subject(s)
Genetic Databases , Metabolome/genetics , Metabolomics/classification , Humans , Lipidomics/classification , Mass Spectrometry , User-Computer Interface
4.
Anal Chem ; 95(50): 18326-18334, 2023 12 19.
Article in English | MEDLINE | ID: mdl-38048435

ABSTRACT

The market for illicit drugs has been reshaped by the emergence of more than 1100 new psychoactive substances (NPS) over the past decade, posing a major challenge to the forensic and toxicological laboratories tasked with detecting and identifying them. Tandem mass spectrometry (MS/MS) is the primary method used to screen for NPS within seized materials or biological samples. Most contemporary workflows require labor-intensive and expensive MS/MS reference standards, which may not be available for recently emerged NPS on the illicit market. Here, we present NPS-MS, a deep learning method capable of accurately predicting the MS/MS spectra of known and hypothesized NPS from their chemical structures alone. NPS-MS is trained by transfer learning from a generic MS/MS prediction model on a large data set of MS/MS spectra. We show that this approach enables a more accurate identification of NPS from experimentally acquired MS/MS spectra than any existing method. We demonstrate the application of NPS-MS to identify a novel derivative of phencyclidine (PCP) within an unknown powder seized in Denmark without the use of any reference standards. We anticipate that NPS-MS will allow forensic laboratories to identify both known and newly emerging NPS more rapidly. NPS-MS is available as a web server at https://nps-ms.ca/, which provides MS/MS spectrum prediction for given NPS compounds. Additionally, it offers MS/MS spectrum-based identification against a vast database comprising approximately 8.7 million predicted NPS compounds from DarkNPS and 24.5 million predicted ESI-QToF-MS/MS spectra for these compounds.


Subject(s)
Deep Learning , Illicit Drugs , Tandem Mass Spectrometry/methods , Psychotropic Drugs/analysis , Illicit Drugs/analysis , Electrospray Ionization Mass Spectrometry
5.
Gerontology ; 69(12): 1394-1403, 2023.
Article in English | MEDLINE | ID: mdl-37725932

ABSTRACT

INTRODUCTION: An aging population poses a pressing challenge for the healthcare system. Insights into promoting healthy longevity can be gained by quantifying the biological aging process and understanding the roles of modifiable lifestyle and environmental factors and of chronic disease conditions. METHODS: We developed a biological age (BioAge) index by applying multiple state-of-the-art machine learning models to easily accessible blood test data from the Canadian Longitudinal Study on Aging (CLSA). The BioAge gap, defined as the difference between the BioAge index and chronological age, was used to quantify differential aging in the CLSA participants. We further investigated the associations between the BioAge gap and lifestyle and environmental factors, as well as current and future health conditions. RESULTS: The BioAge gap had strong associations with existing adverse health conditions (e.g., cancers, cardiovascular diseases, diabetes, and kidney diseases) and future disease onset (e.g., Parkinson's disease, diabetes, and kidney diseases). We found that frequent consumption of processed meat, pork, beef, and chicken, poor outcomes in nutritional risk screening, cigarette smoking, and exposure to passive smoking were associated with a positive BioAge gap ("older" BioAge than expected). We also identified several modifiable factors, including eating fruits, legumes, and vegetables, that were related to a negative BioAge gap ("younger" BioAge than expected). CONCLUSIONS: Our study shows that a BioAge index based on easily accessible blood tests has the potential to quantify the differential biological aging process, which can be associated with current and future adverse health events. The risk and protective factors for differential aging identified through the BioAge gap are informative for future research and for guidelines to promote healthy longevity.
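The BioAge gap defined in this abstract is a simple difference, and its sign carries the interpretation used in the RESULTS. A minimal sketch, assuming a BioAge value has already been produced by some trained model (the abstract does not say which of its models is used, so none is reproduced here); `bioage_gap` and `describe_gap` are hypothetical helper names.

```python
def bioage_gap(bioage: float, chronological_age: float) -> float:
    """BioAge gap = BioAge index minus chronological age (in years)."""
    return bioage - chronological_age


def describe_gap(gap: float) -> str:
    """Map the sign of the gap to the wording used in the abstract."""
    if gap > 0:
        return "older than expected"    # positive gap
    if gap < 0:
        return "younger than expected"  # negative gap
    return "as expected"


if __name__ == "__main__":
    print(describe_gap(bioage_gap(62.5, 60.0)))
```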


Subject(s)
Diabetes Mellitus , Kidney Diseases , Animals , Cattle , Humans , Aged , Longitudinal Studies , Canada/epidemiology , Aging , Life Style
6.
Can J Psychiatry ; 68(1): 54-63, 2023 01.
Article in English | MEDLINE | ID: mdl-35892186

ABSTRACT

OBJECTIVE: Opioid use disorder (OUD) is a chronic relapsing disorder with a problematic pattern of opioid use, affecting nearly 27 million people worldwide. Machine learning (ML)-based prediction of OUD may lead to early detection and intervention. However, most ML prediction studies were not based on representative data sources and prospective validations, limiting their potential to predict future new cases. In the current study, we aimed to develop and prospectively validate an ML model that could predict individual OUD cases based on representative large-scale health data. METHOD: We present an ensemble machine-learning model trained on a cross-linked Canadian administrative health data set from 2014 to 2018 (n = 699,164), with validation of model-predicted OUD cases on a hold-out sample from 2014 to 2018 (n = 174,791) and prospective prediction of OUD cases on a non-overlapping sample from 2019 (n = 316,039). We used administrative records of OUD diagnosis for each subject based on International Classification of Diseases (ICD) codes. RESULTS: With 6409 OUD cases in 2019 (mean [SD], 45.34 [14.28], 3400 males), our model prospectively predicted OUD cases with high accuracy (balanced accuracy, 86%; sensitivity, 93%; specificity, 79%). In accord with prior findings, the top risk factors for OUD in this model were opioid use indicators and a history of other substance use disorders. CONCLUSION: Our study presents an individualized prospective prediction of OUD cases by applying ML to large administrative health datasets. Such prospective predictions based on ML would be essential for potential future clinical applications in the early detection of OUD.
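Balanced accuracy is the unweighted mean of sensitivity and specificity, which makes it informative when positive cases are a small fraction of the sample, as OUD cases are here. The reported figures are consistent with that definition: (93% + 79%) / 2 = 86%. The sketch below shows the arithmetic; the confusion-matrix counts in the usage line are hypothetical and chosen only to reproduce the reported rates.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Unweighted mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    sens, spec = sensitivity_specificity(tp=93, fn=7, tn=79, fp=21)
    print(round(balanced_accuracy(sens, spec), 2))  # 0.86
```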


Subject(s)
Opioid Analgesics , Opioid-Related Disorders , Male , Humans , Opioid Analgesics/therapeutic use , Canada/epidemiology , Opioid-Related Disorders/diagnosis , Opioid-Related Disorders/epidemiology , Opioid-Related Disorders/drug therapy , Risk Factors
7.
Can J Psychiatry ; 67(1): 39-47, 2022 01.
Article in English | MEDLINE | ID: mdl-34379019

ABSTRACT

BACKGROUND: Major depressive disorder (MDD) is a common and burdensome condition that has low rates of treatment success for each individual treatment. This means that many patients require several medication switches to achieve remission; selecting an effective antidepressant is typically a sequential trial-and-error process. Machine learning techniques may be able to learn models that can predict whether a specific patient will respond to a given treatment, before it is administered. This study uses baseline clinical data to create a machine-learned model that accurately predicts remission status for a patient after desvenlafaxine (DVS) treatment. METHODS: We applied machine learning algorithms to data from 3,399 MDD patients (90% of the 3,776 subjects in 11 phase-III/IV clinical trials, each described using 92 features), to produce a model that uses 26 of these features to predict symptom remission, defined as an 8-week Hamilton Depression Rating Scale score of 7 or below. We evaluated that learned model on the remaining held-out 10% of the data (n = 377). RESULTS: Our resulting classifier, a trained linear support vector machine, had a holdout set accuracy of 69.0%, significantly greater than the probability of classifying a patient correctly by chance. We demonstrate that this learning process is stable by repeatedly sampling part of the training dataset and running the learner on this sample, then evaluating the learned model on the held-out instances of the training set; these runs had an average accuracy of 67.0% ± 1.8%. CONCLUSIONS: Our model, based on 26 clinical features, proved sufficient to predict DVS remission significantly better than chance. This may allow more accurate use of DVS without waiting 8 weeks to determine treatment outcome, and may serve as a first step toward changing psychiatric care by incorporating clinical assistive technologies using machine-learned models.


Subject(s)
Major Depressive Disorder , Antidepressive Agents/therapeutic use , Major Depressive Disorder/diagnosis , Desvenlafaxine Succinate/therapeutic use , Humans , Machine Learning , Treatment Outcome
8.
BMC Health Serv Res ; 22(1): 1415, 2022 Nov 24.
Article in English | MEDLINE | ID: mdl-36434628

ABSTRACT

BACKGROUND: Hospital readmissions are one of the costliest challenges facing healthcare systems, but conventional models fail to predict readmissions well. Many existing models use exclusively manually engineered features, which are labor-intensive and dataset-specific. Our objective was to develop and evaluate models to predict hospital readmissions using derived features that are automatically generated from longitudinal data using machine learning techniques. METHODS: We studied patients discharged from acute care facilities in 2015 and 2016 in Alberta, Canada, excluding those who were hospitalized to give birth or for a psychiatric condition. We used population-level linked administrative hospital data from 2011 to 2017 to train prediction models using both manually derived features and features generated automatically from observational data. The target value of interest was 30-day all-cause hospital readmission, with the success of prediction measured using the area under the curve (AUC) statistic. RESULTS: Data from 428,669 patients (62% female, 38% male, 27% 65 years or older) were used for training and evaluating models: 24,974 (5.83%) were readmitted within 30 days of discharge for any reason. Patients were more likely to be readmitted if they utilized hospital care more, had more physician office visits, had more prescriptions, had a chronic condition, or were 65 years old or older. The LACE readmission prediction model had an AUC of 0.66 ± 0.0064, while the machine learning model's test set AUC was 0.83 ± 0.0045, based on a gradient boosting machine trained on a combination of machine-learned and manually derived features. CONCLUSION: Applying a machine learning model to the computer-generated and manual features improved prediction accuracy over the LACE model and over a model that used only manually derived features. Our model can be used to identify high-risk patients, for whom targeted interventions may potentially prevent readmissions.


Subject(s)
Patient Discharge , Patient Readmission , Humans , Male , Female , Aged , Hospitalization , Machine Learning , Alberta/epidemiology
9.
Anal Chem ; 93(34): 11692-11700, 2021 08 31.
Article in English | MEDLINE | ID: mdl-34403256

ABSTRACT

In the field of metabolomics, mass spectrometry (MS) is the method most commonly used for identifying and annotating metabolites. As this typically involves matching a given MS spectrum against an experimentally acquired reference spectral library, this approach is limited by the coverage and size of such libraries (which typically number in the thousands). These experimental libraries can be greatly extended by predicting the MS spectra of known chemical structures (which number in the millions) to create computational reference spectral libraries. To facilitate the generation of predicted spectral reference libraries, we developed CFM-ID, a computer program that can accurately predict the ESI-MS/MS spectrum of a given compound structure. CFM-ID is one of the best-performing methods for compound-to-mass-spectrum prediction and also one of the top tools for in silico mass-spectrum-to-compound identification. This work improves CFM-ID's ability to predict ESI-MS/MS spectra from compounds by (1) learning parameters from features based on the molecular topology, (2) adding a new approach to ring cleavage that models such cleavage as a sequence of simple chemical bond dissociations, and (3) expanding its hand-written rule-based predictor to cover more chemical classes, including acylcarnitines, acylcholines, flavonols, flavones, flavanones, and flavonoid glycosides. We demonstrate that this new version of CFM-ID (version 4.0) is significantly more accurate than previous CFM-ID versions in terms of both ESI-MS/MS spectral prediction and compound identification. CFM-ID 4.0 is available at http://cfmid4.wishartlab.com/ as a web server, and Docker images can be downloaded at https://hub.docker.com/r/wishartlab/cfmid.


Subject(s)
Flavones , Tandem Mass Spectrometry , Computer Simulation , Metabolomics , Software
10.
J Chem Inf Model ; 61(6): 3128-3140, 2021 06 28.
Article in English | MEDLINE | ID: mdl-34038112

ABSTRACT

In silico metabolism prediction is the cheminformatic task of automatically predicting the set of metabolic byproducts produced from a specified molecule and a set of enzymes or reactions. Here, we describe a novel machine-learned in silico cytochrome P450 (CYP450) metabolism prediction suite, called CyProduct, that accurately predicts metabolic byproducts for a specified molecule and a human CYP450 isoform. It includes three modules: (1) CypReact, a tool that predicts whether the query compound reacts with a given CYP450 enzyme, (2) CypBoM, a tool that accurately predicts the "bond site" of the reaction (i.e., which specific bonds within the query molecule react with the CYP isoform), and (3) MetaboGen, a tool that generates the metabolic byproducts based on CypBoM's bond-site prediction. CyProduct predicts metabolic biotransformation products for each of the nine most important human CYP450 enzymes. CypBoM uses an important new concept called "bond of metabolism" (BoM), which extends the traditional "site of metabolism" (SoM) by specifying the set of chemical bonds that is modified or formed in a metabolic reaction (rather than the specific atom). We created a BoM database for 1845 CYP450-mediated Phase I reactions, then used this to train the CypBoM Predictor to predict the reactive bond locations on substrate molecules. CypBoM Predictor's cross-validated Jaccard score for reactive bond prediction ranged from 0.380 to 0.452 over the nine CYP450 enzymes. Over variants of a test set of 68 known CYP450 substrates and 30 nonreactants, CyProduct outperformed the other packages, including ADMET Predictor, BioTransformer, and GLORY, by an average of 200% (with respect to Jaccard score) in terms of predicting metabolites. The CyProduct suite and the data sets are freely available at https://bitbucket.org/wishartlab/cyproduct/src/master/.
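The Jaccard score reported for reactive bond prediction compares a predicted set with an observed set: the size of their intersection divided by the size of their union. A minimal sketch, assuming each bond is represented as an unordered pair of atom indices (so bond (1, 2) equals bond (2, 1)); this representation, the helper names, and the convention of returning 1.0 for two empty sets are assumptions, not CyProduct's actual data structures.

```python
# Hypothetical representation: a bond is an unordered pair of atom indices.
Bond = frozenset[int]


def bond(i: int, j: int) -> Bond:
    """Build an order-independent bond identifier."""
    return frozenset((i, j))


def jaccard(predicted: set[Bond], observed: set[Bond]) -> float:
    """|predicted & observed| / |predicted | observed|.

    Returns 1.0 when both sets are empty (a convention assumed here).
    """
    union = predicted | observed
    if not union:
        return 1.0
    return len(predicted & observed) / len(union)


if __name__ == "__main__":
    predicted = {bond(1, 2), bond(3, 4)}
    observed = {bond(2, 1), bond(5, 6)}
    print(jaccard(predicted, observed))  # one shared bond out of three distinct bonds
```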


Subject(s)
Cytochrome P-450 Enzyme System , Software , Computer Simulation , Cytochrome P-450 Enzyme System/metabolism , Humans , Oxidation-Reduction
11.
J Chem Inf Model ; 58(6): 1282-1291, 2018 06 25.
Article in English | MEDLINE | ID: mdl-29738669

ABSTRACT

In silico metabolism prediction requires first predicting whether a specific molecule will interact with one or more specific metabolizing enzymes, then predicting the result of each enzymatic reaction. Here, we provide a computational tool, CypReact, for performing this first task of reactant prediction. Specifically, CypReact takes as input an arbitrary molecule (specified as a SMILES string or a standard SDF file) and any one of the nine most important human cytochrome P450 (CYP450) enzymes (CYP1A2, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, or CYP3A4), and accurately predicts whether the query molecule will react with that CYP450 enzyme. Tests of CypReact, conducted over a data set of 1632 molecules (each considered a "plausible" reactant), show that it is very effective, with a (cross-validation) AUROC (area under the receiver operating characteristic curve) of 0.83-0.92. We also show that CypReact performs significantly better than other reactant prediction tools such as ADMET Predictor and (a reactant-predicting extension of) SMARTCyp, whose average AUROCs are 0.75 and 0.53, respectively. We then applied the learned CypReact models to a previously unseen set of molecules and found that CypReact performed even better and still significantly surpassed the performance of SMARTCyp and ADMET Predictor. These results suggest that CypReact could be an important component of a suite of in silico metabolism prediction tools for accurately predicting the products of Phase I, Phase II, and microbial metabolism in humans. CypReact is available at https://bitbucket.org/Leon_Ti/cypreact .


Subject(s)
Cytochrome P-450 Enzyme System/metabolism , Pharmaceutical Preparations/metabolism , Algorithms , Computer Simulation , Drug Discovery , Humans , Biological Models , Software
16.
Cancers (Basel) ; 16(4)2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38398176

ABSTRACT

Recent advances in our understanding of gastric cancer biology have prompted a shift towards more personalized therapy. However, these results are based on population-based survival analyses, which evaluate the average survival effects of entire treatment groups or single prognostic variables. This study uses a personalized survival modelling approach called individual survival distributions (ISDs) with the multi-task logistic regression (MTLR) model to provide novel insight into personalized survival in gastric adenocarcinoma. We performed a pooled analysis using 1043 patients from a previously characterized database annotated with molecular subtypes from the Cancer Genome Atlas, Asian Cancer Research Group, and tumour microenvironment (TME) score. The MTLR model achieved a 5-fold cross-validated concordance index of 72.1 ± 3.3%. This model found that the TME score and chemotherapy had similar survival effects over the entire study time. The TME score provided the greatest survival benefit beyond a 5-year follow-up. Stage III and Stage IV disease contributed the greatest negative effect on survival. The MTLR model weights were significantly correlated with the Cox model coefficients (Pearson coefficient = 0.86, p < 0.0001). We illustrate how ISDs can accurately predict the survival time for each patient, which is especially relevant in cases of molecular subtype heterogeneity. This study provides evidence that the TME score is principally associated with long-term survival in gastric adenocarcinoma. Additional external validation and investigation into the clinical utility of this ISD model in gastric cancer are areas of future research.
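The abstract reports a concordance index but does not specify the exact variant, how each patient's MTLR survival distribution is reduced to a single prediction, or how ties are handled. The sketch below is one common choice, Harrell's C computed on predicted survival times (for example, the median of each patient's predicted curve, which is an assumption here): a pair is comparable when the patient with the shorter observed time had an observed event, and concordant when the model also predicts that patient a shorter survival time.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[bool],
                      predicted_times: Sequence[float]) -> float:
    """Harrell-style C-index using predicted survival times.

    Pair (i, j) is comparable when subject i had an observed event and
    subject j was still at risk later (times[j] > times[i]). It is
    concordant when predicted_times[i] < predicted_times[j]; tied
    predictions count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    print(concordance_index([2, 4, 6, 8], [True, True, False, True], [1, 3, 5, 7]))
```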

17.
J Affect Disord ; 357: 148-155, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-38670463

ABSTRACT

BACKGROUND: Anxiety disorders are among the most common mental health disorders in the middle-aged and older population. Because older individuals are more likely to have multiple comorbidities or increased frailty, the impact of anxiety disorders on their overall well-being is exacerbated. Early identification of anxiety disorders using machine learning (ML) can potentially mitigate the adverse consequences associated with these disorders. METHODS: We applied ML to the data from the Canadian Longitudinal Study on Aging (CLSA) to predict the onset of anxiety disorders approximately three years in the future. We used Shapley value-based methods to determine the top factors for prediction. We also investigated whether anxiety onset can be predicted by baseline depression-related predictors alone. RESULTS: Our model was able to predict anxiety onset accurately (Area under the Receiver Operating Characteristic Curve or AUC = 0.814 ± 0.016 (mean ± standard deviation), balanced accuracy = 0.741 ± 0.016, sensitivity = 0.743 ± 0.033, and specificity = 0.738 ± 0.010). The top predictive factors included prior depression or mood disorder diagnosis, high frailty, anxious personality, and low emotional stability. Depression and mood disorders are well-known comorbidities of anxiety; however, a prior depression or mood disorder diagnosis could not predict anxiety onset without other factors. LIMITATION: While our findings underscore the importance of a prior depression diagnosis in predicting anxiety, they also highlight that it alone is inadequate, signifying the necessity to incorporate additional predictors for improved prediction accuracy. CONCLUSION: Our study showcases promising prospects for using machine learning to develop personalized prediction models for anxiety onset in middle-aged and older adults using easy-to-access survey data.


Subject(s)
Anxiety Disorders , Machine Learning , Humans , Female , Male , Canada/epidemiology , Longitudinal Studies , Aged , Anxiety Disorders/epidemiology , Anxiety Disorders/diagnosis , Anxiety Disorders/psychology , Middle Aged , Aging/psychology , Aged 80 and over , Depression/epidemiology , Depression/diagnosis , Depression/psychology , Comorbidity , Frailty/diagnosis , Frailty/epidemiology , Prospective Studies , Anxiety/epidemiology , Anxiety/diagnosis , Anxiety/psychology
18.
Med Image Anal ; 97: 103257, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38981282

ABSTRACT

The alignment of tissue between histopathological whole-slide-images (WSI) is crucial for research and clinical applications. Advances in computing, deep learning, and availability of large WSI datasets have revolutionised WSI analysis. Therefore, the current state-of-the-art in WSI registration is unclear. To address this, we conducted the ACROBAT challenge, based on the largest WSI registration dataset to date, including 4,212 WSIs from 1,152 breast cancer patients. The challenge objective was to align WSIs of tissue that was stained with routine diagnostic immunohistochemistry to its H&E-stained counterpart. We compare the performance of eight WSI registration algorithms, including an investigation of the impact of different WSI properties and clinical covariates. We find that conceptually distinct WSI registration methods can lead to highly accurate registration performances and identify covariates that impact performances across methods. These results provide a comparison of the performance of current WSI registration methods and guide researchers in selecting and developing methods.

19.
BMC Bioinformatics ; 14: 61, 2013 Feb 22.
Article in English | MEDLINE | ID: mdl-23432980

ABSTRACT

BACKGROUND: Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome-wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification, but each has limitations. We provide an alternative technique to address population stratification. RESULTS: We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees, capable of accurately predicting an individual's continental and sub-continental ancestry. To predict an individual's continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross validation accuracy of 100% using the HapMap II dataset. We extended this model to involve 29 disjoint decision trees over 149 SNPs, and showed that this ensemble has an accuracy of ≥ 99.9%, even if some of those 149 SNP values were missing. On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and improved genomic control's λ from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai).
In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, respectively involving 31, 502, 526, 242 and 271 SNPs, with 10-fold cross validation accuracy of 86.5% ± 2.4%, 95.6% ± 3.9%, 95.6% ± 2.1%, 98.3% ± 2.0%, and 95.9% ± 1.5%. However, ETHNOPRED was unable to produce a classifier that can accurately distinguish Chinese in Beijing vs. Chinese in Denver. CONCLUSIONS: ETHNOPRED is a novel technique for producing classifiers that can identify an individual's continental and sub-continental heritage, based on a small number of SNPs. We show that its learned classifiers are simple, cost-efficient, accurate, transparent, flexible, fast, applicable to large scale GWASs, and robust to missing values.
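The genomic control inflation factor λ mentioned in this abstract (improved from 1.22 to 1.11) is conventionally the median of the observed 1-degree-of-freedom association chi-square statistics divided by the expected median of that distribution under the null (about 0.455). The sketch below shows only that standard definition and the usual deflation step; it does not reproduce how the ETHNOPRED classifier was used to reduce λ, which the abstract does not detail. Function names are hypothetical.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom (about 0.455).
EXPECTED_MEDIAN_CHI2_1DF = 0.4549364


def genomic_control_lambda(chi2_stats: list[float]) -> float:
    """Observed median 1-df association statistic divided by its null median."""
    return median(chi2_stats) / EXPECTED_MEDIAN_CHI2_1DF


def gc_adjust(chi2_stats: list[float], lam: float) -> list[float]:
    """Divide each statistic by lambda, never inflating when lambda < 1."""
    factor = max(lam, 1.0)
    return [x / factor for x in chi2_stats]


if __name__ == "__main__":
    stats = [0.1, 0.4549364 * 1.22, 2.0]
    lam = genomic_control_lambda(stats)
    print(round(lam, 2), gc_adjust(stats, lam))
```

A λ close to 1 indicates little genome-wide inflation of test statistics; values above 1 are commonly read as a sign of residual population stratification or other confounding.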


Subject(s)
Artificial Intelligence , Gene Frequency , Racial Groups/genetics , Asian People/genetics , Black People/genetics , Decision Trees , Ethnicity/genetics , Genome-Wide Association Study , Genotype , HapMap Project , Humans , Phenotype , Single Nucleotide Polymorphism , Principal Component Analysis , White People/genetics
20.
BMC Bioinformatics ; 14 Suppl 13: S3, 2013.
Article in English | MEDLINE | ID: mdl-24266904

ABSTRACT

BACKGROUND: This paper introduces and applies a genome-wide predictive study to learn a model that predicts whether a new subject will develop breast cancer or not, based on her SNP profile. RESULTS: We first genotyped 696 female subjects (348 breast cancer cases and 348 apparently healthy controls), predominantly of Caucasian origin, from Alberta, Canada, using Affymetrix Human SNP 6.0 arrays. Then, we applied the EIGENSTRAT population stratification correction method to remove 73 subjects not belonging to the Caucasian population. Next, we removed any SNP that had missing calls, whose genotype frequencies deviated from Hardy-Weinberg equilibrium, or whose minor allele frequency was less than 5%. Finally, we applied a combination of the MeanDiff feature selection method and the KNN learning method to this filtered dataset to produce a breast cancer prediction model. The LOOCV accuracy of this classifier is 59.55%. Random permutation tests show that this result is significantly better than the baseline accuracy of 51.52%. Sensitivity analysis shows that the classifier is fairly robust to the number of MeanDiff-selected SNPs. External validation on the CGEMS breast cancer dataset, the only other publicly available breast cancer dataset, shows that this combination of MeanDiff and KNN leads to a LOOCV accuracy of 60.25%, which is significantly better than its baseline of 50.06%. We then considered a dozen different combinations of feature selection and learning methods, but found that none of these combinations produces a better predictive model than ours. We also considered various biological feature selection methods, such as selecting SNPs reported in recent genome-wide association studies to be associated with breast cancer, selecting SNPs in genes associated with KEGG cancer pathways, or selecting SNPs associated with breast cancer in the F-SNP database, but again found that none of the resulting models achieved accuracy better than baseline.
CONCLUSIONS: We anticipate producing more accurate breast cancer prediction models by recruiting more study subjects, providing more accurate labelling of phenotypes (to accommodate the heterogeneity of breast cancer), measuring other genomic alterations such as point mutations and copy number variations, and incorporating non-genetic information about subjects such as environmental and lifestyle factors.
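The three SNP filters listed in the RESULTS (no missing calls, consistency with Hardy-Weinberg equilibrium, minor allele frequency of at least 5%) can be sketched as a per-SNP quality-control check. The abstract does not state which Hardy-Weinberg test or significance threshold was used, so the Pearson chi-square goodness-of-fit test and the cutoff of 10.83 (the 1-df critical value at p of about 0.001) below are assumptions, as are the 0/1/2 genotype coding and the helper names.

```python
from typing import Optional, Sequence

# Assumed coding: each genotype is the count (0, 1, 2) of one chosen allele;
# None marks a missing call.
MAF_MIN = 0.05        # 5% minor allele frequency threshold from the abstract
HWE_CHI2_MAX = 10.83  # assumed cutoff: 1-df chi-square critical value, p ~ 0.001


def minor_allele_frequency(calls: Sequence[int]) -> float:
    """Frequency of the less common allele across 2n allele copies."""
    p = sum(calls) / (2 * len(calls))
    return min(p, 1 - p)


def hwe_chi2(calls: Sequence[int]) -> float:
    """Pearson chi-square statistic against Hardy-Weinberg genotype proportions."""
    n = len(calls)
    observed = [calls.count(k) for k in (0, 1, 2)]
    p = (2 * observed[0] + observed[1]) / (2 * n)  # frequency of the non-counted allele
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def keep_snp(genotypes: Sequence[Optional[int]]) -> bool:
    """Apply the three filters: missing calls, minor allele frequency, HWE."""
    if any(g is None for g in genotypes):
        return False
    calls = [g for g in genotypes if g is not None]
    if minor_allele_frequency(calls) < MAF_MIN:
        return False
    return hwe_chi2(calls) <= HWE_CHI2_MAX


if __name__ == "__main__":
    in_hwe = [0] * 25 + [1] * 50 + [2] * 25
    print(keep_snp(in_hwe), keep_snp([0] * 50 + [2] * 50))
```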


Subject(s)
Breast Neoplasms/genetics , Genome-Wide Association Study/methods , Single Nucleotide Polymorphism , Adult , Algorithms , Area Under Curve , Canada , Case-Control Studies , Female , Gene Frequency , Genetic Predisposition to Disease , Humans , Phenotype , Risk Factors