Results 1 - 20 of 48
1.
J Am Med Inform Assoc ; 31(3): 692-704, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38134953

ABSTRACT

OBJECTIVES: Electronic health record (EHR) data may facilitate the identification of rare diseases in patients, such as aromatic l-amino acid decarboxylase deficiency (AADCd), an autosomal recessive disease caused by pathogenic variants in the dopa decarboxylase gene. Deficiency of the AADC enzyme results in combined severe reductions in monoamine neurotransmitters: dopamine, serotonin, epinephrine, and norepinephrine. This leads to widespread neurological complications affecting motor, behavioral, and autonomic function. The goal of this study was to use EHR data to identify previously undiagnosed patients who may have AADCd without any available training cases for the disease. MATERIALS AND METHODS: A dataset annotated for multiple symptoms and related diseases was created and used to train individual concept classifiers on annotated sentence data. A multistep algorithm was then used to combine concept predictions into a single patient rank value. RESULTS: Using an 8000-patient dataset that the algorithms had not seen before ranking, the top and bottom 200 ranked patients were manually reviewed for clinical indications of performing an AADCd diagnostic screening test. Of the top-ranked patients, 22.5% were positively assessed for diagnostic screening, compared with 0% of the bottom-ranked patients. This result is statistically significant at P < .0001. CONCLUSION: This work validates the approach that large-scale rare-disease screening can be accomplished by combining predictions for relevant individual symptoms and related conditions, which are much more common and for which training data are easier to create.


Subject(s)
Amino Acid Metabolism, Inborn Errors , Aromatic-L-Amino-Acid Decarboxylases/deficiency , Natural Language Processing , Rare Diseases , Humans , Dopamine , Machine Learning
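The AADCd abstract above does not say how individual concept predictions are combined into a single patient rank value. As a minimal sketch, assuming a simple two-step aggregation (take each concept's highest sentence-level probability in a patient's notes, then average over the concept list), the ranking step could look like this. The function names and concept labels are hypothetical, and the authors' multistep algorithm may well differ.

```python
from statistics import mean

# Hypothetical input shape: concept name -> sentence-level probabilities
# produced by that concept's classifier over one patient's notes.
PatientPredictions = dict[str, list[float]]


def patient_score(preds: PatientPredictions, concepts: list[str]) -> float:
    """Assumed aggregation: max over sentences per concept (0.0 if the
    concept never appears), then the mean over all concepts of interest."""
    per_concept = [max(preds.get(c, []), default=0.0) for c in concepts]
    return mean(per_concept)


def rank_patients(patients: dict[str, PatientPredictions], concepts: list[str]) -> list[str]:
    """Return patient IDs ordered from highest to lowest combined score."""
    return sorted(patients, key=lambda pid: patient_score(patients[pid], concepts), reverse=True)


concepts = ["symptom_a", "symptom_b", "related_condition_c"]  # illustrative labels only
patients = {
    "p1": {"symptom_a": [0.9, 0.2], "symptom_b": [0.8]},
    "p2": {"symptom_a": [0.3]},
}
print(rank_patients(patients, concepts))  # ['p1', 'p2']
```

Averaging across concepts favors patients whose notes touch several relevant symptoms at once, which is one way to reflect the idea of building a rare-disease signal out of common, individually trainable concepts.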
2.
JAMIA Open ; 5(2): ooac053, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35783073

ABSTRACT

Machine learning has the potential to improve identification of patients for appropriate diagnostic testing and treatment, including those who have rare diseases for which effective treatments are available, such as acute hepatic porphyria (AHP). We trained a machine learning model on 205 571 complete electronic health records from a single medical center based on 30 known cases to identify 22 patients with classic symptoms of AHP who had neither been diagnosed with nor tested for AHP. We offered urine porphobilinogen testing to these patients through their clinicians. Of the 7 who agreed to testing, none were positive for AHP. We explore the reasons for this and provide lessons learned for further work evaluating machine learning to detect AHP and other rare diseases.

3.
AMIA Jt Summits Transl Sci Proc ; 2022: 406-413, 2022.
Article in English | MEDLINE | ID: mdl-35854734

ABSTRACT

Systematic reviews are extremely time-consuming. The goal of this work is to assess work savings and recall for a publication type filtering strategy that uses the output of two machine learning models, Multi-Tagger and web RCT Tagger, applied retrospectively to 10 systematic reviews on drug effectiveness. Our filtering strategy resulted in mean work savings of 33.6% and recall of 98.3%. Of 363 articles finally included in any of the systematic reviews, 7 were filtered out by our strategy, but 1 "error" was actually an article using a publication type that the systematic review team had not pre-specified as relevant for inclusion. Our analysis suggests that automated publication type filtering can potentially provide substantial work savings with minimal loss of included articles. Publication type filtering should be personalized for each systematic review and might be combined with other filtering or ranking methods to provide additional work savings for manual triage.
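Recall and work savings for the entry above can be computed directly from screening counts. The sketch below assumes work savings is the fraction of candidate articles the filter removes before manual triage (the abstract does not give the formula) and applies recall to the pooled counts reported above (7 of 363 included articles filtered out).

```python
def recall(included_kept: int, included_total: int) -> float:
    """Share of finally included articles that survive the filter."""
    return included_kept / included_total


def work_savings(filtered_out: int, candidates: int) -> float:
    """Assumed definition: share of candidate articles removed before manual screening."""
    return filtered_out / candidates


# Pooled over all 10 reviews: 363 included articles, 7 of them filtered out.
pooled = recall(363 - 7, 363)
print(f"pooled recall = {pooled:.3f}")  # pooled recall = 0.981
```

The pooled value (about 98.1%) need not equal the 98.3% in the abstract, which is a mean over the individual reviews; averaging per-review recall weights each review equally regardless of how many articles it included.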

4.
Cancer Cell ; 40(8): 850-864.e9, 2022 08 08.
Article in English | MEDLINE | ID: mdl-35868306

ABSTRACT

Acute myeloid leukemia (AML) is a cancer of myeloid-lineage cells with limited therapeutic options. We previously combined ex vivo drug sensitivity with genomic, transcriptomic, and clinical annotations for a large cohort of AML patients, which facilitated discovery of functional genomic correlates. Here, we present a dataset that has been harmonized with our initial report to yield a cumulative cohort of 805 patients (942 specimens). We show strong cross-cohort concordance and identify features of drug response. Further, deconvoluting transcriptomic data shows that drug sensitivity is governed broadly by AML cell differentiation state, sometimes conditionally affecting other correlates of response. Finally, modeling of clinical outcome reveals a single gene, PEAR1, to be among the strongest predictors of patient survival, especially for young patients. Collectively, this report expands a large functional genomic resource, offers avenues for mechanistic exploration and drug development, and reveals tools for predicting outcome in AML.


Subject(s)
Leukemia, Myeloid, Acute , Cell Differentiation , Cohort Studies , Humans , Leukemia, Myeloid, Acute/drug therapy , Leukemia, Myeloid, Acute/genetics , Receptors, Cell Surface/genetics , Transcriptome
5.
JAMIA Open ; 5(1): ooac015, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35571360

ABSTRACT

Objectives: To produce a systematic review (SR), reviewers typically screen thousands of titles and abstracts of articles manually to find a small number that are read in full text to identify the relevant articles included in the final SR. Here, we evaluate a proposed automated probabilistic publication type screening strategy applied to the randomized controlled trial (RCT) articles (i.e., those that present clinical outcome results of RCT studies) included in a corpus of previously published Cochrane reviews. Materials and Methods: We selected a random subset of 558 published Cochrane reviews that specified RCT-only inclusion criteria, containing 7113 included articles that could be matched to PubMed identifiers. These were processed by our automated RCT Tagger tool to estimate the probability that each article reports clinical outcomes of an RCT. Results: Removing articles with low predictive scores (P < 0.01) eliminated 288 included articles, of which only 22 were actually typical RCT articles, and only 18 were typical RCT articles that MEDLINE indexed as such. Based on our sample set, this screening strategy led to fewer than 0.05 relevant RCT articles being missed on average per Cochrane SR. Discussion: This scenario, based on real SRs, demonstrates that automated tagging can identify RCT articles accurately while maintaining very high recall. However, we also found that even SRs whose inclusion criteria are restricted to RCT studies include not only clinical outcome articles per se, but also a variety of ancillary article types. Conclusions: This encourages further studies of how best to incorporate automated tagging of additional publication types into SR triage workflows.
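The per-review figure in the entry above follows from its own counts. A short check, assuming the misses are simply averaged over the 558 reviews:

```python
def misses_per_review(missed_articles: int, n_reviews: int) -> float:
    """Average number of wrongly removed articles per systematic review."""
    return missed_articles / n_reviews


typical_rcts = misses_per_review(22, 558)     # typical RCT articles removed
medline_indexed = misses_per_review(18, 558)  # those that MEDLINE also indexed as RCTs
print(round(typical_rcts, 3), round(medline_indexed, 3))  # 0.039 0.032
```

Both values sit below the 0.05 missed articles per review quoted in the Results.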

6.
AMIA Jt Summits Transl Sci Proc ; 2021: 267-275, 2021.
Article in English | MEDLINE | ID: mdl-34457141

ABSTRACT

Errors and incompleteness in electronic health record (EHR) medication lists can result in medical errors. To reduce errors in these medication lists, clinicians use patient self-reported data to reconcile EHR data. We assessed the agreement between patient self-reported medications and medications recorded in the EHR for six medication classes related to cardiovascular care and used logistic regression models to determine which patient-related factors were associated with the disagreement between these two information sources. From our 297 patients, we found self-reported medications had an overall above-average agreement with the EHR (κ = .727). We observed the highest agreement level for statins (κ = .831) and the lowest for other antihypertensives (κ = .465). Agreement was less likely for Hispanic and male patients. We also performed an in-depth error analysis of different types of disagreement beyond medication names, which revealed that the most frequent type of disagreement was mismatched dosages.


Subject(s)
Cardiology , Electronic Health Records , Antihypertensive Agents , Humans , Male
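Agreement in the entry above is reported as a kappa statistic. Assuming it is Cohen's kappa for two sources (self-report vs. EHR) making a yes/no judgement per patient and medication class, it is computed as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each source's label frequencies. The data below are illustrative, not from the study.

```python
from collections import Counter


def cohens_kappa(source_a: list, source_b: list) -> float:
    """Cohen's kappa for two paired lists of categorical labels."""
    if len(source_a) != len(source_b) or not source_a:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(source_a)
    p_o = sum(a == b for a, b in zip(source_a, source_b)) / n  # observed agreement
    count_a, count_b = Counter(source_a), Counter(source_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)  # chance agreement
    if p_e == 1.0:  # both sources use one identical label throughout; treat as perfect
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Illustrative only: 1 = "taking a statin", 0 = "not taking", for 50 patients.
self_report = [1] * 25 + [0] * 25
ehr = [1] * 20 + [0] * 5 + [1] * 10 + [0] * 15
print(round(cohens_kappa(self_report, ehr), 3))  # 0.4
```

Here the two sources agree on 70% of patients, but because half of that agreement is expected by chance, kappa is only 0.4.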
7.
JAMIA Open ; 3(3): 395-404, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33215074

ABSTRACT

OBJECTIVE: Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval methods using electronic health records for different types of cohort definition retrieval. MATERIALS AND METHODS: We developed a test collection consisting of about 100 000 patient records and 56 test topics that characterized patient cohort requests for various clinical studies. Automated information retrieval tasks using word-based approaches were performed, varying 4 different parameters for a total of 48 permutations, with performance measured using B-Pref. We subsequently created structured Boolean queries for the 56 topics for performance comparisons. In addition, we performed a more detailed analysis of 10 topics. RESULTS: The best-performing word-based automated query parameter settings achieved a mean B-Pref of 0.167 across all 56 topics. The way a topic was structured (topic representation) had the largest impact on performance. Performance not only varied widely across topics, but there was also a large variance in sensitivity to parameter settings across the topics. Structured queries generally performed better than automated queries on measures of recall and precision but were still not able to recall all relevant patients found by the automated queries. CONCLUSION: While word-based automated methods of cohort retrieval offer an attractive solution to the labor-intensive nature of this task currently used at many medical centers, we generally found suboptimal performance in those approaches, with better performance obtained from structured Boolean queries. Future work will focus on using the test collection to develop and evaluate new approaches to query structure, weighting algorithms, and application of semantic methods.
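B-Pref, the measure used in the entry above, scores a ranking using only judged records, which suits partially judged test collections. A minimal sketch of one common formulation (Buckley and Voorhees): each retrieved relevant record contributes 1 minus the number of judged non-relevant records ranked above it, divided by min(R, N), and the sum is averaged over all R relevant records. Record IDs are hypothetical, and evaluation tools such as trec_eval may handle edge cases differently.

```python
def bpref(ranking: list[str], relevant: set[str], nonrelevant: set[str]) -> float:
    """B-Pref over a ranked list; unjudged records are ignored."""
    r, n = len(relevant), len(nonrelevant)
    if r == 0:
        return 0.0
    denom = min(r, n)
    nonrel_seen = 0  # judged non-relevant records ranked so far
    total = 0.0
    for doc in ranking:
        if doc in relevant:
            penalty = min(nonrel_seen, r) / denom if denom else 0.0
            total += 1.0 - penalty
        elif doc in nonrelevant:
            nonrel_seen += 1
    return total / r  # relevant records never retrieved contribute 0


ranking = ["pt_1", "pt_9", "pt_2", "pt_7"]  # pt_7 is unjudged
relevant = {"pt_1", "pt_2", "pt_3"}         # pt_3 was never retrieved
nonrelevant = {"pt_9", "pt_8"}
print(bpref(ranking, relevant, nonrelevant))  # 0.5
```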

9.
PLoS One ; 15(7): e0235574, 2020.
Article in English | MEDLINE | ID: mdl-32614911

ABSTRACT

BACKGROUND: With the growing adoption of the electronic health record (EHR) worldwide over the last decade, new opportunities exist for leveraging EHR data for detection of rare diseases. Rare diseases are often undiagnosed, or diagnosed late, by clinicians who encounter them infrequently. One such rare disease that may be amenable to EHR-based detection is acute hepatic porphyria (AHP). AHP consists of a family of rare, metabolic diseases characterized by potentially life-threatening acute attacks and chronic debilitating symptoms. The goal of this study was to apply machine learning and knowledge engineering to a large extract of EHR data to determine whether they could be effective in identifying patients not previously tested for AHP who should receive a proper diagnostic workup for AHP. METHODS AND FINDINGS: We used an extract of the complete EHR data of 200,000 patients from an academic medical center and enriched it with records from an additional 5,571 patients containing any mention of porphyria in the record. After manually reviewing the records of all 47 unique patients with the ICD-10-CM code E80.21 (Acute intermittent [hepatic] porphyria), we identified 30 patients who were positive cases for our machine learning models, with the rest of the patients used as negative cases. We parsed the record into features, which were scored by frequency of appearance and filtered using univariate feature analysis. We manually chose features not directly tied to provider attributes or suspicion of the patient having AHP. We trained on the full dataset, with the best cross-validation performance coming from a support vector machine (SVM) algorithm using a radial basis function (RBF) kernel. The trained model was applied back to the full dataset, and patients were ranked by margin distance.
The top 100 ranked negative cases were manually reviewed for symptom complexes similar to AHP, finding four patients where AHP diagnostic testing was likely indicated and 18 patients where AHP diagnostic testing was possibly indicated. From the top 100 ranked cases of patients with mention of porphyria in their record, we identified four patients for whom AHP diagnostic testing was possibly indicated and had not been previously performed. Based solely on the reported prevalence of AHP, we would have expected only 0.002 cases out of the 200 patients manually reviewed. CONCLUSIONS: The application of machine learning and knowledge engineering to EHR data may facilitate the diagnosis of rare diseases such as AHP. Further work will recommend clinical investigation to identified patients' clinicians, evaluate more patients, assess additional feature selection and machine learning algorithms, and apply this methodology to other rare diseases. This work provides strong evidence that population-level informatics can be applied to rare diseases, greatly improving our ability to identify undiagnosed patients, and in the future improve the care of these patients and our ability to study these diseases. The next step is to learn how best to apply these EHR-based machine learning approaches to benefit individual patients with a clinical study that provides diagnostic testing and clinical follow-up for those identified as possibly having undiagnosed AHP.


Subject(s)
Knowledge , Machine Learning , Porphobilinogen Synthase/deficiency , Porphyrias, Hepatic/diagnosis , Databases, Factual , Electronic Health Records , Female , Humans , Male , Porphyrias, Hepatic/pathology
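Ranking patients by an SVM's margin, as in the AHP entry above, amounts to sorting them by the model's decision value. For an RBF-kernel SVM this is f(x) = sum_i c_i * exp(-gamma * ||s_i - x||^2) + b, where s_i are the support vectors and c_i their signed dual coefficients. The sketch assumes an already trained model; the support vectors, coefficients, gamma, and patient feature vectors are invented for illustration and have nothing to do with the study's features.

```python
import math

Vector = tuple[float, ...]


def rbf_kernel(x: Vector, z: Vector, gamma: float) -> float:
    """Radial basis function kernel exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_value(x: Vector, support_vectors: list[Vector], dual_coefs: list[float],
                   intercept: float, gamma: float) -> float:
    """Signed SVM output; larger means further on the positive (case-like) side."""
    return sum(c * rbf_kernel(s, x, gamma) for s, c in zip(support_vectors, dual_coefs)) + intercept


# Hypothetical trained model: one positive and one negative support vector.
svs = [(1.0, 1.0), (0.0, 0.0)]
coefs = [1.0, -1.0]  # alpha_i * y_i
patients = {"a": (0.9, 0.9), "b": (0.1, 0.0), "c": (0.5, 0.5)}

ranked = sorted(patients, key=lambda p: decision_value(patients[p], svs, coefs, 0.0, 1.0), reverse=True)
print(ranked)  # ['a', 'c', 'b']
```

Dividing every decision value by the same constant (the norm of the weight vector in kernel space) converts it into a geometric distance from the separating hyperplane without changing the order, so either reading of "margin distance" gives the same ranking for a fixed model.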
10.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32525207

ABSTRACT

Clinical case reports are the 'eyewitness reports' of medicine and provide a valuable, unique, albeit noisy and underutilized type of evidence. Generally, a case report has a single main finding that represents the reason for writing up the report in the first place. However, no one has previously created an automatic way of identifying main finding sentences in case reports. We previously created a manual corpus of main finding sentences extracted from the abstracts and full text of clinical case reports. Here, we have utilized the corpus to create a machine learning-based model that automatically predicts which sentence(s) from abstracts state the main finding. The model has been evaluated on a separate manual corpus of clinical case reports and found to have good performance. This is a step toward setting up a retrieval system in which, given one case report, one can find other case reports that report the same or very similar main findings. The code and necessary files to run the main finding model can be downloaded from https://github.com/qi29/main_finding_recognition, released under the Apache License, Version 2.0.


Subject(s)
Data Mining/methods , Machine Learning , Medical Records/classification , Humans , Natural Language Processing , Software
11.
Amyloid ; 26(3): 139-147, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31210531

ABSTRACT

Objective: Amyloid A (AA) amyloidosis is found in humans and non-human primates, but quantifying disease risk prior to clinical symptoms is challenging. We applied machine learning to identify the best predictors of amyloidosis in rhesus macaques from available clinical and pathology records. To explore potential biomarkers, we also assessed whether changes in circulating serum amyloid A (SAA) or lipoprotein profiles accompany the disease. Methods: We conducted a retrospective study using 86 cases and 163 controls matched for age and sex. We performed data reduction on 62 clinical, pathological and demographic variables, and applied multivariate modelling and model selection with cross-validation. To test the performance of our final model, we applied it to a replication cohort of 2,775 macaques. Results: The strongest predictors of disease were colitis, gastrointestinal adenocarcinoma, endometriosis, arthritis, trauma, diarrhoea and number of pregnancies. Sensitivity and specificity of the risk model were predicted to be 82%, and were assessed at 79 and 72%, respectively. Total, low density lipoprotein and high density lipoprotein cholesterol levels were significantly lower, and SAA levels and triglyceride-to-HDL ratios were significantly higher in cases versus controls. Conclusion: Machine learning is a powerful approach to identifying macaques at risk of AA amyloidosis, which is accompanied by increased circulating SAA and altered lipoprotein profiles.


Subject(s)
Amyloidosis/diagnosis , Machine Learning/statistics & numerical data , Models, Statistical , Serum Amyloid A Protein/metabolism , Adenocarcinoma/diagnosis , Adenocarcinoma/physiopathology , Amyloidosis/blood , Amyloidosis/physiopathology , Animals , Arthritis/diagnosis , Arthritis/physiopathology , Biomarkers/blood , Case-Control Studies , Cholesterol, HDL/blood , Cholesterol, LDL/blood , Colitis/diagnosis , Colitis/physiopathology , Diarrhea/diagnosis , Diarrhea/physiopathology , Disease Models, Animal , Endometriosis/diagnosis , Endometriosis/physiopathology , Female , Gastrointestinal Neoplasms/diagnosis , Gastrointestinal Neoplasms/physiopathology , Humans , Macaca mulatta , Male , Retrospective Studies , Risk Factors , Triglycerides/blood , Wounds and Injuries/diagnosis , Wounds and Injuries/physiopathology
12.
J Biomed Inform ; 90: 103096, 2019 02.
Article in English | MEDLINE | ID: mdl-30654030

ABSTRACT

Neural embeddings are a popular set of methods for representing words, phrases or text as a low dimensional vector (typically 50-500 dimensions). However, it is difficult to interpret these dimensions in a meaningful manner, and creating neural embeddings requires extensive training and tuning of multiple parameters and hyperparameters. We present here a simple unsupervised method for representing words, phrases or text as a low dimensional vector, in which the meaning and relative importance of dimensions is transparent to inspection. We have created a near-comprehensive vector representation of words, and selected bigrams, trigrams and abbreviations, using the set of titles and abstracts in PubMed as a corpus. This vector is used to create several novel implicit word-word and text-text similarity metrics. The implicit word-word similarity metrics correlate well with human judgement of word pair similarity and relatedness, and outperform or equal all other reported methods on a variety of biomedical benchmarks, including several implementations of neural embeddings trained on PubMed corpora. Our implicit word-word metrics capture different aspects of word-word relatedness than word2vec-based metrics and are only partially correlated (rho = 0.5-0.8 depending on task and corpus). The vector representations of words, bigrams, trigrams, abbreviations, and PubMed title + abstracts are all publicly available from http://arrowsmith.psych.uic.edu/arrowsmith_uic/word_similarity_metrics.html under a CC-BY-NC license. Several public web query interfaces are also available at the same site, including one which allows the user to specify a given word and view its most closely related terms according to direct co-occurrence as well as different implicit similarity metrics.


Subject(s)
Data Mining , PubMed , Semantics
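The entry above does not define its implicit similarity metrics, so the following is only a generic illustration of the underlying idea: two words can be related even if they never co-occur, provided they co-occur with the same other words. The sketch builds document-level co-occurrence vectors from a toy corpus and compares them with cosine similarity; it is not the authors' metric, and the corpus is invented.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(docs: list[list[str]]) -> dict[str, Counter]:
    """For each word, count the documents in which each other word also appears."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for doc in docs:
        words = set(doc)
        for w in words:
            vectors[w].update(words - {w})
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


docs = [["aspirin", "pain", "fever"], ["ibuprofen", "pain", "fever"], ["aspirin", "bleeding"]]
vec = cooccurrence_vectors(docs)
print(vec["aspirin"]["ibuprofen"])                         # 0
print(round(cosine(vec["aspirin"], vec["ibuprofen"]), 3))  # 0.816
```

The two drug names never appear in the same document (direct co-occurrence 0), yet their co-occurrence profiles overlap strongly, which is the kind of indirect relatedness that direct co-occurrence counts miss.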
13.
AMIA Annu Symp Proc ; 2019: 903-912, 2019.
Article in English | MEDLINE | ID: mdl-32308887

ABSTRACT

Structured electronic health record (EHR) data are often used for quality measurement and improvement, clinical research, and other secondary uses. These data, however, are known to suffer from quality problems. There may be value in augmenting structured EHR data to improve data quality, thereby improving the reliability and validity of the conclusions drawn from those data. Focusing on five diagnoses related to cardiovascular care, this paper considers the added value of two alternative data sources: manual chart abstraction and patient self-report. We assess the overall agreement between structured EHR problem list data, abstracted EHR data, and patient self-report; and explore possible causes of disagreement between those sources. Our findings suggest that both chart abstraction and patient self-report contain significantly more diagnoses than the problem list, but that the information they capture is different. Methods for collecting and validating self-reported medical data require further consideration and exploration.


Subject(s)
Electronic Health Records , Information Storage and Retrieval , Self Report , Adult , Aged , Aged, 80 and over , Data Accuracy , Female , Humans , Male , Medical Records, Problem-Oriented , Middle Aged , Reproducibility of Results , Young Adult
14.
Database (Oxford) ; 2018: 1-8, 2018 01 01.
Article in English | MEDLINE | ID: mdl-30184195

ABSTRACT

The Medical Subject Heading 'Humans' is manually curated and indicates human-related studies within MEDLINE. However, newly published MEDLINE articles may take months to be indexed, and non-MEDLINE articles lack consistent, transparent indexing of this feature. Therefore, for up-to-date and broad literature searches, there is a need for an independent automated system to identify whether a given publication is human-related, particularly when it lacks Medical Subject Headings. One million MEDLINE records published in 1987-2014 were randomly selected. Text-based features from the title, abstract, author name and journal fields were extracted. A linear support vector machine was trained to estimate the probability that a given article should be indexed as Humans and was evaluated on records from 2015 to 2016. Overall accuracy was high: area under the receiver operating characteristic curve = 0.976, F1 = 95% relative to MeSH indexing. Manual review of cases of extreme disagreement with MEDLINE showed 73.5% agreement with the automated prediction. We have tagged all articles indexed in PubMed with predictive scores and have made the information publicly available at http://arrowsmith.psych.uic.edu/evidence_based_medicine/index.html. We have also made available a web-based interface to allow users to obtain predictive scores for non-MEDLINE articles. This will assist in the triage of clinical evidence for writing systematic reviews.


Subject(s)
Automation , Probability , Publications , Calibration , Databases as Topic , Humans , Reproducibility of Results
15.
Data Inf Manag ; 2(1): 27-36, 2018 Jun.
Article in English | MEDLINE | ID: mdl-30766970

ABSTRACT

Many investigators have carried out text mining of the biomedical literature for a variety of purposes, ranging from the assignment of indexing terms to the disambiguation of author names. A common approach is to define positive and negative training examples, extract features from article metadata, and employ machine learning algorithms. At present, each research group tackles each problem from scratch, and in isolation of other projects, which causes redundancy and great waste of effort. Here, we propose and describe the design of a generic platform for biomedical text mining, which can serve as a shared resource for machine learning projects, and can serve as a public repository for their outputs. We will initially focus on a specific goal, namely, classifying articles according to Publication Type, and emphasize how feature sets can be made more powerful and robust through the use of multiple, heterogeneous similarity measures as input to machine learning models. We then discuss how the generic platform can be extended to include a wide variety of other machine learning based goals and projects, and can be used as a public platform for disseminating the results of NLP tools to end-users as well.

16.
J Am Med Inform Assoc ; 24(6): 1165-1168, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-28541493

ABSTRACT

OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. METHODS: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. RESULTS: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%-99% recall) with substantially less effort (we observed a reduction of around 60%-80%) than relying on manual screening alone. CONCLUSIONS: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks.


Subject(s)
Crowdsourcing , Information Storage and Retrieval/methods , Machine Learning , Randomized Controlled Trials as Topic , Biomedical Research , Databases, Bibliographic , Natural Language Processing , ROC Curve , Review Literature as Topic , Support Vector Machine
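The hybrid strategy in the entry above is a simple routing rule: citations the classifier scores as very unlikely to be RCTs are excluded automatically, and everything else goes to crowdworkers, so the fraction auto-excluded approximates the effort saved relative to fully manual screening. The threshold value and citation IDs below are hypothetical; the abstract does not state the cutoff used.

```python
def route_citations(scores: dict[str, float], threshold: float) -> tuple[list[str], list[str]]:
    """Return (auto_excluded, sent_to_crowd) based on the classifier's RCT probability."""
    auto_excluded = [cid for cid, p in scores.items() if p < threshold]
    sent_to_crowd = [cid for cid, p in scores.items() if p >= threshold]
    return auto_excluded, sent_to_crowd


scores = {"c1": 0.001, "c2": 0.62, "c3": 0.03, "c4": 0.97, "c5": 0.002}  # hypothetical
excluded, crowd = route_citations(scores, threshold=0.05)
effort_reduction = len(excluded) / len(scores)
print(excluded, crowd, effort_reduction)  # ['c1', 'c3', 'c5'] ['c2', 'c4'] 0.6
```

Lowering the threshold sends more citations to the crowd, trading effort savings for a smaller risk of auto-excluding a true RCT.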
17.
AMIA Annu Symp Proc ; 2017: 660-669, 2017.
Article in English | MEDLINE | ID: mdl-29854131

ABSTRACT

Objective: Secondary use of electronic health record (EHR) data is enabled by accurate and complete retrieval of the relevant patient cohort, which requires searching both structured and unstructured data. Clinical text poses difficulties to searching, although chart notes incorporate structure that may facilitate accurate retrieval. Methods: We developed rules identifying clinical document sections, which can be indexed in search engines that allow faceted searches, such as Lucene or Essie, an NLM search engine. We developed 22 clinical cohorts and two queries for each cohort, one utilizing section headings and the other searching the whole document. We manually evaluated a subset of retrieved documents to compare query performance. Results: Querying by section had lower recall than whole-document queries (0.83 vs 0.95), higher precision (0.73 vs 0.54), and higher F1 (0.78 vs 0.69). Conclusion: This evaluation suggests that searching specific sections may improve precision under certain conditions, often at the cost of recall.


Subject(s)
Electronic Health Records , Information Storage and Retrieval/methods , Search Engine , Abstracting and Indexing , Humans
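The F1 values in the entry above follow from its precision and recall figures via the harmonic mean, F1 = 2PR / (P + R). A quick check:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


print(round(f1(0.73, 0.83), 2))  # 0.78 (section queries)
print(round(f1(0.54, 0.95), 2))  # 0.69 (whole-document queries)
```

Because F1 is a harmonic mean, the section queries' gain in precision (0.73 vs 0.54) outweighs their loss in recall (0.83 vs 0.95).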
18.
AMIA Annu Symp Proc ; 2016: 1229-1237, 2016.
Article in English | MEDLINE | ID: mdl-28269920

ABSTRACT

Clinical quality measures (CQMs) are important tools for the assessment and improvement of health care quality. Federal requirements initially set forth in the American Recovery and Reinvestment Act, and advanced in subsequent stages of the requirements, codified electronic health record (EHR)-based CQM reporting, and have made automated CQM implementation a priority amongst the clinical and informatics communities. Nevertheless, the processes surrounding CQM implementation and validation remain complex, time-consuming, and largely undefined. We collected issue-tracking data during the course of an agile and rigorous collaborative project to build an analytics platform for the Knight Cardiovascular Institute at OHSU, with nine heart failure CQMs defined by the American College of Cardiology (ACC) as an exemplar. Using a mixed methods approach we provide an overview of our CQM implementation and validation process, identify major roadblocks and bottlenecks, and make recommendations for other professionals working in the area of health care quality assessment and improvement.


Subject(s)
Electronic Health Records , Quality Assurance, Health Care/methods , Heart Failure , Humans , Quality of Health Care , United States , Validation Studies as Topic
19.
PLoS One ; 10(10): e0139233, 2015.
Article in English | MEDLINE | ID: mdl-26426747

ABSTRACT

To assess the value of exosomal miRNAs as biomarkers for Alzheimer disease (AD), the expression of microRNAs was measured in a plasma fraction enriched in exosomes by differential centrifugation, using Illumina deep sequencing. Samples from 35 persons with a clinical diagnosis of AD dementia were compared to 35 age- and sex-matched controls. Although these samples contained less than 0.1 microgram of total RNA, deep sequencing gave reliable and informative results. Twenty miRNAs showed significant differences in the AD group in initial screening (miR-23b-3p, miR-24-3p, miR-29b-3p, miR-125b-5p, miR-138-5p, miR-139-5p, miR-141-3p, miR-150-5p, miR-152-3p, miR-185-5p, miR-338-3p, miR-342-3p, miR-342-5p, miR-548at-5p, miR-659-5p, miR-3065-5p, miR-3613-3p, miR-3916, miR-4772-3p, miR-5001-3p), many of which satisfied additional biological and statistical criteria, and among which a panel of seven miRNAs were highly informative in a machine learning model for predicting AD status of individual samples with 83-89% accuracy. This performance is not due to over-fitting, because a) we used separate samples for training and testing, and b) similar performance was achieved when tested on technical replicate data. Perhaps the most interesting single miRNA was miR-342-3p, which was a) expressed in the AD group at about 60% of control levels, b) highly correlated with several of the other miRNAs that were significantly down-regulated in AD, and c) also reported to be down-regulated in AD in two previous studies. The findings warrant replication and follow-up with a larger cohort of patients and controls who have been carefully characterized in terms of cognitive and imaging data, other biomarkers (e.g., CSF amyloid and tau levels) and risk factors (e.g., apoE4 status), and who are sampled repeatedly over time. Integrating miRNA expression data with other data is likely to provide informative and robust biomarkers in Alzheimer disease.


Subject(s)
Alzheimer Disease/genetics , Biomarkers, Tumor/metabolism , Exosomes/genetics , Gene Expression Regulation, Neoplastic , MicroRNAs/genetics , Plasma/metabolism , Animals , Case-Control Studies , Female , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Male , Mice
20.
J Am Med Inform Assoc ; 22(3): 707-17, 2015 May.
Article in English | MEDLINE | ID: mdl-25656516

ABSTRACT

OBJECTIVE: For many literature review tasks, including systematic review (SR) and other aspects of evidence-based medicine, it is important to know whether an article describes a randomized controlled trial (RCT). Current manual annotation is not complete or flexible enough for the SR process. In this work, highly accurate machine learning predictive models were built that include confidence predictions of whether an article is an RCT. MATERIALS AND METHODS: The LibSVM classifier was used with forward selection of potential feature sets on a large human-related subset of MEDLINE to create a classification model requiring only the citation, abstract, and MeSH terms for each article. RESULTS: The model achieved an area under the receiver operating characteristic curve of 0.973 and mean squared error of 0.013 on the held-out 2011 data. Accurate confidence estimates were confirmed on a manually reviewed set of test articles. A second model not requiring MeSH terms was also created, and performs almost as well. DISCUSSION: Both models accurately rank and predict article RCT confidence. Using the model and the manually reviewed samples, it is estimated that about 8000 (3%) additional RCTs can be identified in MEDLINE, and that 5% of articles tagged as RCTs in MEDLINE may not be identified. CONCLUSION: Retagging human-related studies with a continuously valued RCT confidence is potentially more useful for article ranking and review than a simple yes/no prediction. The automated RCT tagging tool should offer significant savings of time and effort during the process of writing SRs, and is a key component of a multistep text mining pipeline that we are building to streamline SR workflow. In addition, the model may be useful for identifying errors in MEDLINE publication types.
The RCT confidence predictions described here have been made available to users as a web service with a user query form front end at: http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.


Subject(s)
Artificial Intelligence , Information Storage and Retrieval/methods , Randomized Controlled Trials as Topic , Review Literature as Topic , Support Vector Machine , Evidence-Based Medicine , Humans , MEDLINE , ROC Curve
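The two metrics reported for the RCT Tagger in the entry above measure different things. The area under the ROC curve is the probability that a randomly chosen RCT is scored above a randomly chosen non-RCT (ranking quality only), while the mean squared error between predicted confidence and the 0/1 label (the Brier score) also reflects how well calibrated the confidences are. A small sketch with made-up scores:

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def mean_squared_error(probs: list[float], labels: list[int]) -> float:
    """Brier score: mean of (predicted probability - true 0/1 label) squared."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


print(roc_auc([0.9, 0.8], [0.1, 0.85]))                 # 0.75
print(round(mean_squared_error([0.9, 0.2], [1, 0]), 3))  # 0.025
```

The pairwise AUC loop is quadratic in the number of articles and is meant only for illustration; a rank-based computation scales better to collections the size of MEDLINE.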