Results 1 - 16 of 16
1.
Intern Med J ; 52(7): 1215-1224, 2022 07.
Article in English | MEDLINE | ID: mdl-33755285

ABSTRACT

BACKGROUND: Patients with cancer are at high risk for infection, but the epidemiology of healthcare-associated Staphylococcus aureus bacteraemia (HA-SAB) and Clostridioides difficile infection (HA-CDI) in Australian cancer patients has not previously been reported. AIMS: To compare the cumulative aggregate incidence and time trends of HA-SAB and HA-CDI in a predefined cancer cohort with a mixed statewide patient population in Victoria, Australia. METHODS: All SAB and CDI events in patients admitted to Victorian healthcare facilities between 1 July 2010 and 31 December 2018 were submitted to the Victorian Healthcare Associated Infection Surveillance System Coordinating Centre. Descriptive analyses and multilevel mixed-effects Poisson regression modelling were applied to a standardised data extract. RESULTS: In total, 10 608 and 13 118 SAB and CDI events were reported across 139 Victorian healthcare facilities, respectively. Of these, 89 (85%) and 279 (88%) were healthcare-associated in the cancer cohort compared with 34% (3561/10 503) and 66% (8403/12 802) in the statewide cohort. The aggregate incidence was more than twofold higher in the cancer cohort compared with the statewide cohort for HA-SAB (2.25 (95% confidence interval (CI): 1.74-2.77) vs 1.11 (95% CI: 1.07-1.15) HA-SAB/10 000 occupied bed-days) and threefold higher for HA-CDI (6.26 (95% CI: 5.12-7.41) vs 2.31 (95% CI: 2.21-2.42) HA-CDI/10 000 occupied bed-days). Higher quarterly diminishing rates were observed in the cancer cohort than the statewide data for both infections. CONCLUSIONS: Our findings demonstrate a higher burden of HA-SAB and HA-CDI in a cancer cohort when compared with state data and highlight the need for cancer-specific targets and benchmarks to meaningfully support quality improvement.


Subject(s)
Bacteremia , Clostridium Infections , Cross Infection , Neoplasms , Staphylococcal Infections , Bacteremia/epidemiology , Clostridium Infections/epidemiology , Cross Infection/epidemiology , Delivery of Health Care , Humans , Neoplasms/epidemiology , Staphylococcal Infections/epidemiology , Staphylococcus aureus , Victoria/epidemiology
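
As a rough illustration of the rate arithmetic in result 1, the sketch below computes an incidence per 10 000 occupied bed-days with a normal-approximation 95% interval, plus the ratio of two rates. The function names are hypothetical, and the abstract does not report bed-day denominators, so any counts passed in are placeholders. The study itself used multilevel mixed-effects Poisson regression, which this sketch does not attempt to reproduce.

```python
import math


def incidence_per_10k(events: int, bed_days: int, z: float = 1.96):
    """Return (rate, lower, upper) per 10 000 occupied bed-days.

    Uses a simple normal approximation to the Poisson count
    (standard error of the count = sqrt(events)); this is an
    assumption, not the method stated in the abstract.
    """
    scale = 10_000 / bed_days
    rate = events * scale
    half_width = z * math.sqrt(events) * scale
    return rate, rate - half_width, rate + half_width


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of two incidence rates expressed in the same units."""
    return rate_a / rate_b
```

With the rates reported in the abstract, `rate_ratio(2.25, 1.11)` is about 2.03 for HA-SAB and `rate_ratio(6.26, 2.31)` is about 2.71 for HA-CDI.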
2.
Support Care Cancer ; 28(12): 6023-6034, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32291600

ABSTRACT

PURPOSE: Patients with cancer are at increased risk for infection, but the relative morbidity and mortality of all infections is not well understood. The objectives of this study were to determine the prevalence, incidence, time-trends and risk of mortality of infections associated with hospital admissions in patients with haematological- and solid-tumour malignancies over 11 years. METHODS: A retrospective, longitudinal cohort study of inpatient admissions between 1 January 2007 and 31 December 2017 at the Peter MacCallum Cancer Centre was conducted using administratively coded and patient demographics data. Descriptive analyses, autoregressive integrated moving average, Kaplan-Meier and Cox regression modelling were applied. RESULTS: Of 45,116 inpatient hospitalisations consisting of 3033 haematological malignancy (HM), 18,372 solid tumour neoplasm (STN) patients and 953 autologous haematopoietic stem cell transplantation recipients, 67%, 29% and 88% were coded with ≥ 1 infection, respectively. Gastrointestinal tract and bloodstream infections were observed with the highest incidence, and bloodstream infection rates increased significantly over time in both HM- and STN-cohorts. Inpatient length of stay was significantly higher in exposed patients with coded infection compared to unexposed in HM- and STN-cohorts (22 versus 4 days [p < 0.001] and 15 versus 4 days [p < 0.001], respectively). Risk of in-hospital mortality was higher in exposed than unexposed patients in the STN-cohort (adjusted hazard ratio [aHR] 1.61 [95% CI 1.41-1.83]; p < 0.001) and HM-cohort (aHR 1.30 [95% CI 0.90-1.90]; p = 0.166). CONCLUSION: Infection burden among cancer patients is substantial and findings reflect the need for targeted surveillance in high-risk patient groups (e.g. haematological malignancy), in whom enhanced monitoring may be required to support infection prevention strategies.


Subject(s)
Cross Infection/epidemiology , Neoplasms/epidemiology , Adolescent , Adult , Aged , Aged, 80 and over , Australia/epidemiology , Cancer Care Facilities , Cohort Studies , Cross Infection/diagnosis , Female , Hospital Mortality , Humans , Incidence , Longitudinal Studies , Male , Middle Aged , Neoplasms/complications , Neoplasms/diagnosis , Prognosis , Retrospective Studies , Young Adult
3.
BMC Med Inform Decis Mak ; 16 Suppl 1: 68, 2016 07 18.
Article in English | MEDLINE | ID: mdl-27454860

ABSTRACT

BACKGROUND: The Variome corpus, a small collection of published articles about inherited colorectal cancer, includes annotations of 11 entity types and 13 relation types related to the curation of the relationship between genetic variation and disease. Due to the richness of these annotations, the corpus provides a good testbed for evaluation of biomedical literature information extraction systems. METHODS: In this paper, we focus on assessing performance on extracting the relations in the corpus, using gold standard entities as a starting point, to establish a baseline for extraction of relations important for extraction of genetic variant information from the literature. We test the application of the Public Knowledge Discovery Engine for Java (PKDE4J) system, a natural language processing system designed for information extraction of entities and relations in text, on the relation extraction task using this corpus. RESULTS: For the relations which are attested at least 100 times in the Variome corpus, we realise a performance ranging from 0.78-0.84 Precision-weighted F-score, depending on the relation. We find that the PKDE4J system adapted straightforwardly to the range of relation types represented in the corpus; some extensions to the original methodology were required to adapt to the multi-relational classification context. The results are competitive with state-of-the-art relation extraction performance on more heavily studied corpora, although the analysis shows that the Recall of a co-occurrence baseline outweighs the benefit of improved Precision for many relations, indicating the value of simple semantic constraints on relations. CONCLUSIONS: This work represents the first attempt to apply relation extraction methods to the Variome corpus. 
The results demonstrate that automated methods have good potential to structure the information expressed in the published literature related to genetic variants, connecting mutations to genes, diseases, and patient cohorts. Further development of such approaches will facilitate more efficient biocuration of genetic variant information into structured databases, leveraging the knowledge embedded in the vast publication literature.


Subject(s)
Colorectal Neoplasms/genetics , Data Mining/methods , Databases, Genetic , Genetic Variation/genetics , Humans
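
The co-occurrence baseline mentioned in result 3 can be sketched roughly as follows: every pair of gold-standard entities that appears in the same sentence is proposed as a relation, which tends to give high Recall and low Precision. The entity labels (`GENE_A`, `MUTATION_B`, `DISEASE_C`) and function names are hypothetical placeholders. This is not the PKDE4J system or the exact baseline used in the paper.

```python
from itertools import combinations


def cooccurrence_candidates(entities):
    """Propose a relation for every unordered pair of distinct entities
    found in one sentence (the co-occurrence assumption)."""
    return {frozenset(pair) for pair in combinations(entities, 2)}


def precision_recall(predicted, gold):
    """Precision and recall of predicted relation pairs against gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sentence with three annotated entities and one gold relation.
sentence_entities = ["GENE_A", "MUTATION_B", "DISEASE_C"]
gold_relations = {frozenset({"GENE_A", "MUTATION_B"})}
predicted_relations = cooccurrence_candidates(sentence_entities)
precision, recall = precision_recall(predicted_relations, gold_relations)
```

In this toy sentence the baseline proposes three pairs and only one is correct, so recall is perfect while precision is one in three.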
4.
Prev Vet Med ; 223: 106112, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38176151

ABSTRACT

BACKGROUND: Temporal phenotyping of patient journeys, which capture the common sequence patterns of interventions in the treatment of a specific condition, is useful to support understanding of antimicrobial usage in veterinary patients. Identifying and describing these phenotypes can inform antimicrobial stewardship programs designed to fight antimicrobial resistance, a major health crisis affecting both humans and animals, in which veterinarians have an important role to play. OBJECTIVE: This research proposes a framework for extracting temporal phenotypes of patient journeys from clinical practice data through the application of natural language processing (NLP) and unsupervised machine learning (ML) techniques, using cat bite abscesses as a model condition. By constructing temporal phenotypes from key events, the relationship between antimicrobial administration and surgical interventions can be described, and similar treatment patterns can be grouped together to describe outcomes associated with specific antimicrobial selection. METHODS: Cases identified as having a cat bite abscess as a diagnosis were extracted from VetCompass Australia, a database of veterinary clinical records. A classifier was trained and used to label the most clinically relevant event features in each record as chosen by a group of veterinarians. The labeled records were processed into coded character strings, where each letter represents a summary of specific types of treatments performed at a given visit. The sequences of letters representing the cases were clustered based on weighted Levenshtein edit distances with KMeans++ to identify the main variations of the patient treatment journeys, including the antimicrobials used and their duration of administration. RESULTS: A total of 13,744 records that met the selection criteria were extracted and grouped into 8436 cases. 
There were 9 clinically distinct event sequence patterns (temporal phenotypes) of patient journeys identified, representing the main sequences in which surgery and antimicrobial interventions are performed. Patients receiving amoxicillin and surgery had the shortest duration of antimicrobial administration (median of 3.4 days) and patients receiving cefovecin with no surgical intervention had the longest antimicrobial treatment duration (median of 27 days). CONCLUSION: Our study demonstrates methods to extract and provide an overview of temporal phenotypes of patient journeys, which can be applied to text-based clinical records for multiple species or clinical conditions. We demonstrate the effectiveness of this approach to derive real-world evidence of treatment impacts using cat bite abscesses as a model condition to describe patterns of antimicrobial therapy prescriptions and their outcomes.


Subject(s)
Anti-Infective Agents , Bites and Stings , Humans , Animals , Abscess/veterinary , Natural Language Processing , Amoxicillin , Bites and Stings/veterinary , Cluster Analysis
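
The weighted Levenshtein distance between coded treatment strings described in result 4 could look roughly like the sketch below. The letter codes and the substitution weights are hypothetical, since the abstract does not give the paper's coding scheme or weights. The clustering step (KMeans++) is omitted.

```python
def weighted_levenshtein(a, b, sub_cost=None, indel_cost=1.0):
    """Edit distance between two coded visit sequences.

    sub_cost(x, y) returns the cost of replacing letter x with y;
    by default every mismatch costs 1.0 (plain Levenshtein).
    """
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 1.0

    # previous[j] holds the distance between the prefix of a seen so far
    # and the first j letters of b.
    previous = [j * indel_cost for j in range(len(b) + 1)]
    for i, letter_a in enumerate(a, start=1):
        current = [i * indel_cost]
        for j, letter_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + indel_cost,                      # delete from a
                current[j - 1] + indel_cost,                   # insert into a
                previous[j - 1] + sub_cost(letter_a, letter_b)  # substitute
            ))
        previous = current
    return previous[-1]
```

A lower substitution cost for clinically similar visit types (for example, two antimicrobial-only visits) would pull such journeys closer together before clustering; which pairs deserve a lower cost is a modelling decision not specified in the abstract.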
5.
Health Inf Manag ; : 18333583221131753, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36374542

ABSTRACT

BACKGROUND: The Australian hospital-acquired complication (HAC) policy was introduced to facilitate negative funding adjustments in Australian hospitals using ICD-10-AM codes. OBJECTIVE: The aim of this study was to determine the positive predictive value (PPV) of the ICD-10-AM codes in the HAC framework to detect hospital-acquired pneumonia in patients with cancer and to describe any change in PPV before and after implementation of an electronic medical record (EMR) at our centre. METHOD: A retrospective case review of all coded pneumonia episodes at the Peter MacCallum Cancer Centre in Melbourne, Australia spanning two time periods (01 July 2015 to 30 June 2017 [pre-EMR period] and 01 September 2020 to 28 February 2021 [EMR period]) was performed to determine the proportion of events satisfying standardised surveillance definitions. RESULTS: HAC-coded pneumonia occurred in 3.66% (n = 151) of 41,260 separations during the study period. Of the 151 coded pneumonia separations, 27 satisfied consensus surveillance criteria, corresponding to an overall PPV of 0.18 (95% CI: 0.12, 0.25). The PPV was approximately three times higher following EMR implementation (0.34 [95% CI: 0.19, 0.53] versus 0.13 [95% CI: 0.08, 0.21]; p = .013). CONCLUSION: The current HAC definition is a poor-to-moderate classifier for hospital-acquired pneumonia in patients with cancer and, therefore, may not accurately reflect hospital-level quality improvement. Implementation of an EMR did enhance case detection, and future refinements to administratively coded data in support of robust monitoring frameworks should focus on EMR systems. IMPLICATIONS: Although ICD-10-AM data are readily available in Australian healthcare settings, these data are not sufficient for monitoring and reporting of hospital-acquired pneumonia in haematology-oncology patients.
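
To make the PPV figure in result 5 concrete, the sketch below computes a proportion with a Wilson score interval from the reported counts (27 of 151 coded separations met surveillance criteria). The choice of the Wilson method is an assumption; the abstract does not state how its interval was derived, so the bounds can differ slightly from the published 0.12 to 0.25.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Return (proportion, lower, upper) using the Wilson score interval."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half_width, centre + half_width


# Counts taken from the abstract: 27 confirmed among 151 coded episodes.
ppv, lower, upper = wilson_interval(27, 151)
```

The point estimate rounds to 0.18, matching the abstract.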

6.
JAC Antimicrob Resist ; 4(1): dlab194, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35156027

ABSTRACT

BACKGROUND: As antimicrobial prescribers, veterinarians contribute to the emergence of MDR pathogens. Antimicrobial stewardship programmes are an effective means of reducing the rate of development of antimicrobial resistance. A key component of antimicrobial stewardship programmes is selecting an appropriate antimicrobial agent for the presenting complaint and using an appropriate dose rate for an appropriate duration. OBJECTIVES: To describe antimicrobial usage, including dose, for common indications for antimicrobial use in companion animal practice. METHODS: Natural language processing (NLP) techniques were applied to extract and analyse clinical records. RESULTS: A total of 343 668 records for dogs and 109 719 records for cats administered systemic antimicrobials from 1 January 2013 to 31 December 2017 were extracted from the database. The NLP algorithms extracted dose, duration of therapy and diagnosis completely for 133 046 (39%) of the records for dogs and 40 841 records for cats (37%). The remaining records were missing one or more of these elements in the clinical data. The most common reason for antimicrobial administration was skin disorders (n = 66 198, 25%) and traumatic injuries (n = 15 932, 19%) in dogs and cats, respectively. Dose was consistent with guideline recommendations in 73% of cases where complete clinical data were available. CONCLUSIONS: Automated extraction using NLP methods is a powerful tool to evaluate large datasets and to enable veterinarians to describe the reasons that antimicrobials are administered. However, this can only be determined when the data presented in the clinical record are complete, which was not the case in most instances in this dataset. Most importantly, the dose administered varied and was often not consistent with guideline recommendations.

7.
PLoS One ; 15(3): e0230049, 2020.
Article in English | MEDLINE | ID: mdl-32168354

ABSTRACT

Antimicrobial Resistance is a global crisis that veterinarians contribute to through their use of antimicrobials in animals. Antimicrobial stewardship has been shown to be an effective means to reduce antimicrobial resistance in hospital environments. Effective monitoring of antimicrobial usage patterns is an essential part of antimicrobial stewardship and is critical in reducing the development of antimicrobial resistance. The aim of this study is to describe how frequently antimicrobials were used in veterinary consultations and identify the most frequently used antimicrobials. Using VetCompass Australia, Natural Language Processing techniques, and the Australian Strategic Technical Advisory Group's (ASTAG) Rating system to classify the importance of antimicrobials, descriptive analysis was performed on the antimicrobials prescribed in consultations from 137 companion animal veterinary clinics in Australia between 2013 and 2017 (inclusive). Of the 4,400,519 consultations downloaded there were 595,089 consultations where antimicrobials were prescribed to dogs or cats. Antimicrobials were dispensed in 145 of every 1000 canine consultations; and 38 per 1000 consultations involved high importance rated antimicrobials. Similarly with cats, 108 per 1000 consultations had antimicrobials dispensed, and in 47 per 1000 consultations an antimicrobial of high importance rating was administered. The most common antimicrobials given to cats and dogs were cefovecin and amoxycillin clavulanate, respectively. The most common topical antimicrobial and high-rated topical antimicrobial given to dogs and cats was polymyxin B. This study provides a descriptive analysis of the antimicrobial usage patterns in Australia using methods that can be automated to inform antimicrobial use surveillance programs and promote antimicrobial stewardship.


Subject(s)
Anti-Bacterial Agents/therapeutic use , Bacterial Infections/veterinary , Cat Diseases/drug therapy , Dog Diseases/drug therapy , Drug Utilization/statistics & numerical data , Practice Patterns, Physicians'/statistics & numerical data , Veterinarians/statistics & numerical data , Animals , Bacterial Infections/drug therapy , Cat Diseases/microbiology , Cats , Dog Diseases/microbiology , Dogs , Records , Referral and Consultation , Surveys and Questionnaires , Text Messaging
8.
PLoS One ; 15(9): e0238889, 2020.
Article in English | MEDLINE | ID: mdl-32903280

ABSTRACT

BACKGROUND: Invasive fungal infection (IFI) detection requires application of complex case definitions by trained staff. Administrative coding data (ICD-10-AM) may provide a simplified method for IFI surveillance, but accuracy of case ascertainment in children with cancer is unknown. OBJECTIVE: To determine the classification performance of ICD-10-AM codes for detecting IFI using a gold-standard dataset (r-TERIFIC) of confirmed IFIs in paediatric cancer patients at a quaternary referral centre (Royal Children's Hospital) in Victoria, Australia from 1st April 2004 to 31st December 2013. METHODS: ICD-10-AM codes denoting IFI in paediatric patients (<18-years) with haematologic or solid tumour malignancies were extracted from the Victorian Admitted Episodes Dataset and linked to the r-TERIFIC dataset. Sensitivity, positive predictive value (PPV) and the F1 scores of the ICD-10-AM codes were calculated. RESULTS: Of 1,671 evaluable patients, 113 (6.76%) had confirmed IFI diagnoses according to gold-standard criteria, while 114 (6.82%) cases were identified using the codes. Of the clinical IFI cases, 68 were in receipt of ≥1 ICD-10-AM code(s) for IFI, corresponding to an overall sensitivity, PPV and F1 score of 60%, respectively. Sensitivity was highest for proven IFI (77% [95% CI: 58-90]; F1 = 47%) and invasive candidiasis (83% [95% CI: 61-95]; F1 = 76%) and lowest for other/unspecified IFI (20% [95% CI: 5.05-72%]; F1 = 5.00%). The most frequent misclassification was coding of invasive aspergillosis as invasive candidiasis. CONCLUSION: ICD-10-AM codes demonstrate moderate sensitivity and PPV to detect IFI in children with cancer. However, specific subsets of proven IFI and invasive candidiasis (codes B37.x) are more accurately coded.


Subject(s)
Invasive Fungal Infections/epidemiology , Neoplasms/microbiology , Australia/epidemiology , Child , Child, Preschool , Current Procedural Terminology , Databases, Factual , Female , Humans , Male , Medical Records , Retrospective Studies , Tertiary Care Centers
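
The overall figures in result 8 can be checked from the counts in the abstract: 68 true positives, 113 gold-standard IFI cases and 114 code-identified cases. The sketch below is a minimal calculation of sensitivity, PPV and F1 under that reading of the counts; the function name is hypothetical.

```python
def classification_metrics(true_positives: int, gold_positives: int, coded_positives: int):
    """Return (sensitivity, ppv, f1) for coded cases against a gold standard."""
    sensitivity = true_positives / gold_positives
    ppv = true_positives / coded_positives
    total = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / total if total else 0.0
    return sensitivity, ppv, f1


# Counts as reported in the abstract.
sensitivity, ppv, f1 = classification_metrics(68, 113, 114)
```

All three values round to 0.60, consistent with the reported 60%.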
9.
Int J Epidemiol ; 48(6): 1768-1782, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31363780

ABSTRACT

BACKGROUND: Immunocompromised patients are at increased risk of acquiring healthcare-associated infections (HAIs) and often require specialized models of care. Surveillance of HAIs is essential for effective infection-prevention programmes. However, little is known regarding standardized or specific surveillance methods currently employed for high-risk hospitalized patients. METHODS: A systematic review adopting a narrative synthesis approach of published material between 1 January 2000 and 31 March 2018 was conducted. Publications describing the application of traditional and/or electronic surveillance of HAIs in immunocompromised patient settings were identified from the Ovid MEDLINE®, Ovid Embase® and Elsevier Scopus® search engines [PROSPERO international prospective register of systematic reviews (registration ID: CRD42018093651)]. RESULTS: In total, 2708 studies were screened, of which 17 fulfilled inclusion criteria. Inpatients diagnosed with haematological malignancies were the most-represented immunosuppressed population. The majority of studies described manual HAI surveillance utilizing internationally accepted definitions for infection. Chart review of diagnostic and pathology reports was most commonly employed for case ascertainment. Data linkage of disparate datasets was performed in two studies. The most frequently monitored infections were bloodstream infections and invasive fungal disease. No surveillance programmes applied risk adjustment for reporting surveillance outcomes. CONCLUSIONS: Targeted, tailored monitoring of HAIs in high-risk immunocompromised settings is infrequently reported in current hospital surveillance programmes. Standardized surveillance frameworks, including risk adjustment and timely data dissemination, are required to adequately support infection-prevention programmes in these populations.


Subject(s)
Bacterial Infections/epidemiology , Cross Infection/epidemiology , Immunocompromised Host , Delivery of Health Care/organization & administration , Epidemiological Monitoring , Humans
10.
J Clin Epidemiol ; 104: 8-14, 2018 12.
Article in English | MEDLINE | ID: mdl-30075189

ABSTRACT

OBJECTIVES: We developed a free, online tool (CrowdCARE: crowdcare.unimelb.edu.au) to crowdsource research critical appraisal. The aim was to examine the validity of this approach for assessing the methodological quality of systematic reviews. STUDY DESIGN AND SETTING: In this prospective, cross-sectional study, a sample of systematic reviews (N = 71), of heterogeneous quality, was critically appraised using the Assessing the Methodological Quality of Systematic Reviews (AMSTAR) tool, in CrowdCARE, by five trained novice and two expert raters. After performing independent appraisals, experts resolved any disagreements by consensus (to produce an "expert consensus" rating, as the gold-standard approach). RESULTS: The expert consensus rating was within ±1 (on an 11-point scale) of the individual expert ratings for 82% of studies and was within ±1 of the mean novice rating for 79% of studies. There was a strong correlation (r² = 0.89, P < 0.0001) and very good concordance (κ = 0.67, 95% CI: 0.61-0.73) between the expert consensus rating and mean novice rating. CONCLUSION: Crowdsourcing can be used to assess the quality of systematic reviews. Novices can be trained to appraise systematic reviews and, on average, achieve a high degree of accuracy relative to experts. These proof-of-concept data demonstrate the merit of crowdsourcing, compared with current gold standards of appraisal, and the potential capacity for this approach to transform evidence-based practice worldwide by sharing the appraisal load.


Subject(s)
Crowdsourcing/methods , Research Design/standards , Cross-Sectional Studies , Evidence-Based Practice , Humans , Prospective Studies , Reproducibility of Results , Systematic Reviews as Topic
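
Two of the agreement measures reported in result 10, the share of ratings within ±1 of a reference rating and the squared correlation, can be sketched as below. The function names are hypothetical and any ratings passed in would be illustrative, not data from the study; the κ statistic is not computed here.

```python
def fraction_within(reference, other, tolerance=1.0):
    """Share of paired ratings whose absolute difference is <= tolerance."""
    hits = sum(abs(r - o) <= tolerance for r, o in zip(reference, other))
    return hits / len(reference)


def r_squared(x, y):
    """Squared Pearson correlation between two equally long rating lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)
```

For example, with made-up expert ratings `[3, 5, 7, 9]` and mean novice ratings `[3.5, 5.0, 9.0, 8.5]`, three of the four pairs differ by at most 1.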
11.
J Biomed Semantics ; 7: 52, 2016 Sep 09.
Article in English | MEDLINE | ID: mdl-27613112

ABSTRACT

BACKGROUND: Gene Ontology (GO) terms represent the standard for annotation and representation of molecular functions, biological processes and cellular compartments, but a large gap exists between the way concepts are represented in the ontology and how they are expressed in natural language text. The construction of highly specific GO terms is formulaic, consisting of parts and pieces from more simple terms. RESULTS: We present two different types of manually generated rules to help capture the variation of how GO terms can appear in natural language text. The first set of rules takes into account the compositional nature of GO and recursively decomposes the terms into their smallest constituent parts. The second set of rules generates derivational variations of these smaller terms and compositionally combines all generated variants to form the original term. By applying both types of rules, new synonyms are generated for two-thirds of all GO terms and an increase in F-measure performance for recognition of GO on the CRAFT corpus from 0.498 to 0.636 is observed. Additionally, we evaluated the combination of both types of rules over one million full text documents from Elsevier; manual validation and error analysis show we are able to recognize GO concepts with reasonable accuracy (88%) based on random sampling of annotations. CONCLUSIONS: In this work we present a set of simple synonym generation rules that utilize the highly compositional and formulaic nature of the Gene Ontology concepts. We illustrate how the generated synonyms aid in improving recognition of GO concepts on two different biomedical corpora. We discuss other applications of our rules for GO ontology quality assurance, explore the issue of overgeneration, and provide examples of how similar methodologies could be applied to other biomedical terminologies. Additionally, we provide all generated synonyms for use by the text-mining community.


Subject(s)
Data Mining/methods , Gene Ontology , Semantics , Natural Language Processing , Pattern Recognition, Automated
12.
Genome Biol ; 17(1): 184, 2016 09 07.
Article in English | MEDLINE | ID: mdl-27604469

ABSTRACT

BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.


Subject(s)
Computational Biology , Proteins/chemistry , Software , Structure-Activity Relationship , Algorithms , Databases, Protein , Gene Ontology , Humans , Molecular Sequence Annotation , Proteins/genetics
13.
J Biomed Semantics ; 6: 9, 2015.
Article in English | MEDLINE | ID: mdl-26005564

ABSTRACT

Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature based features are useful for predicting human protein function (F-max: Molecular Function =0.408, Biological Process =0.461, Cellular Component =0.608). One advantage of using literature features is their ability to offer easy verification of automated predictions. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.
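
The F-max values quoted in result 13 follow the protein-centric measure commonly used in the CAFA evaluations: for each score threshold, precision is averaged over proteins with at least one prediction at or above the threshold, recall is averaged over all proteins, and the best resulting F-score is kept. The sketch below is a simplified version under those assumptions. The term identifiers (`GO:A`, `GO:X`) and protein names are placeholders, every protein is assumed to have at least one true term, and propagation of terms up the ontology is ignored.

```python
def f_max(predictions, truth, thresholds):
    """Maximum protein-centric F-score over a set of score thresholds.

    predictions: {protein: {term: score}}
    truth:       {protein: set of true terms} (each set non-empty)
    """
    best = 0.0
    for threshold in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= threshold}
            hits = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with predictions
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = sum(recalls) / len(recalls)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy example with placeholder identifiers.
toy_truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}
toy_predictions = {"P1": {"GO:A": 0.9, "GO:X": 0.4}, "P2": {"GO:C": 0.6}}
toy_fmax = f_max(toy_predictions, toy_truth, thresholds=[0.3, 0.5, 0.7])
```

In the toy example the best threshold is 0.5, where precision is 1.0 and recall is 0.75.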

14.
Gigascience ; 4: 41, 2015.
Article in English | MEDLINE | ID: mdl-26380075

ABSTRACT

BACKGROUND: The recently held Critical Assessment of Function Annotation challenge (CAFA2) required its participants to submit predictions for a large number of target proteins regardless of whether they have previous annotations or not. This is in contrast to the original CAFA challenge in which participants were asked to submit predictions for proteins with no existing annotations. The CAFA2 task is more realistic, in that it more closely mimics the accumulation of annotations over time. In this study we compare these tasks in terms of their difficulty, and determine whether cross-validation provides a good estimate of performance. RESULTS: The CAFA2 task is a combination of two subtasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In this study we analyze the performance of several function prediction methods in these two scenarios. Our results show that several methods (structured support vector machine, binary support vector machines and guilt-by-association methods) do not usually achieve the same level of accuracy on these two tasks as that achieved by cross-validation, and that predicting novel annotations for previously annotated proteins is a harder problem than predicting annotations for uncharacterized proteins. We also find that different methods have different performance characteristics in these tasks, and that cross-validation is not adequate at estimating performance and ranking methods. CONCLUSIONS: These results have implications for the design of computational experiments in the area of automated function prediction and can provide useful insight for the understanding and design of future CAFA competitions.


Subject(s)
Proteins/physiology , Databases, Protein
15.
Methods Mol Biol ; 1159: 95-108, 2014.
Article in English | MEDLINE | ID: mdl-24788263

ABSTRACT

The Human Genome Project has provided science with a hugely valuable resource: the blueprints for life; the specification of all of the genes that make up a human. While the genes have all been identified and deciphered, it is proteins that are the workhorses of the human body: they are essential to virtually all cell functions and are the primary mechanism through which biological function is carried out. Hence in order to fully understand what happens at a molecular level in biological organisms, and eventually to enable development of treatments for diseases where some aspect of a biological system goes awry, we must understand the functions of proteins. However, experimental characterization of protein function cannot scale to the vast amount of DNA sequence data now available. Computational protein function prediction has therefore emerged as a problem at the forefront of modern biology (Radivojac et al., Nat Methods 10(13):221-227, 2013). Within the varied approaches to computational protein function prediction that have been explored, there are several that make use of biomedical literature mining. These methods take advantage of information in the published literature to associate specific proteins with specific protein functions. In this chapter, we introduce two main strategies for doing this: association of function terms, represented as Gene Ontology terms (Ashburner et al., Nat Genet 25(1):25-29, 2000), to proteins based on information in published articles, and a paradigm called LEAP-FS (Literature-Enhanced Automated Prediction of Functional Sites) in which literature mining is used to validate the predictions of an orthogonal computational protein function prediction method.


Subject(s)
Data Mining/methods , Gene Ontology , Knowledge Discovery/methods , Proteins/genetics , Proteins/metabolism , Animals , Humans , Predictive Value of Tests
16.
PLoS One ; 7(2): e32171, 2012.
Article in English | MEDLINE | ID: mdl-22393388

ABSTRACT

We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.


Subject(s)
Computational Biology/methods , Data Mining/methods , Animals , Binding Sites , Catalytic Domain , Crystallography, X-Ray/methods , Databases, Protein , Humans , Models, Molecular , Models, Statistical , Molecular Conformation , Protein Structure, Tertiary , Proteins/chemistry , Sequence Analysis, Protein/methods , Software