Search | VHL Regional Portal

1.

Corrigendum: Characterisation of mental health conditions in social media using Informed Deep Learning.

Gkotsis, George; Oellrich, Anika; Velupillai, Sumithra; Liakata, Maria; Hubbard, Tim J P; Dobson, Richard J B; Dutta, Rina.

Sci Rep ; 7: 46813, 2017 05 16.

Article in English | MEDLINE | ID: mdl-28507325

2.

Automated PDF highlighting to support faster curation of literature for Parkinson's and Alzheimer's disease.

Wu, Honghan; Oellrich, Anika; Girges, Christine; de Bono, Bernard; Hubbard, Tim J P; Dobson, Richard J B.

Database (Oxford) ; 2017(1), 2017 01 01.

Article in English | MEDLINE | ID: mdl-28365743

ABSTRACT

Neurodegenerative disorders such as Parkinson's and Alzheimer's disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications, which reveals that in 85% of cases all highlighted sentences are relevant to the curation task and in about 65% of the cases, the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process. Database URL: https://github.com/KHP-Informatics/NapEasy.

Subjects

Alzheimer Disease; Data Curation/methods; Data Mining/methods; Parkinson Disease; Alzheimer Disease/genetics; Alzheimer Disease/metabolism; Animals; Data Curation/standards; Data Mining/standards; Humans; Parkinson Disease/genetics; Parkinson Disease/metabolism
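
Note on the figures in entry 2: the two numbers quoted in the abstract (a macro F1-measure of 0.51 and a 132% increase over the baseline) can be related with simple arithmetic. The sketch below is illustrative only. It assumes the usual definition of macro F1 as the unweighted mean of per-class F1 scores and reads the 132% as a relative increase; the per-class scores are invented toy values, and the function names are hypothetical rather than taken from the NapEasy code.

```python
from statistics import mean


def macro_f1(per_class_f1):
    """Macro F1: unweighted mean of the per-class F1 scores."""
    return mean(per_class_f1)


def implied_baseline(score, relative_increase_pct):
    """Baseline score implied by a relative increase given in percent."""
    return score / (1 + relative_increase_pct / 100)


# Invented per-class F1 scores, chosen only to illustrate the averaging.
toy_per_class_f1 = [0.62, 0.40]

print(f"macro F1 (toy data): {macro_f1(toy_per_class_f1):.2f}")
print(f"baseline implied by 0.51 and +132%: {implied_baseline(0.51, 132):.2f}")
```

Under these assumptions the best bag-of-words baseline would sit at a macro F1 of roughly 0.22; the abstract itself does not state that value.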

3.

Characterisation of mental health conditions in social media using Informed Deep Learning.

Gkotsis, George; Oellrich, Anika; Velupillai, Sumithra; Liakata, Maria; Hubbard, Tim J P; Dobson, Richard J B; Dutta, Rina.

Sci Rep ; 7: 45141, 2017 03 22.

Article in English | MEDLINE | ID: mdl-28327593

ABSTRACT

The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients' own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of 'in the moment' daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balanced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions.

4.

Thematic issue of the Second combined Bio-ontologies and Phenotypes Workshop.

Verspoor, Karin; Oellrich, Anika; Collier, Nigel; Groza, Tudor; Rocca-Serra, Philippe; Soldatova, Larisa; Dumontier, Michel; Shah, Nigam.

J Biomed Semantics ; 7(1): 66, 2016 12 12.

Article in English | MEDLINE | ID: mdl-27955708

ABSTRACT

This special issue covers selected papers from the 18th Bio-Ontologies Special Interest Group meeting and Phenotype Day, which took place at the Intelligent Systems for Molecular Biology (ISMB) conference in Dublin in 2015. The papers presented in this collection range from descriptions of software tools supporting ontology development and annotation of objects with ontology terms, to applications of text mining for structured relation extraction involving diseases and phenotypes, to detailed proposals for new ontologies and mapping of existing ontologies. Together, the papers consider a range of representational issues in bio-ontology development, and demonstrate the applicability of bio-ontologies to support biological and clinical knowledge-based decision making and analysis. The full set of papers in the Thematic Issue is available at http://www.biomedcentral.com/collections/sig.

Subjects

Biological Ontologies; Phenotype

5.

Reporting phenotypes in mouse models when considering body size as a potential confounder.

Oellrich, Anika; Meehan, Terrence F; Parkinson, Helen; Sarntivijai, Sirarat; White, Jacqueline K; Karp, Natasha A.

J Biomed Semantics ; 7: 2, 2016.

Article in English | MEDLINE | ID: mdl-26865945

ABSTRACT

Genotype-phenotype studies aim to identify causative relationships between genes and phenotypes. The International Mouse Phenotyping Consortium is a high throughput phenotyping program whose goal is to collect phenotype data for a knockout mouse strain of every protein coding gene. The scale of the project requires an automatic analysis pipeline to detect abnormal phenotypes, and disseminate the resulting gene-phenotype annotation data into public resources. A body weight phenotype is a common result of knockout studies. As body weight correlates with many other biological traits, this challenges the interpretation of related gene-phenotype associations. Co-correlation can lead to gene-phenotype associations that are potentially misleading. Here we use statistical modelling to account for body weight as a potential confounder to assess the impact. We find that there is a considerable impact on previously established gene-phenotype associations due to an increase in sensitivity as well as the confounding effect. We investigated the existing ontologies to represent this phenotypic information and we explored ways to ontologically represent the results of the influence of confounders on gene-phenotype associations. With the scale of data being disseminated within the high throughput programs and the range of downstream studies that utilise these data, it is critical to consider how we improve the quality of the disseminated data and provide a robust ontological representation.

Subjects

Body Size; Gene Ontology; Phenotype; Animals; Female; Genotype; Male; Mice; Models, Animal; Reference Standards

6.

The digital revolution in phenotyping.

Oellrich, Anika; Collier, Nigel; Groza, Tudor; Rebholz-Schuhmann, Dietrich; Shah, Nigam; Bodenreider, Olivier; Boland, Mary Regina; Georgiev, Ivo; Liu, Hongfang; Livingston, Kevin; Luna, Augustin; Mallon, Ann-Marie; Manda, Prashanti; Robinson, Peter N; Rustici, Gabriella; Simon, Michelle; Wang, Liqin; Winnenburg, Rainer; Dumontier, Michel.

Brief Bioinform ; 17(5): 819-30, 2016 09.

Article in English | MEDLINE | ID: mdl-26420780

ABSTRACT

Phenotypes have gained increased notoriety in the clinical and biological domain owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges that lead to a translation of experimental findings into clinical applications and thereby support 'bench to bedside' efforts. However, to build this translational bridge, a common and universal understanding of phenotypes is required that goes beyond domain-specific definitions. To achieve this ambitious goal, a digital revolution is ongoing that enables the encoding of data in computer-readable formats and the data storage in specialized repositories, ready for integration, enabling translational research. While phenome research is an ongoing endeavor, the true potential hidden in the currently available data still needs to be unlocked, offering exciting opportunities for the forthcoming years. Here, we provide insights into the state-of-the-art in digital phenotyping, by means of representing, acquiring and analyzing phenotype data. In addition, we provide visions of this field for future research work that could enable better applications of phenotype data.

Subjects

Phenotype; Humans; Information Storage and Retrieval; Research Design; Translational Research, Biomedical

7.

Special issue on bio-ontologies and phenotypes.

Soldatova, Larisa N; Collier, Nigel; Oellrich, Anika; Groza, Tudor; Verspoor, Karin; Rocca-Serra, Philippe; Dumontier, Michel; Shah, Nigam H.

J Biomed Semantics ; 6: 40, 2015.

Article in English | MEDLINE | ID: mdl-26682035

ABSTRACT

The bio-ontologies and phenotypes special issue includes eight papers selected from the 11 papers presented at the Bio-Ontologies SIG (Special Interest Group) and the Phenotype Day at ISMB (Intelligent Systems for Molecular Biology) conference in Boston in 2014. The selected papers span a wide range of topics including the automated re-use and update of ontologies, quality assessment of ontological resources, and the systematic description of phenotype variation, driven by manual, semi- and fully automatic means.

Subjects

Biological Ontologies; Phenotype; Humans

8.

PhenoMiner: from text to a database of phenotypes associated with OMIM diseases.

Collier, Nigel; Groza, Tudor; Smedley, Damian; Robinson, Peter N; Oellrich, Anika; Rebholz-Schuhmann, Dietrich.

Database (Oxford) ; 2015, 2015.

Article in English | MEDLINE | ID: mdl-26507285

ABSTRACT

Analysis of scientific and clinical phenotypes reported in the experimental literature has been curated manually to build high-quality databases such as the Online Mendelian Inheritance in Man (OMIM). However, the identification and harmonization of phenotype descriptions struggles with the diversity of human expressivity. We introduce a novel automated extraction approach called PhenoMiner that exploits full parsing and conceptual analysis. Apriori association mining is then used to identify relationships to human diseases. We applied PhenoMiner to the BMC open access collection and identified 13,636 phenotype candidates. We identified 28,155 phenotype-disorder hypotheses covering 4898 phenotypes and 1659 Mendelian disorders. Analysis showed: (i) the semantic distribution of the extracted terms against linked ontologies; (ii) a comparison of term overlap with the Human Phenotype Ontology (HP); (iii) moderate support for phenotype-disorder pairs in both OMIM and the literature; (iv) strong associations of phenotype-disorder pairs to known disease-genes pairs using PhenoDigm. The full list of PhenoMiner phenotypes (S1), phenotype-disorder associations (S2), association-filtered linked data (S3) and user database documentation (S5) is available as supplementary data and can be downloaded at http://github.com/nhcollier/PhenoMiner under a Creative Commons Attribution 4.0 license. Database URL: phenominer.mml.cam.ac.uk.

Subjects

Data Mining; Databases as Topic; Databases, Genetic; Software; Animals; Animals, Congenic; Blood Pressure; Chromosomes, Mammalian/genetics; Diet; Humans; Hypertension/complications; Hypertension/physiopathology; Hypertension/urine; Kidney Diseases/complications; Kidney Diseases/physiopathology; Kidney Diseases/urine; Models, Biological; Phenotype; Rats; Rats, Mutant Strains; Sodium/urine; Systole

9.

A gene expression resource generated by genome-wide lacZ profiling in the mouse.

Tuck, Elizabeth; Estabel, Jeanne; Oellrich, Anika; Maguire, Anna Karin; Adissu, Hibret A; Souter, Luke; Siragher, Emma; Lillistone, Charlotte; Green, Angela L; Wardle-Jones, Hannah; Carragher, Damian M; Karp, Natasha A; Smedley, Damian; Adams, Niels C; Bussell, James N; Adams, David J; Ramírez-Solis, Ramiro; Steel, Karen P; Galli, Antonella; White, Jacqueline K.

Dis Model Mech ; 8(11): 1467-78, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26398943

ABSTRACT

Knowledge of the expression profile of a gene is a critical piece of information required to build an understanding of the normal and essential functions of that gene and any role it may play in the development or progression of disease. High-throughput, large-scale efforts are on-going internationally to characterise reporter-tagged knockout mouse lines. As part of that effort, we report an open access adult mouse expression resource, in which the expression profile of 424 genes has been assessed in up to 47 different organs, tissues and sub-structures using a lacZ reporter gene. Many specific and informative expression patterns were noted. Expression was most commonly observed in the testis and brain and was most restricted in white adipose tissue and mammary gland. Over half of the assessed genes presented with an absent or localised expression pattern (categorised as 0-10 positive structures). A link between complexity of expression profile and viability of homozygous null animals was observed; inactivation of genes expressed in ≥ 21 structures was more likely to result in reduced viability by postnatal day 14 compared with more restricted expression profiles. For validation purposes, this mouse expression resource was compared with Bgee, a federated composite of RNA-based expression data sets. Strong agreement was observed, indicating a high degree of specificity in our data. Furthermore, there were 1207 observations of expression of a particular gene in an anatomical structure where Bgee had no data, indicating a large amount of novelty in our data set. Examples of expression data corroborating and extending genotype-phenotype associations and supporting disease gene candidacy are presented to demonstrate the potential of this powerful resource.

Subjects

Gene Expression Profiling/methods; Genes, Reporter; High-Throughput Nucleotide Sequencing; Lac Operon; Age Factors; Animals; Computational Biology; Databases, Genetic; Female; Gene Expression Regulation, Developmental; Genome-Wide Association Study; Homozygote; Male; Mice, Knockout; Mutation; Organ Specificity; Phenotype

10.

Disease insights through cross-species phenotype comparisons.

Haendel, Melissa A; Vasilevsky, Nicole; Brush, Matthew; Hochheiser, Harry S; Jacobsen, Julius; Oellrich, Anika; Mungall, Christopher J; Washington, Nicole; Köhler, Sebastian; Lewis, Suzanna E; Robinson, Peter N; Smedley, Damian.

Mamm Genome ; 26(9-10): 548-55, 2015 Oct.

Article in English | MEDLINE | ID: mdl-26092691

ABSTRACT

New sequencing technologies have ushered in a new era for diagnosis and discovery of new causative mutations for rare diseases. However, the sheer numbers of candidate variants that require interpretation in an exome or genomic analysis are still a challenging prospect. A powerful approach is the comparison of the patient's set of phenotypes (phenotypic profile) to known phenotypic profiles caused by mutations in orthologous genes associated with these variants. The most abundant source of relevant data for this task is available through the efforts of the Mouse Genome Informatics group and the International Mouse Phenotyping Consortium. In this review, we highlight the challenges in comparing human clinical phenotypes with mouse phenotypes and some of the solutions that have been developed by members of the Monarch Initiative. These tools allow the identification of mouse models for known disease-gene associations that may otherwise have been overlooked, as well as the prioritization of candidate genes for novel associations. The culmination of these efforts is the Exomiser software package that allows clinical researchers to analyse patient exomes in the context of variant frequency and predicted pathogenicity as well as the phenotypic similarity of the patient to any given candidate orthologous gene.

Subjects

Databases, Genetic; Genetic Diseases, Inborn; Animals; Computational Biology; Disease Models, Animal; Exome/genetics; Genomics; Humans; Mice; Mutation; Phenotype

11.

Concept selection for phenotypes and diseases using learn to rank.

Collier, Nigel; Oellrich, Anika; Groza, Tudor.

J Biomed Semantics ; 6: 24, 2015.

Article in English | MEDLINE | ID: mdl-26034558

ABSTRACT

BACKGROUND: Phenotypes form the basis for determining the existence of a disease against the given evidence. Much of this evidence though remains locked away in text - scientific articles, clinical trial reports and electronic patient records (EPR) - where authors use the full expressivity of human language to report their observations. RESULTS: In this paper we exploit a combination of off-the-shelf tools for extracting a machine understandable representation of phenotypes and other related concepts that concern the diagnosis and treatment of diseases. These are tested against a gold standard EPR collection that has been annotated with Unified Medical Language System (UMLS) concept identifiers: the ShARE/CLEF 2013 corpus for disorder detection. We evaluate four pipelines as stand-alone systems and then attempt to optimise semantic-type based performance using several learn-to-rank (LTR) approaches - three pairwise and one listwise. We observed that whilst overall Apache cTAKES tended to outperform other stand-alone systems on a strong recall (R = 0.57), precision was low (P = 0.09) leading to low-to-moderate F1 measure (F1 = 0.16). Moreover, there is substantial variation in system performance across semantic types for disorders. For example, the concept Findings (T033) seemed to be very challenging for all systems. Combining systems within LTR improved F1 substantially (F1 = 0.24) particularly for Disease or syndrome (T047) and Anatomical abnormality (T190). Whilst recall is improved markedly, precision remains a challenge (P = 0.15, R = 0.59).
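
Note on the figures in entry 11: the F1 values quoted in the abstract follow from the reported precision and recall through the harmonic mean, F1 = 2PR / (P + R). The sketch below is only a consistency check on those published numbers; the function name and the dictionary labels are illustrative choices, not identifiers from the paper.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as quoted in the abstract.
reported = {
    "Apache cTAKES (stand-alone)": (0.09, 0.57),
    "Learn-to-rank combination": (0.15, 0.59),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_measure(p, r):.2f}")
```

Rounded to two decimals this gives 0.16 and 0.24, matching the abstract, and it makes visible why the low precision dominates the F1 score despite a recall near 0.6.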

12.

Linking gene expression to phenotypes via pathway information.

Papatheodorou, Irene; Oellrich, Anika; Smedley, Damian.

J Biomed Semantics ; 6: 17, 2015.

Article in English | MEDLINE | ID: mdl-25901272

ABSTRACT

Establishing robust links among gene expression, pathways and phenotypes is critical for understanding diseases and developing treatments. In recent years there have been many efforts to develop the computational means to traverse from genes to gene expression, model pathways and classify phenotypes. Numerous ontologies and other controlled vocabularies have been developed, as well as computational methods to combine and mine these data sets and establish connections. Here we discuss these efforts and identify areas of future work that could lead to a better integration of genes, pathways and phenotypes to provide insights into the mechanisms under which gene mutations affect expression and pathways and how these effects are manifested onto the phenotype.

13.

Automatic concept recognition using the human phenotype ontology reference and test suite corpora.

Groza, Tudor; Köhler, Sebastian; Doelken, Sandra; Collier, Nigel; Oellrich, Anika; Smedley, Damian; Couto, Francisco M; Baynam, Gareth; Zankl, Andreas; Robinson, Peter N.

Database (Oxford) ; 2015, 2015.

Article in English | MEDLINE | ID: mdl-25725061

ABSTRACT

Concept recognition tools rely on the availability of textual corpora to assess their performance and enable the identification of areas for improvement. Typically, corpora are developed for specific purposes, such as gene name recognition. Gene and protein name identification are longstanding goals of biomedical text mining, and therefore a number of different corpora exist. However, phenotypes only recently became an entity of interest for specialized concept recognition systems, and hardly any annotated text is available for performance testing and training. Here, we present a unique corpus, capturing text spans from 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes. Furthermore, we developed a test suite for standardized concept recognition error analysis, incorporating 32 different types of test cases corresponding to 2164 HPO concepts. Finally, three established phenotype concept recognizers (NCBO Annotator, OBO Annotator and Bio-LarK CR) were comprehensively evaluated, and results are reported against both the text corpus and the test suites. The gold standard and test suites corpora are available from http://bio-lark.org/hpo_res.html. Database URL: http://bio-lark.org/hpo_res.html.

Subjects

Data Mining/methods; Gene Ontology; Phenotype; Software; Humans

14.

An ontology approach to comparative phenomics in plants.

Oellrich, Anika; Walls, Ramona L; Cannon, Ethalinda Ks; Cannon, Steven B; Cooper, Laurel; Gardiner, Jack; Gkoutos, Georgios V; Harper, Lisa; He, Mingze; Hoehndorf, Robert; Jaiswal, Pankaj; Kalberer, Scott R; Lloyd, John P; Meinke, David; Menda, Naama; Moore, Laura; Nelson, Rex T; Pujar, Anuradha; Lawrence, Carolyn J; Huala, Eva.

Plant Methods ; 11: 10, 2015.

Article in English | MEDLINE | ID: mdl-25774204

ABSTRACT

BACKGROUND: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. RESULTS: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. 
CONCLUSIONS: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.

15.

Finding our way through phenotypes.

Deans, Andrew R; Lewis, Suzanna E; Huala, Eva; Anzaldo, Salvatore S; Ashburner, Michael; Balhoff, James P; Blackburn, David C; Blake, Judith A; Burleigh, J Gordon; Chanet, Bruno; Cooper, Laurel D; Courtot, Mélanie; Csösz, Sándor; Cui, Hong; Dahdul, Wasila; Das, Sandip; Dececchi, T Alexander; Dettai, Agnes; Diogo, Rui; Druzinsky, Robert E; Dumontier, Michel; Franz, Nico M; Friedrich, Frank; Gkoutos, George V; Haendel, Melissa; Harmon, Luke J; Hayamizu, Terry F; He, Yongqun; Hines, Heather M; Ibrahim, Nizar; Jackson, Laura M; Jaiswal, Pankaj; James-Zorn, Christina; Köhler, Sebastian; Lecointre, Guillaume; Lapp, Hilmar; Lawrence, Carolyn J; Le Novère, Nicolas; Lundberg, John G; Macklin, James; Mast, Austin R; Midford, Peter E; Mikó, István; Mungall, Christopher J; Oellrich, Anika; Osumi-Sutherland, David; Parkinson, Helen; Ramírez, Martín J; Richter, Stefan; Robinson, Peter N.

PLoS Biol ; 13(1): e1002033, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25562316

ABSTRACT

Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.

Subjects

Genetic Association Studies; Animals; Computational Biology; Data Curation; Databases, Factual/standards; Gene-Environment Interaction; Genomics; Humans; Phenotype; Reference Standards; Reproducibility of Results; Terminology as Topic

16.

Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor.

PLoS One ; 10(1): e0116040, 2015.

Article in English | MEDLINE | ID: mdl-25607983

ABSTRACT

Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the ShARe/CLEF (https://sites.google.com/site/shareclefehealth/data) and i2b2 (https://i2b2.org/NLP/DataSets/) corpora needs to be requested from the individual corpus providers.

Subjects

Data Mining/standards; Phenotype; Biological Ontologies; Electronic Health Records; Humans; Natural Language Processing; PubMed/standards
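
Note on the method in entry 16: the abstract describes building a silver standard from the output of several concept recognition systems but does not say how the outputs were combined. One common strategy, assumed here purely for illustration, is agreement voting: an annotation is kept when at least a minimum number of systems produce it. The system names, text offsets, concept identifiers and the `min_votes` threshold below are all invented placeholders, not data or settings from the study.

```python
from collections import Counter

# An annotation is (start offset, end offset, concept identifier).
Annotation = tuple[int, int, str]


def silver_standard(system_outputs: dict[str, set[Annotation]],
                    min_votes: int = 2) -> set[Annotation]:
    """Keep annotations produced by at least `min_votes` systems."""
    votes = Counter(a for anns in system_outputs.values() for a in anns)
    return {a for a, n in votes.items() if n >= min_votes}


def precision_recall(predicted: set[Annotation],
                     reference: set[Annotation]) -> tuple[float, float]:
    """Exact-match precision and recall of `predicted` against `reference`."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Invented toy output from three hypothetical systems on one document.
outputs = {
    "system_a": {(0, 5, "C1"), (10, 18, "C2")},
    "system_b": {(0, 5, "C1"), (20, 27, "C3")},
    "system_c": {(0, 5, "C1"), (10, 18, "C2"), (30, 34, "C4")},
}
gold = {(0, 5, "C1"), (20, 27, "C3")}  # invented gold annotations

silver = silver_standard(outputs, min_votes=2)
print(sorted(silver))
print(precision_recall(silver, gold))
```

Raising `min_votes` generally trades recall for precision, which is consistent with the abstract's observation that adding high-precision, low-recall systems can lower the overall F1-measure.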

17.

The influence of disease categories on gene candidate predictions from model organism phenotypes.

Oellrich, Anika; Koehler, Sebastian; Washington, Nicole; Mungall, Chris; Lewis, Suzanna; Haendel, Melissa; Robinson, Peter N; Smedley, Damian.

J Biomed Semantics ; 5(Suppl 1 Proceedings of the Bio-Ontologies Spec Interest G): S4, 2014.

Article in English | MEDLINE | ID: mdl-25093073

ABSTRACT

BACKGROUND: The molecular etiology is still to be identified for about half of the currently described Mendelian diseases in humans, thereby hindering efforts to find treatments or preventive measures. Advances, such as new sequencing technologies, have led to increasing amounts of data becoming available with which to address the problem of identifying disease genes. Therefore, automated methods are needed that reliably predict disease gene candidates based on available data. We have recently developed Exomiser as a tool for identifying causative variants from exome analysis results by filtering and prioritising using a number of criteria including the phenotype similarity between the disease and mouse mutants involving the gene candidates. Initial investigations revealed a variation in performance for different medical categories of disease, due in part to a varying contribution of the phenotype scoring component. RESULTS: In this study, we further analyse the performance of our cross-species phenotype matching algorithm, and examine in more detail the reasons why disease gene filtering based on phenotype data works better for certain disease categories than others. We found that in addition to misleading phenotype alignments between species, some disease categories are still more amenable to automated predictions than others, and that this often ties in with community perceptions on how well the organism works as a model. CONCLUSIONS: In conclusion, our automated disease gene candidate predictions are highly dependent on the organism used for the predictions and the disease category being studied. Future work on computational disease gene prediction using phenotype data would benefit from methods that take into account the disease category and the source of model organism data.

18.

Using association rule mining to determine promising secondary phenotyping hypotheses.

Oellrich, Anika; Jacobsen, Julius; Papatheodorou, Irene; Smedley, Damian.

Bioinformatics ; 30(12): i52-59, 2014 Jun 15.

Article in English | MEDLINE | ID: mdl-24932005

ABSTRACT

MOTIVATION: Large-scale phenotyping projects such as the Sanger Mouse Genetics project are ongoing efforts to help identify the influences of genes and their modification on phenotypes. Gene-phenotype relations are crucial to the improvement of our understanding of human heritable diseases as well as the development of drugs. However, given that there are ∼20 000 genes in higher vertebrate genomes and the experimental verification of gene-phenotype relations requires a lot of resources, methods are needed that determine good candidates for testing. RESULTS: In this study, we applied an association rule mining approach to the identification of promising secondary phenotype candidates. The predictions rely on a large gene-phenotype annotation set that is used to find occurrence patterns of phenotypes. Applying an association rule mining approach, we could identify 1967 secondary phenotype hypotheses that cover 244 genes and 136 phenotypes. Using two automated and one manual evaluation strategies, we demonstrate that the secondary phenotype candidates possess biological relevance to the genes they are predicted for. From the results we conclude that the predicted secondary phenotypes constitute good candidates to be experimentally tested and confirmed. AVAILABILITY: The secondary phenotype candidates can be browsed through at http://www.sanger.ac.uk/resources/databases/phenodigm/gene/secondaryphenotype/list. CONTACT: ao5@sanger.ac.uk or ds5@sanger.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Data Mining/methods , Disease/genetics , Phenotype , Animals , Ataxia/genetics , Genes , Humans , Mice , Mitochondrial Diseases/genetics , Muscle Weakness/genetics , Ubiquinone/deficiency , Ubiquinone/genetics
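
The association rule mining idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the thresholds, the restriction to pairwise rules, and the toy gene and phenotype labels are all assumptions made for illustration. It counts how often phenotypes co-occur across gene annotations, keeps rules "phenotype A implies phenotype B" that pass a support and confidence cut-off, and proposes B as a secondary phenotype hypothesis for genes annotated with A but not yet with B.

```python
from collections import Counter
from itertools import permutations


def mine_rules(annotations, min_support=2, min_confidence=0.6):
    """Return {(a, b): confidence} for pairwise rules 'a implies b'.

    annotations maps a gene name to the set of phenotypes annotated to it.
    Support is the number of genes annotated with both a and b;
    confidence is that count divided by the number of genes with a.
    """
    single = Counter()
    pair = Counter()
    for phenotypes in annotations.values():
        single.update(phenotypes)
        # Ordered pairs, so both 'a implies b' and 'b implies a' are counted.
        pair.update(permutations(sorted(phenotypes), 2))
    rules = {}
    for (a, b), n_ab in pair.items():
        confidence = n_ab / single[a]
        if n_ab >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def secondary_candidates(annotations, rules):
    """Propose (gene, phenotype) hypotheses not yet present in the annotations."""
    hypotheses = set()
    for gene, phenotypes in annotations.items():
        for a, b in rules:
            if a in phenotypes and b not in phenotypes:
                hypotheses.add((gene, b))
    return hypotheses


# Toy annotation set; gene names are hypothetical placeholders.
annotations = {
    "geneA": {"ataxia", "muscle weakness"},
    "geneB": {"ataxia", "muscle weakness"},
    "geneC": {"ataxia", "muscle weakness", "tremor"},
    "geneD": {"ataxia"},
}
rules = mine_rules(annotations)
hypotheses = secondary_candidates(annotations, rules)
print(hypotheses)
```

In this toy set the rule "ataxia implies muscle weakness" holds for three of the four genes annotated with ataxia, so muscle weakness is proposed as a candidate for the remaining gene. A real analysis would likely use multi-item rules and additional interestingness measures, which this sketch omits.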

19.

Linking tissues to phenotypes using gene expression profiles.

Oellrich, Anika; Smedley, Damian.

Database (Oxford) ; 2014: bau017, 2014.

Article in English | MEDLINE | ID: mdl-24634472

ABSTRACT

Despite great biological and computational efforts to determine the genetic causes underlying human heritable diseases, approximately half (3500) of these diseases are still without an identified genetic cause. Model organism studies allow the targeted modification of the genome and can help with the identification of genetic causes for human diseases. Targeted modifications have led to a vast amount of model organism data. However, these data are scattered across different databases, preventing an integrated view and missing out on contextual information. Only once we are able to combine all the existing resources will we be able to fully understand the causes underlying a disease and how species differ. Here, we present an integrated data resource combining tissue expression with phenotypes in mouse lines and bringing us one step closer to consequence chains from a molecular level to a resulting phenotype. Mutations in genes often manifest in phenotypes in the same tissue that the gene is expressed in. However, in other cases, a systems level approach is required to understand how perturbations to gene-networks connecting multiple tissues lead to a phenotype. Automated evaluation of the predicted tissue-phenotype associations reveals that 72-76% of the phenotypes are associated with disruption of genes expressed in the affected tissue. However, 55-64% of the individual phenotype-tissue associations show spatially separated gene expression and phenotype manifestation. For example, we see a correlation between 'total body fat' abnormalities and genes expressed in the 'brain', which fits recent discoveries linking genes expressed in the hypothalamus to obesity. Finally, we demonstrate that the use of our predicted tissue-phenotype associations can improve the detection of a known disease-gene association when combined with a disease gene candidate prediction tool. For example, JAK2, the known gene associated with Familial Erythrocytosis 1, rises from the seventh best candidate to the top hit when the associated tissues are taken into consideration. Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm/phenotype/list.

Subjects

Gene Expression Profiling , Organ Specificity/genetics , Animals , Gene Expression Regulation , Gene Ontology , Genome/genetics , Humans , Internet , Mice , Phenotype , User-Computer Interface

20.

Improved exome prioritization of disease genes through cross-species phenotype comparison.

Robinson, Peter N; Köhler, Sebastian; Oellrich, Anika; Wang, Kai; Mungall, Christopher J; Lewis, Suzanna E; Washington, Nicole; Bauer, Sebastian; Seelow, Dominik; Krawitz, Peter; Gilissen, Christian; Haendel, Melissa; Smedley, Damian.

Genome Res ; 24(2): 340-8, 2014 Feb.

Article in English | MEDLINE | ID: mdl-24162188

ABSTRACT

Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low quality and common variants and those deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identification of the causative mutation problematic when using these strategies alone. We propose using the wealth of genotype to phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.

Subjects

Exome/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide/genetics , Algorithms , Animals , Computational Biology , Databases, Genetic , Humans , Mice , Phenotype , Sequence Analysis, DNA , Software
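
The abstract above describes integrating a phenotype similarity score with a variant-based evaluation to prioritise candidate genes. The sketch below only illustrates that general idea and should not be read as the PHIVE or Exomiser implementation: the function name, the assumption that both scores lie in the range 0 to 1, the choice of a simple mean as the combination rule, and the toy gene names and values are all hypothetical.

```python
def rank_candidates(variant_scores, phenotype_scores):
    """Rank genes by a combined score, best candidate first.

    variant_scores: gene -> score from variant evidence (assumed 0..1).
    phenotype_scores: gene -> similarity between the patient's disease
    phenotype and a model-organism phenotype (assumed 0..1); genes with
    no phenotype data are treated as scoring 0.0 here.
    The combined score is the mean of the two, which is an assumption
    of this sketch rather than a documented formula.
    """
    combined = {
        gene: (score + phenotype_scores.get(gene, 0.0)) / 2
        for gene, score in variant_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Toy values for illustration only; they are not taken from the paper.
variant_scores = {"GENE1": 0.95, "GENE2": 0.90, "GENE3": 0.85}
phenotype_scores = {"GENE1": 0.20, "GENE3": 0.90}

print(rank_candidates(variant_scores, phenotype_scores))
```

With these toy values, GENE3 ranks third on variant evidence alone but moves to the top once its phenotype match is included, which is the kind of re-ranking effect the abstract reports at scale.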
