Results 1 - 20 of 79
1.
Development ; 151(3), 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38230566

ABSTRACT

Research in model organisms is central to the characterization of signaling pathways in multicellular organisms. Here, we present the comprehensive and systematic curation of 17 Drosophila signaling pathways using the Gene Ontology framework to establish a dynamic resource that has been incorporated into FlyBase, providing visualization and data integration tools to aid research projects. By restricting to experimental evidence reported in the research literature and quantifying the amount of such evidence for each gene in a pathway, we captured the landscape of empirical knowledge of signaling pathways in Drosophila.


Subjects
Databases, Genetic; Drosophila; Animals; Drosophila/genetics; Gene Ontology; Signal Transduction; Drosophila melanogaster/genetics
2.
Plant J ; 116(4): 1097-1117, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37824297

ABSTRACT

We have developed a compendium and interactive platform, named Stress Combinations and their Interactions in Plants Database (SCIPDb; http://www.nipgr.ac.in/scipdb.php), which offers information on morpho-physio-biochemical (phenome) and molecular (transcriptome and metabolome) responses of plants to different stress combinations. SCIPDb is a plant stress informatics hub for data mining on phenome, transcriptome, trait-gene ontology, and data-driven research for advancing mechanistic understanding of combined stress biology. We analyzed global phenome data from 939 studies to delineate the effects of various stress combinations on yield in major crops and found that yield was substantially affected under abiotic-abiotic stresses. Transcriptome datasets from 36 studies hosted in SCIPDb identified novel genes, whose roles have not been earlier established in combined stress. Integretome analysis under combined drought-heat stress pinpointed carbohydrate, amino acid, and energy metabolism pathways as the crucial metabolic, proteomic, and transcriptional components in plant tolerance to combined stress. These examples illustrate the application of SCIPDb in identifying novel genes and pathways involved in combined stress tolerance. Further, we showed the application of this database in identifying novel candidate genes and pathways for combined drought and pathogen stress tolerance. To our knowledge, SCIPDb is the only publicly available platform offering combined stress-specific omics big data visualization tools, such as an interactive scrollbar, stress matrix, radial tree, global distribution map, meta-phenome analysis, search, BLAST, transcript expression pattern table, Manhattan plot, and co-expression network. These tools facilitate a better understanding of the mechanisms underlying plant responses to combined stresses.


Subjects
Plants; Proteomics; Plants/genetics; Transcriptome; Stress, Physiological/genetics; Phenotype; Droughts; Gene Expression Regulation, Plant/genetics
3.
Genet Med ; 26(4): 101083, 2024 04.
Article in English | MEDLINE | ID: mdl-38281099

ABSTRACT

PURPOSE: The American College of Medical Genetics and Genomics and the Association for Molecular Pathology have outlined a schema that allows for systematic classification of variant pathogenicity. Although gnomAD is generally accepted as a reliable source of population frequency data and ClinGen has provided guidance on the utility of specific bioinformatic predictors, there is no consensus source for identifying publications relevant to a variant. Multiple tools are available to aid in the identification of relevant variant literature, including manually curated databases and literature search engines. We set out to determine the utility of 4 literature mining tools used for ascertainment to inform the discussion of the use of these tools. METHODS: Four literature mining tools including the Human Gene Mutation Database, Mastermind, ClinVar, and LitVar 2.0 were used to identify relevant variant literature for 50 RYR1 variants. Sensitivity and precision were determined for each tool. RESULTS: Sensitivity among the 4 tools ranged from 0.332 to 0.687. Precision ranged from 0.389 to 0.906. No single tool retrieved all relevant publications. CONCLUSION: At the current time, the use of multiple tools is necessary to completely identify the literature relevant to curate a variant.


Subjects
Data Mining; Genetic Variation; Ryanodine Receptor Calcium Release Channel; Humans; Gene Frequency; Genetic Testing; Genetic Variation/genetics; Mutation; Ryanodine Receptor Calcium Release Channel/genetics
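
The sensitivity and precision reported in the abstract of item 3 can be illustrated with a minimal sketch. It assumes each tool returns a set of publication identifiers and that a reference set of relevant publications is available; the identifiers and tool names below are hypothetical and are not data from the study.

```python
def sensitivity_and_precision(retrieved, relevant):
    """Return (sensitivity, precision) for one tool's retrieved publication set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Sensitivity: share of the relevant publications that the tool found.
    sensitivity = true_positives / len(relevant) if relevant else 0.0
    # Precision: share of the tool's hits that are actually relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return sensitivity, precision


# Hypothetical publication identifiers, for illustration only.
relevant = {"pub1", "pub2", "pub3", "pub4"}
tool_a = {"pub1", "pub2", "pub9"}
tool_b = {"pub2", "pub3"}

print(sensitivity_and_precision(tool_a, relevant))
# Pooling the hits of both tools raises sensitivity in this toy case.
print(sensitivity_and_precision(tool_a | tool_b, relevant))
```

In this toy case, taking the union of two tools' hits recovers more of the relevant set than either tool alone, which is the kind of effect the abstract's conclusion about using multiple tools points to.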
4.
Brief Bioinform ; 22(2): 1848-1859, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32313939

ABSTRACT

The fast accumulation of biological data calls for their integration, analysis and exploitation through more systematic approaches. The generation of novel, relevant hypotheses from this enormous quantity of data remains challenging. Logical models have long been used to answer a variety of questions regarding the dynamical behaviours of regulatory networks. As the number of published logical models increases, there is a pressing need for systematic model annotation, referencing and curation in community-supported and standardised formats. This article summarises the key topics and future directions of a meeting entitled 'Annotation and curation of computational models in biology', organised as part of the 2019 [BC]2 conference. The purpose of the meeting was to develop and drive forward a plan towards the standardised annotation of logical models, review and connect various ongoing projects of experts from different communities involved in the modelling and annotation of molecular biological entities, interactions, pathways and models. This article defines a roadmap towards the annotation and curation of logical models, including milestones for best practices and minimum standard requirements.


Subjects
Computational Biology/methods; Models, Biological; Practice Guidelines as Topic; Reproducibility of Results
5.
Funct Integr Genomics ; 22(6): 1403-1410, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36109405

ABSTRACT

Knowledgebase for rice sheath blight information (KRiShI) is a manually curated user-friendly knowledgebase for rice sheath blight (SB) disease that allows users to efficiently mine, visualize, search, benchmark, download, and update meaningful data and information related to SB using its easy and interactive interface. KRiShI collects and integrates widely scattered and unstructured information from various scientific literatures, stores it under a single window, and makes it available to the community in a user-friendly manner. From basic information, best management practices, host resistance, differentially expressed genes, proteins, metabolites, resistance genes, pathways, and OMICS scale experiments, KRiShI presents these in the form of easy and comprehensive tables, diagrams, and pictures. The "Search" tab allows users to verify if their input rice gene id(s) are Rhizoctonia solani (R. solani) responsive and/or resistant. KRiShI will serve as a valuable resource for easy and quick access to data and information related to rice SB disease for both the researchers and the farmers. To encourage community curation, a submission facility is made available. KRiShI can be found at http://www.tezu.ernet.in/krishi.


Subjects
Oryza; Oryza/genetics; Plant Diseases/genetics; Knowledge Bases
6.
Mol Syst Biol ; 17(10): e10387, 2021 10.
Article in English | MEDLINE | ID: mdl-34664389

ABSTRACT

We need to effectively combine the knowledge from surging literature with complex datasets to propose mechanistic models of SARS-CoV-2 infection, improving data interpretation and predicting key targets of intervention. Here, we describe a large-scale community effort to build an open access, interoperable and computable repository of COVID-19 molecular mechanisms. The COVID-19 Disease Map (C19DMap) is a graphical, interactive representation of disease-relevant molecular mechanisms linking many knowledge sources. Notably, it is a computational resource for graph-based analyses and disease modelling. To this end, we established a framework of tools, platforms and guidelines necessary for a multifaceted community of biocurators, domain experts, bioinformaticians and computational biologists. The diagrams of the C19DMap, curated from the literature, are integrated with relevant interaction and text mining databases. We demonstrate the application of network analysis and modelling approaches by concrete examples to highlight new testable hypotheses. This framework helps to find signatures of SARS-CoV-2 predisposition, treatment response or prioritisation of drug candidates. Such an approach may help deal with new waves of COVID-19 or similar pandemics in the long-term perspective.


Subjects
COVID-19/immunology; Computational Biology/methods; Databases, Factual; SARS-CoV-2/immunology; Software; Antiviral Agents/therapeutic use; COVID-19/genetics; COVID-19/virology; Computer Graphics; Cytokines/genetics; Cytokines/immunology; Data Mining/statistics & numerical data; Gene Expression Regulation; Host Microbial Interactions/genetics; Host Microbial Interactions/immunology; Humans; Immunity, Cellular/drug effects; Immunity, Humoral/drug effects; Immunity, Innate/drug effects; Lymphocytes/drug effects; Lymphocytes/immunology; Lymphocytes/virology; Metabolic Networks and Pathways/genetics; Metabolic Networks and Pathways/immunology; Myeloid Cells/drug effects; Myeloid Cells/immunology; Myeloid Cells/virology; Protein Interaction Mapping; SARS-CoV-2/drug effects; SARS-CoV-2/genetics; SARS-CoV-2/pathogenicity; Signal Transduction; Transcription Factors/genetics; Transcription Factors/immunology; Viral Proteins/genetics; Viral Proteins/immunology; COVID-19 Drug Treatment
7.
Brief Bioinform ; 20(2): 659-670, 2019 03 25.
Article in English | MEDLINE | ID: mdl-29688273

ABSTRACT

The Disease Maps Project builds on a network of scientific and clinical groups that exchange best practices, share information and develop systems biomedicine tools. The project aims for an integrated, highly curated and user-friendly platform for disease-related knowledge. The primary focus of disease maps is on interconnected signaling, metabolic and gene regulatory network pathways represented in standard formats. The involvement of domain experts ensures that the key disease hallmarks are covered and relevant, up-to-date knowledge is adequately represented. Expert-curated and computer readable, disease maps may serve as a compendium of knowledge, allow for data-supported hypothesis generation or serve as a scaffold for the generation of predictive mathematical models. This article summarizes the 2nd Disease Maps Community meeting, highlighting its important topics and outcomes. We outline milestones on the roadmap for the future development of disease maps, including creating and maintaining standardized disease maps; sharing parts of maps that encode common human disease mechanisms; providing technical solutions for complexity management of maps; and Web tools for in-depth exploration of such maps. A dedicated discussion was focused on mathematical modeling approaches, as one of the main goals of disease map development is the generation of mathematically interpretable representations to predict disease comorbidity or drug response and to suggest drug repositioning, altogether supporting clinical decisions.


Subjects
Gene Regulatory Networks; Genetic Predisposition to Disease; Computational Biology; Humans; Models, Statistical; Translational Research, Biomedical
8.
J Proteome Res ; 19(12): 4782-4794, 2020 12 04.
Article in English | MEDLINE | ID: mdl-33064489

ABSTRACT

In the context of the Human Proteome Project, we built an inventory of 412 functionally unannotated human proteins for which experimental evidence at the protein level exists (uPE1) and which are highly expressed in tissues involved in human male reproduction. We implemented a strategy combining literature mining, bioinformatics tools to collate annotation and experimental information from specific molecular public resources, and efficient visualization tools to put these unknown proteins into their biological context (protein complexes, tissue and subcellular location, expression pattern). The gathered knowledge allowed pinpointing five uPE1 for which a function has recently been proposed and which should be updated in protein knowledge bases. Furthermore, this bioinformatics strategy allowed us to build new functional hypotheses for five other uPE1s linked to phenotypic traits that are specific to male reproductive function such as ciliogenesis/flagellum formation in germ cells (CCDC112 and TEX9), chromatin remodeling (C3orf62) and spermatozoon maturation (CCDC183). We also discussed the enigmatic case of MAGEB proteins, a poorly documented cancer/testis antigen subtype. Tools used and computational outputs produced during this study are freely accessible via ProteoRE (http://www.proteore.org), a Galaxy-based instance, for reuse purposes. We propose that these five uPE1s should be investigated as a priority by expert laboratories and hope that this inventory and shared resources will stimulate the interest of the reproductive biology community.


Subjects
Proteome; Proteomics; Computational Biology; Humans; Knowledge Bases; Male; Proteome/genetics; Reproduction
9.
Am J Hum Genet ; 100(6): 895-906, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-28552198

ABSTRACT

With advances in genomic sequencing technology, the number of reported gene-disease relationships has rapidly expanded. However, the evidence supporting these claims varies widely, confounding accurate evaluation of genomic variation in a clinical setting. Despite the critical need to differentiate clinically valid relationships from less well-substantiated relationships, standard guidelines for such evaluation do not currently exist. The NIH-funded Clinical Genome Resource (ClinGen) has developed a framework to define and evaluate the clinical validity of gene-disease pairs across a variety of Mendelian disorders. In this manuscript we describe a proposed framework to evaluate relevant genetic and experimental evidence supporting or contradicting a gene-disease relationship and the subsequent validation of this framework using a set of representative gene-disease pairs. The framework provides a semiquantitative measurement for the strength of evidence of a gene-disease relationship that correlates to a qualitative classification: "Definitive," "Strong," "Moderate," "Limited," "No Reported Evidence," or "Conflicting Evidence." Within the ClinGen structure, classifications derived with this framework are reviewed and confirmed or adjusted based on clinical expertise of appropriate disease experts. Detailed guidance for utilizing this framework and access to the curation interface is available on our website. This evidence-based, systematic method to assess the strength of gene-disease relationships will facilitate more knowledgeable utilization of genomic variants in clinical and research settings.


Subjects
Genetic Association Studies; Genetic Predisposition to Disease; Genomics; Humans; Reproducibility of Results
10.
Med Health Care Philos ; 23(3): 497-504, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32524312

ABSTRACT

Data-intensive science comes with increased risks concerning quality and reliability of data, and while trust in science has traditionally been framed as a matter of scientists being expected to adhere to certain technical and moral norms for behaviour, emerging discourses of open science present openness and transparency as substitutes for established trust mechanisms. By ensuring access to all available information, quality becomes a matter of informed judgement by the users, and trust no longer seems necessary. This strategy does not, however, take into consideration the networks of professionals already enabling data-intensive science by providing high-quality data. In the life sciences, biological data- and knowledge bases managed by expert biocurators have become crucial for data-intensive research. In this paper, I will use the case of biocurators to argue that openness and transparency will not diminish the need for trust in data-intensive science. On the contrary, data-intensive science requires a reconfiguration of existing trust mechanisms in order to include those who take care of and manage scientific data after its production.


Subjects
Database Management Systems/organization & administration; Databases, Factual/standards; Science/standards; Trust; Database Management Systems/standards; Humans; Information Dissemination
11.
J Proteome Res ; 18(12): 4143-4153, 2019 12 06.
Article in English | MEDLINE | ID: mdl-31517492

ABSTRACT

Using neXtProt release 2019-01-11, we manually curated a list of 1837 functionally uncharacterized human proteins. Using OrthoList 2, we found that 270 of them have homologues in Caenorhabditis elegans, including 60 with a one-to-one orthology relationship. According to annotations extracted from WormBase, the vast majority of these 60 worm genes have RNAi experimental data or mutant alleles, but manual inspection shows that only 15% have phenotypes that could be interpreted in terms of a specific function. One third of the worm orthologs have protein-protein interaction data, and two of these interactions are conserved in humans. The combination of phenotypic, protein-protein interaction, and gene expression data provides functional hypotheses for 8 uncharacterized human proteins. Experimental validation in human or orthologs is necessary before they can be considered for annotation.


Subjects
Caenorhabditis elegans Proteins; Databases, Protein; Proteins/metabolism; Animals; Gene Expression; Humans; Membrane Proteins/genetics; Membrane Proteins/metabolism; Mice; Nuclear Proteins/genetics; Nuclear Proteins/metabolism; Phenotype; Protein Interaction Maps; Proteins/genetics; RNA Interference; Sequence Homology, Amino Acid
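
The distinction the abstract of item 11 draws between proteins with any worm homologue and those with a one-to-one orthology relationship can be sketched as a simple filter. This is a minimal illustration that assumes the orthology calls are already available as (human gene, worm gene) pairs; it does not query OrthoList 2 or WormBase, and the gene labels are hypothetical.

```python
from collections import defaultdict


def one_to_one_pairs(pairs):
    """Keep only pairs where each gene has exactly one partner in the other species."""
    human_to_worm = defaultdict(set)
    worm_to_human = defaultdict(set)
    for human, worm in pairs:
        human_to_worm[human].add(worm)
        worm_to_human[worm].add(human)

    result = []
    for human, worms in human_to_worm.items():
        if len(worms) != 1:
            continue  # one human gene maps to several worm genes
        worm = next(iter(worms))
        if len(worm_to_human[worm]) == 1:
            result.append((human, worm))
    return sorted(result)


# Hypothetical orthology calls, for illustration only.
pairs = [
    ("H1", "w1"),                 # one-to-one
    ("H2", "w2"), ("H2", "w3"),   # one-to-many
    ("H3", "w4"), ("H4", "w4"),   # many-to-one
]
print(one_to_one_pairs(pairs))
```

Only the first pair survives the filter; the other human genes still count as having worm homologues, which mirrors the gap between the two counts given in the abstract.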
12.
RNA ; 23(10): 1479-1492, 2017 10.
Article in English | MEDLINE | ID: mdl-28701522

ABSTRACT

This article describes the creation of the first expert manually curated noncoding RNA interaction networks for S. cerevisiae. The RNA-RNA and RNA-protein interaction networks have been carefully extracted from the experimental literature and made available through the IntAct database (www.ebi.ac.uk/intact). We provide an initial network analysis and compare their properties to the much larger protein-protein interaction network. We find that the proteins that bind to ncRNAs in the network contain only a small proportion of classical RNA binding domains. We also see an enrichment of WD40 domains suggesting their direct involvement in ncRNA interactions. We discuss the challenges in collecting noncoding RNA interaction data and the opportunities for worldwide collaboration to fill the unmet need for this data.


Subjects
Computational Biology/methods; Gene Regulatory Networks; RNA, Untranslated/genetics; Saccharomyces cerevisiae/genetics; Gene Ontology; RNA, Fungal; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism
13.
BMC Bioinformatics ; 19(1): 248, 2018 06 28.
Article in English | MEDLINE | ID: mdl-29954318

ABSTRACT

BACKGROUND: For automated reading of scientific publications to extract useful information about molecular mechanisms it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or "grounding." Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of machine-readable resources associating synonyms and abbreviations commonly found in biomedical literature with uniform identifiers. RESULTS: In a task involving automated reading of ∼215,000 articles using the REACH event extraction software we found that grounding was disproportionately inaccurate for multi-protein families (e.g., "AKT") and complexes with multiple subunits (e.g., "NF-κB"). To address this problem we constructed FamPlex, a manually curated resource defining protein families and complexes as they are commonly encountered in biomedical text. In FamPlex the gene-level constituents of families and complexes are defined in a flexible format allowing for multi-level, hierarchical membership. To create FamPlex, text strings corresponding to entities were identified empirically from literature and linked manually to uniform identifiers; these identifiers were also mapped to equivalent entries in multiple related databases. FamPlex also includes curated prefix and suffix patterns that improve named entity recognition and event extraction. Evaluation of REACH extractions on a test corpus of ∼54,000 articles showed that FamPlex significantly increased grounding accuracy for families and complexes (from 15 to 71%). The hierarchical organization of entities in FamPlex also made it possible to integrate otherwise unconnected mechanistic information across families, subfamilies, and individual proteins. Applications of FamPlex to the TRIPS/DRUM reading system and the Biocreative VI Bioentity Normalization Task dataset demonstrated the utility of FamPlex in other settings. CONCLUSION: FamPlex is an effective resource for improving named entity recognition, grounding, and relationship resolution in automated reading of biomedical text. The content in FamPlex is available in both tabular and Open Biomedical Ontology formats at https://github.com/sorgerlab/famplex under the Creative Commons CC0 license and has been integrated into the TRIPS/DRUM and REACH reading systems.


Subjects
Data Mining/methods; Proteins/metabolism; Humans
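
The two ideas described in the abstract of item 13, grounding a text string to a uniform identifier and resolving a family to its gene-level members through a multi-level hierarchy, can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionaries, the `FAMILY:` and `GENE:` prefixes, and the `EXAMPLE_PARENT` and `EXAMPLE1` entries are hypothetical and do not reproduce the FamPlex file format or its contents.

```python
# Hypothetical mini-resource: identifier prefixes and dict layout are
# illustrative assumptions, not the actual FamPlex file format.
SYNONYMS = {
    "akt": "FAMILY:AKT",
    "pkb": "FAMILY:AKT",
    "akt1": "GENE:AKT1",
}
MEMBERS = {
    # Made-up parent entry, included only to show multi-level membership.
    "FAMILY:EXAMPLE_PARENT": ["FAMILY:AKT", "GENE:EXAMPLE1"],
    "FAMILY:AKT": ["GENE:AKT1", "GENE:AKT2", "GENE:AKT3"],
}


def ground(text):
    """Map a text string to an identifier, or None if it is not in the table."""
    return SYNONYMS.get(text.strip().lower())


def expand_to_genes(identifier):
    """Recursively resolve a family identifier to its gene-level members."""
    children = MEMBERS.get(identifier)
    if children is None:
        return [identifier]  # already a gene-level entry
    genes = []
    for child in children:
        genes.extend(expand_to_genes(child))
    return genes


print(ground("AKT"))
print(expand_to_genes("FAMILY:EXAMPLE_PARENT"))
```

Grounding "AKT" to a family-level identifier rather than to a single gene is what lets information about the family and about its individual members be connected afterwards.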
14.
Hum Mutat ; 39(11): 1614-1622, 2018 11.
Article in English | MEDLINE | ID: mdl-30311389

ABSTRACT

Genome-scale sequencing creates vast amounts of genomic data, increasing the challenge of clinical sequence variant interpretation. The demand for high-quality interpretation requires multiple specialties to join forces to accelerate the interpretation of sequence variant pathogenicity. With over 600 international members including clinicians, researchers, and laboratory diagnosticians, the Clinical Genome Resource (ClinGen), funded by the National Institutes of Health, is forming expert groups to systematically evaluate variants in clinically relevant genes. Here, we describe the first ClinGen variant curation expert panels (VCEPs), development of consistent and streamlined processes for establishing new VCEPs, and creation of standard operating procedures for VCEPs to define application of the ACMG/AMP guidelines for sequence variant interpretation in specific genes or diseases. Additionally, ClinGen has created user interfaces to enhance reliability of curation and a Sequence Variant Interpretation Working Group (SVI WG) to harmonize guideline specifications and ensure consistency between groups. The expansion of VCEPs represents the primary mechanism by which curation of a substantial fraction of genomic variants can be accelerated and ultimately undertaken systematically and comprehensively. We welcome groups to utilize our resources and become involved in our effort to create a publicly accessible, centralized resource for clinically relevant genes and variants.


Subjects
Genetic Variation/genetics; Genome, Human/genetics; Computational Biology; Databases, Genetic; Genomics; Humans; Mutation/genetics; Societies, Medical; Software; United States
15.
J Proteome Res ; 17(12): 4211-4226, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30191714

ABSTRACT

20,230 protein-coding genes have been predicted from the analysis of the human genome (neXtProt release 2018-01-17), and about 10% of them are still lacking functional annotation, either predicted by bioinformatics tools or captured from experimental reports. A systematic exploration of the available literature on uncharacterized human genes/proteins led to proposal of functional annotations for 113 proteins and to consolidation of a list of 1,862 uncharacterized human proteins. The advanced search functionality of neXtProt was used extensively in order to examine the landscape of the uncharacterized human proteome in terms of subcellular locations, protein-protein interactions, tissue expression, association with diseases, and 3D structure. Finally, a deep data mining in various publicly available resources allowed building functional hypotheses for 26 uncharacterized human proteins validated at protein level (uPE1). These hypotheses cover the fields of cilia biology, male reproduction, metabolism, nervous system, immunity, inflammation, RNA metabolism, and chromatin biology. They will require experimental validation before they can be considered for annotation. Despite technological progresses, the pace of human protein characterization studies is still slow. It could be accelerated by a better integration of existing knowledge resources and by initiating large collaborative projects involving specialists of different biology fields. We hope that our analysis will contribute to set up the ground for such collaborative approaches and will be exploited by the HUPO Human Proteome Project teams committed to characterize uPE1 proteins.


Subjects
Molecular Sequence Annotation; Proteome/genetics; Computational Biology; Data Mining; Genome, Human/genetics; Humans; Methods; Proteome/analysis
16.
Glycobiology ; 28(1): 3-8, 2018 12 01.
Article in English | MEDLINE | ID: mdl-29040563

ABSTRACT

CAZypedia was initiated in 2007 to create a comprehensive, living encyclopedia of the carbohydrate-active enzymes (CAZymes) and associated carbohydrate-binding modules involved in the synthesis, modification and degradation of complex carbohydrates. CAZypedia is closely connected with the actively curated CAZy database, which provides a sequence-based foundation for the biochemical, mechanistic and structural characterization of these diverse proteins. Now celebrating its 10th anniversary online, CAZypedia is a successful example of dynamic, community-driven and expert-based biocuration. CAZypedia is an open-access resource available at URL http://www.cazypedia.org.


Subjects
Carbohydrates/chemistry; Esterases/metabolism; Glycosyltransferases/metabolism; Polysaccharide-Lyases/metabolism; Carbohydrates/history; Databases, Protein; Esterases/chemistry; Esterases/history; Glycosyltransferases/chemistry; Glycosyltransferases/history; History, 21st Century; Polysaccharide-Lyases/chemistry; Polysaccharide-Lyases/history
17.
BMC Genomics ; 19(1): 54, 2018 01 16.
Article in English | MEDLINE | ID: mdl-29338683

ABSTRACT

BACKGROUND: Without knowledge of their genomic sequences, it is impossible to make functional models of the bacteria that make up human and animal microbiota. Unfortunately, the vast majority of publicly available genomes are only working drafts, an incompleteness that causes numerous problems and constitutes a major obstacle to genotypic and phenotypic interpretation. In this work, we began with an example from the class Bacteroidia in the phylum Bacteroidetes, which is preponderant among human orodigestive microbiota. We successfully identify the genetic loci responsible for assembly breaks and misassemblies and demonstrate the importance and usefulness of long-read sequencing and curated reannotation. RESULTS: We showed that the fragmentation in Bacteroidia draft genomes assembled from massively parallel sequencing linearly correlates with genomic repeats of the same or greater size than the reads. We also demonstrated that some of these repeats, especially the long ones, correspond to misassembled loci in three reference Porphyromonas gingivalis genomes marked as circularized (thus complete or finished). We prove that even at modest coverage (30X), long-read resequencing together with PCR contiguity verification (rrn operons and an integrative and conjugative element or ICE) can be used to identify and correct the wrongly combined or assembled regions. Finally, although time-consuming and labor-intensive, consistent manual biocuration of three P. gingivalis strains allowed us to compare and correct the existing genomic annotations, resulting in a more accurate interpretation of the genomic differences among these strains. CONCLUSIONS: In this study, we demonstrate the usefulness and importance of long-read sequencing in verifying published genomes (even when complete) and generating assemblies for new bacterial strains/species with high genomic plasticity. We also show that when combined with biological validation processes and diligent biocurated annotation, this strategy helps reduce the propagation of errors in shared databases, thus limiting false conclusions based on incomplete or misleading information.


Subjects
Genome, Bacterial; Genomics; Porphyromonas gingivalis/genetics; Repetitive Sequences, Nucleic Acid; Bacteroidetes/genetics; DNA, Bacterial/chemistry; Genomics/standards; Molecular Sequence Annotation; Reference Standards; Whole Genome Sequencing/standards
18.
RNA ; 22(5): 667-76, 2016 May.
Article in English | MEDLINE | ID: mdl-26917558

ABSTRACT

MicroRNA regulation of developmental and cellular processes is a relatively new field of study, and the available research data have not been organized to enable its inclusion in pathway and network analysis tools. The association of gene products with terms from the Gene Ontology is an effective method to analyze functional data, but until recently there has been no substantial effort dedicated to applying Gene Ontology terms to microRNAs. Consequently, when performing functional analysis of microRNA data sets, researchers have had to rely instead on the functional annotations associated with the genes encoding microRNA targets. In consultation with experts in the field of microRNA research, we have created comprehensive recommendations for the Gene Ontology curation of microRNAs. This curation manual will enable provision of a high-quality, reliable set of functional annotations for the advancement of microRNA research. Here we describe the key aspects of the work, including development of the Gene Ontology to represent this data, standards for describing the data, and guidelines to support curators making these annotations. The full microRNA curation guidelines are available on the GO Consortium wiki (http://wiki.geneontology.org/index.php/MicroRNA_GO_annotation_manual).


Subjects
Guidelines as Topic; MicroRNAs/genetics; Animals; Gene Silencing; Humans; Mice
19.
BMC Med Inform Decis Mak ; 16 Suppl 1: 68, 2016 07 18.
Article in English | MEDLINE | ID: mdl-27454860

ABSTRACT

BACKGROUND: The Variome corpus, a small collection of published articles about inherited colorectal cancer, includes annotations of 11 entity types and 13 relation types related to the curation of the relationship between genetic variation and disease. Due to the richness of these annotations, the corpus provides a good testbed for evaluation of biomedical literature information extraction systems. METHODS: In this paper, we focus on assessing performance on extracting the relations in the corpus, using gold standard entities as a starting point, to establish a baseline for extraction of relations important for extraction of genetic variant information from the literature. We test the application of the Public Knowledge Discovery Engine for Java (PKDE4J) system, a natural language processing system designed for information extraction of entities and relations in text, on the relation extraction task using this corpus. RESULTS: For the relations which are attested at least 100 times in the Variome corpus, we realise a performance ranging from 0.78-0.84 Precision-weighted F-score, depending on the relation. We find that the PKDE4J system adapted straightforwardly to the range of relation types represented in the corpus; some extensions to the original methodology were required to adapt to the multi-relational classification context. The results are competitive with state-of-the-art relation extraction performance on more heavily studied corpora, although the analysis shows that the Recall of a co-occurrence baseline outweighs the benefit of improved Precision for many relations, indicating the value of simple semantic constraints on relations. CONCLUSIONS: This work represents the first attempt to apply relation extraction methods to the Variome corpus. The results demonstrate that automated methods have good potential to structure the information expressed in the published literature related to genetic variants, connecting mutations to genes, diseases, and patient cohorts. Further development of such approaches will facilitate more efficient biocuration of genetic variant information into structured databases, leveraging the knowledge embedded in the vast publication literature.


Subjects
Colorectal Neoplasms/genetics; Data Mining/methods; Databases, Genetic; Genetic Variation/genetics; Humans
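
The co-occurrence baseline mentioned in the abstract of item 19 can be sketched in a few lines: every pair of compatible entities in the same sentence is predicted to be related, which tends to give high recall at the cost of precision. The sketch below is an assumption-laden illustration, not the PKDE4J system or the paper's evaluation: the entity labels and gold relations are hypothetical, and it scores with the plain balanced F1 rather than the precision-weighted F-score reported in the abstract.

```python
from itertools import product


def cooccurrence_relations(sentences):
    """Predict a relation for every (mutation, disease) pair sharing a sentence."""
    predicted = set()
    for entities in sentences:
        mutations = [text for text, kind in entities if kind == "mutation"]
        diseases = [text for text, kind in entities if kind == "disease"]
        predicted.update(product(mutations, diseases))
    return predicted


def precision_recall_f1(predicted, gold):
    """Score predicted relation pairs against gold-standard pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical gold-standard entities per sentence and gold relations.
sentences = [
    [("m1", "mutation"), ("d1", "disease"), ("d2", "disease")],
    [("m2", "mutation"), ("d1", "disease")],
]
gold = {("m1", "d1"), ("m2", "d1")}

predicted = cooccurrence_relations(sentences)
print(precision_recall_f1(predicted, gold))
```

In this toy case the baseline finds every gold relation but also predicts one spurious pair, which is the precision and recall trade-off the abstract describes.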
20.
Genesis ; 53(8): 450-7, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25997651

ABSTRACT

Saccharomyces Genome Database is an online resource dedicated to managing information about the biology and genetics of the model organism, yeast (Saccharomyces cerevisiae). This information is derived primarily from scientific publications through a process of human curation that involves manual extraction of data and their organization into a comprehensive system of knowledge. This system provides a foundation for further analysis of experimental data coming from research on yeast as well as other organisms. In this review we will demonstrate how biocuration and biocurators add a key component, the biological context, to our understanding of how genes, proteins, genomes and cells function and interact. We will explain the role biocurators play in sifting through the wealth of biological data to incorporate and connect key information. We will also discuss the many ways we assist researchers with their various research needs. We hope to convince the reader that manual curation is vital in converting the flood of data into organized and interconnected knowledge, and that biocurators play an essential role in the integration of scientific information into a coherent model of the cell.


Subjects
Data Curation; Databases, Genetic; Saccharomyces/genetics; Animals; Humans; Saccharomyces/metabolism; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism