ABSTRACT
Studies in multiple solid tumor types have demonstrated the prognostic significance of circulating tumor DNA (ctDNA) analysis after curative-intent surgery. A combined analysis of data across completed studies could further our understanding of ctDNA as a prognostic marker and inform future trial design. We combined individual patient data from three independent cohort studies of nonmetastatic colorectal cancer (CRC). Plasma samples were collected 4 to 10 weeks after surgery. Mutations in ctDNA were assayed using a massively parallel sequencing technique called SafeSeqS. We analyzed 485 CRC patients (230 Stage II colon, 96 Stage III colon, and 159 locally advanced rectum). ctDNA was detected after surgery in 59 (12%) patients overall (11.0%, 12.5% and 13.8% for samples taken at 4-6, 6-8 and 8-10 weeks; P = .740). ctDNA detection was associated with poorer 5-year recurrence-free (38.6% vs 85.5%; P < .001) and overall survival (64.6% vs 89.4%; P < .001). The predictive accuracy of postsurgery ctDNA for recurrence was higher than that of individual clinicopathologic risk features. Recurrence risk increased exponentially with increasing ctDNA mutant allele frequency (MAF) (hazard ratio, 1.2, 2.5 and 5.8 for MAF of 0.1%, 0.5% and 1%). Postsurgery ctDNA was detected in 3 of 20 (15%) patients with locoregional and 27 of 60 (45%) with distant recurrence (P = .018). This analysis demonstrates a consistent long-term impact of ctDNA as a prognostic marker across nonmetastatic CRC, where ctDNA outperforms other clinicopathologic risk factors and MAF further stratifies recurrence risk. ctDNA is a better predictor of distant vs locoregional recurrence.
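The reported hazard ratios can be checked against a simple log-linear dose-response curve. The sketch below is an illustration only and is not the authors' fitted model: it assumes a Cox-type relationship HR(MAF) = exp(BETA × MAF) with a single coefficient, calibrated (our assumption) so that the hazard ratio at 1% MAF equals the reported 5.8. The names `BETA` and `hazard_ratio` are hypothetical.

```python
import math

# Assumption: one log-linear coefficient, calibrated on the reported
# hazard ratio of 5.8 at MAF = 1%. This is not taken from the study's model.
BETA = math.log(5.8)  # log-hazard increase per 1 percentage point of MAF


def hazard_ratio(maf_percent: float) -> float:
    """Hazard ratio relative to MAF = 0 under the assumed model exp(BETA * MAF)."""
    return math.exp(BETA * maf_percent)


# Evaluate the assumed curve at the three MAF levels quoted in the abstract.
for maf in (0.1, 0.5, 1.0):
    print(f"MAF {maf}% -> HR {hazard_ratio(maf):.1f}")
```

Under this assumption the curve gives roughly 1.2 and 2.4 at 0.1% and 0.5% MAF, close to the reported 1.2 and 2.5, which is the pattern that "increased exponentially" describes.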
Subjects
Biomarkers, Tumor/genetics; Circulating Tumor DNA/genetics; Colorectal Neoplasms/genetics; Mutation; Adult; Aged; Aged, 80 and over; Biomarkers, Tumor/blood; Circulating Tumor DNA/blood; Cohort Studies; Colorectal Neoplasms/blood; Colorectal Neoplasms/surgery; Female; High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Kaplan-Meier Estimate; Male; Middle Aged; Neoplasm Recurrence, Local; Prognosis; Proportional Hazards Models; Young Adult
ABSTRACT
Research in the past decade has demonstrated that a single reference genome is not representative of a species' diversity. MaizeGDB introduces a pan-genomic approach to hosting genomic data, leveraging the large number of diverse maize genomes and their associated datasets to quickly and efficiently connect genomes, gene models, expression, epigenome, sequence variation, structural variation, transposable elements, and diversity data across genomes so that researchers can easily track the structural and functional differences of a locus and its orthologs across maize. We believe our framework is unique and provides a template for any genomic database poised to host large-scale pan-genomic data.
Subjects
Data Accuracy; Data Collection/methods; Databases as Topic; Genome, Plant; Genomics; Zea mays/genetics; Genetic Variation
ABSTRACT
Since its 2015 update, MaizeGDB, the Maize Genetics and Genomics database, has expanded to support the sequenced genomes of many maize inbred lines in addition to the B73 reference genome assembly. Curation and development efforts have targeted high-quality datasets and tools to support maize trait analysis, germplasm analysis, genetic studies, and breeding. MaizeGDB hosts a wide range of data, with recent support for new data types including genome metadata, RNA-seq, proteomics, synteny, and large-scale diversity. To improve access to and visualization of these data types, several new tools have been implemented to: access large-scale maize diversity data (SNPversity), download and compare gene expression data (qTeller), visualize pedigree data (Pedigree Viewer), link genes with phenotype images (MaizeDIG), and enable flexible user-specified queries to the MaizeGDB database (MaizeMine). MaizeGDB also continues to be the community hub for maize research, coordinating activities and providing technical support to the maize research community. Here we report the changes MaizeGDB has made within the last three years to keep pace with recent software and research advances, as well as the pan-genomic landscape that cheaper and better sequencing technologies have made possible. MaizeGDB is accessible online at https://www.maizegdb.org.
Subjects
Computational Biology/methods; Databases, Genetic; Genome, Plant/genetics; Genomics/methods; Zea mays/genetics; Gene Expression Regulation, Plant; Genetic Variation; Information Storage and Retrieval/methods; Internet; Polymorphism, Single Nucleotide; Proteomics/methods; User-Computer Interface; Zea mays/metabolism
ABSTRACT
MaizeGDB is a highly curated, community-oriented database and informatics service to researchers focused on the crop plant and model organism Zea mays ssp. mays. Although some form of the maize community database has existed over the last 25 years, there have only been two major releases. In 1991, the original maize genetics database MaizeDB was created. In 2003, the combined contents of MaizeDB and the sequence data from ZmDB were made accessible as a single resource named MaizeGDB. Over the next decade, MaizeGDB became more sequence driven while still maintaining traditional maize genetics datasets. This enabled the project to meet the continued growing and evolving needs of the maize research community, yet the interface and underlying infrastructure remained unchanged. In 2015, the MaizeGDB team completed a multi-year effort to update the MaizeGDB resource by reorganizing existing data, upgrading hardware and infrastructure, creating new tools, incorporating new data types (including diversity data, expression data, gene models, and metabolic pathways), and developing and deploying a modern interface. In addition to coordinating a data resource, the MaizeGDB team coordinates activities and provides technical support to the maize research community. MaizeGDB is accessible online at http://www.maizegdb.org.
Subjects
Databases, Genetic; Zea mays/genetics; Gene Expression; Genes, Plant; Genetic Variation; Genome, Plant; Metabolic Networks and Pathways; Models, Genetic; Software; User-Computer Interface; Zea mays/metabolism
ABSTRACT
The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary ('ontology') of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.
Subjects
Genome, Plant; Genomics/methods; Plants/anatomy & histology; Plants/genetics; Software; Alkyl and Aryl Transferases/genetics; Databases, Genetic; Flowers/genetics; Internet; Molecular Sequence Annotation; Multigene Family; Phenotype; Plant Leaves/anatomy & histology; Plant Proteins/genetics
ABSTRACT
Maize is a major cereal crop and an important model system for basic biological research. Knowledge gained from maize research can also be used to genetically improve its grass relatives such as sorghum, wheat, and rice. The primary objective of the Maize Genome Sequencing Consortium (MGSC) was to generate a reference genome sequence that was integrated with both the physical and genetic maps. Using a previously published integrated genetic and physical map, combined with in-coming maize genomic sequence, new sequence-based genetic markers, and an optical map, we dynamically picked a minimum tiling path (MTP) of 16,910 bacterial artificial chromosome (BAC) and fosmid clones that were used by the MGSC to sequence the maize genome. The final MTP resulted in a significantly improved physical map that reduced the number of contigs from 721 to 435, incorporated a total of 8,315 mapped markers, and ordered and oriented the majority of FPC contigs. The new integrated physical and genetic map covered 2,120 Mb (93%) of the 2,300-Mb genome, of which 405 contigs were anchored to the genetic map, totaling 2,103.4 Mb (99.2% of the 2,120 Mb physical map). More importantly, 336 contigs, comprising 94.0% of the physical map (approximately 1,993 Mb), were ordered and oriented. Finally, we used all available physical, sequence, genetic, and optical data to generate a golden path (AGP) of chromosome-based pseudomolecules, herein referred to as the B73 Reference Genome Sequence version 1 (B73 RefGen_v1).
Subjects
Genome, Plant/genetics; Zea mays/genetics; Algorithms; Base Sequence; Chromosomes, Artificial, Bacterial/genetics; Chromosomes, Plant/genetics; Cloning, Molecular; Contig Mapping; Genetic Markers; Molecular Sequence Data; Optical Phenomena; Physical Chromosome Mapping; Sequence Analysis, DNA; Sequence Homology, Nucleic Acid
ABSTRACT
BACKGROUND: The ability to search for and precisely compare similar phenotypic appearances within and across species has vast potential in plant science and genetic research. The difficulty in doing so lies in the fact that many visual phenotypic data, especially visually observed phenotypes that often cannot be directly measured quantitatively, are in the form of text annotations, and these descriptions are plagued by semantic ambiguity, heterogeneity, and low granularity. Though several bio-ontologies have been developed to standardize phenotypic (and genotypic) information and permit comparisons across species, these semantic issues persist and prevent precise analysis and retrieval of information. A framework suitable for the modeling and analysis of precise computable representations of such phenotypic appearances is needed. RESULTS: We have developed a new framework called the Computable Visually Observed Phenotype Ontological Framework for plants. This work provides a novel quantitative view of descriptions of plant phenotypes that leverages existing bio-ontologies and utilizes a computational approach to capture and represent domain knowledge in a machine-interpretable form. This is accomplished by means of a robust and accurate semantic mapping module that automatically maps high-level semantics to low-level measurements computed from phenotype imagery. The framework was applied to two different plant species with semantic rules mined and an ontology constructed. Rule quality was evaluated and showed high quality rules for most semantics. This framework also facilitates automatic annotation of phenotype images and can be adopted by different plant communities to aid in their research. CONCLUSIONS: The Computable Visually Observed Phenotype Ontological Framework for plants has been developed for more efficient and accurate management of visually observed phenotypes, which play a significant role in plant genomics research. The uniqueness of this framework is its ability to bridge the knowledge of informaticians and plant science researchers by translating descriptions of visually observed phenotypes into standardized, machine-understandable representations, thus enabling the development of advanced information retrieval and phenotype annotation analysis tools for the plant science community.
Subjects
Phenotype; Plants/anatomy & histology; Plants/genetics; Vocabulary, Controlled; Algorithms; Databases, Genetic; Fruit/anatomy & histology; Genomics; Genotype; Semantics; Zea mays/anatomy & histology; Zea mays/genetics
ABSTRACT
SUMMARY: Methods to automatically integrate sequence information with physical and genetic maps are scarce. The Locus Lookup tool enables researchers to define windows of genomic sequence likely to contain loci of interest where only genetic or physical mapping associations are reported. Using the Locus Lookup tool, researchers will be able to locate specific genes more efficiently, which will ultimately help them develop a better maize plant. With the availability of the well-documented source code, the tool can be easily adapted to other biological systems. AVAILABILITY: The Locus Lookup tool is available on the web at http://maizegdb.org/cgi-bin/locus_lookup.cgi. It is implemented in PHP, Oracle and Apache, with all major browsers supported. Source code is freely available for download at http://ftp.maizegdb.org/open_source/locus_lookup/.
Subjects
Computational Biology/methods; Genome, Plant; Software; Zea mays/genetics; Databases, Genetic; Internet; Sequence Analysis, DNA; User-Computer Interface
ABSTRACT
The Plant Ontology Consortium (POC, http://www.plantontology.org) is a collaborative effort among model plant genome databases and plant researchers that aims to create, maintain and facilitate the use of a controlled vocabulary (ontology) for plants. The ontology allows users to ascribe attributes of plant structure (anatomy and morphology) and developmental stages to data types, such as genes and phenotypes, to provide a semantic framework to make meaningful cross-species and database comparisons. The POC builds upon groundbreaking work by the Gene Ontology Consortium (GOC) by adopting and extending the GOC's principles, existing software and database structure. Over the past year, POC has added hundreds of ontology terms to associate with thousands of genes and gene products from Arabidopsis, rice and maize, which are available through a newly updated web-based browser (http://www.plantontology.org/amigo/go.cgi) for viewing, searching and querying. The Consortium has also implemented new functionalities to facilitate the application of PO in genomic research and updated the website to keep the contents current. In this report, we present a brief description of resources available from the website, changes to the interfaces, data updates, community activities and future enhancement.
Subjects
Databases, Genetic; Genome, Plant; Plant Development; Plants/anatomy & histology; Vocabulary, Controlled; Genes, Plant; Internet; Plants/genetics; User-Computer Interface
ABSTRACT
Maize (Zea mays L.) is one of the most important cereal crops and a model for the study of genetics, evolution, and domestication. To better understand maize genome organization and to build a framework for genome sequencing, we constructed a sequence-ready fingerprinted contig-based physical map that covers 93.5% of the genome, of which 86.1% is aligned to the genetic map. The fingerprinted contig map contains 25,908 genic markers that enabled us to align nearly 73% of the anchored maize genome to the rice genome. The distribution pattern of expressed sequence tags correlates to that of recombination. In collinear regions, 1 kb in rice corresponds to an average of 3.2 kb in maize, yet maize has a 6-fold genome size expansion. This can be explained by the fact that most rice regions correspond to two regions in maize as a result of its recent polyploid origin. Inversions account for the majority of chromosome structural variations during subsequent maize diploidization. We also find clear evidence of ancient genome duplication predating the divergence of the progenitors of maize and rice. Reconstructing the paleoethnobotany of the maize genome indicates that the progenitors of modern maize contained ten chromosomes.
Subjects
Evolution, Molecular; Genome, Plant; Zea mays/genetics; Chromosome Mapping; Chromosomes, Artificial, Bacterial/genetics; Chromosomes, Plant/genetics; DNA Fingerprinting; DNA, Plant/genetics; Edible Grain/genetics; Gene Duplication; Gene Rearrangement; Oryza/genetics; Phylogeny; Species Specificity
Subjects
Biological Science Disciplines/methods; Biological Science Disciplines/trends; Computational Biology/trends; Databases, Factual/trends; Information Storage and Retrieval/trends; Internet/trends; Animals; Career Choice; Computational Biology/education; Computational Biology/methods; Databases, Factual/statistics & numerical data; Education, Graduate; Humans; Information Storage and Retrieval/methods; Internet/statistics & numerical data; Publishing/trends
ABSTRACT
MaizeGDB is the Maize Genetics and Genomics Database. Available at MaizeGDB are diverse data that support maize research including maps, gene product information, loci and their various alleles, phenotypes (both naturally occurring and as a result of directed mutagenesis), stocks, sequences, molecular markers, references and contact information for maize researchers worldwide. Also available through MaizeGDB are various community support service bulletin boards including the Editorial Board's list of high-impact papers, information about the Annual Maize Genetics Conference and the Jobs board where employment opportunities are posted. Reported here are data updates, improvements to interfaces and changes to standard operating procedures that have been made during the past 2 years. MaizeGDB is freely available and can be accessed online at http://www.maizegdb.org.
Subjects
Databases, Genetic; Zea mays/genetics; Chromosome Mapping; Genomics; Internet; User-Computer Interface
ABSTRACT
Importance: Adjuvant chemotherapy in patients with stage III colon cancer prevents recurrence by eradicating minimal residual disease. However, which patients remain at high risk of recurrence after completing standard adjuvant treatment cannot currently be determined. Postsurgical circulating tumor DNA (ctDNA) analysis can detect minimal residual disease and is associated with recurrence in colorectal cancers. Objective: To determine whether serial postsurgical and postchemotherapy ctDNA analysis could provide a real-time indication of adjuvant therapy efficacy in stage III colon cancer. Design, Setting, and Participants: This multicenter, Australian, population-based cohort biomarker study recruited 100 consecutive patients with newly diagnosed stage III colon cancer planned for 24 weeks of adjuvant chemotherapy from November 1, 2014, through May 31, 2017. Patients with another malignant neoplasm diagnosed within the last 3 years were excluded. Median duration of follow-up was 28.9 months (range, 11.6-46.4 months). Physicians were blinded to ctDNA results. Data were analyzed from December 10, 2018, through June 23, 2019. Exposures: Serial plasma samples were collected after surgery and after chemotherapy. Somatic mutations in individual patients' tumors were identified via massively parallel sequencing of 15 genes commonly mutated in colorectal cancer. Personalized assays were designed to quantify ctDNA. Main Outcomes and Measures: Detection of ctDNA and recurrence-free interval (RFI). Results: After 4 exclusions, 96 patients were eligible; median patient age was 64 years (range, 26-82 years); 49 (51%) were men. At least 1 somatic mutation was identified in the tumor tissue of all 96 evaluable patients. Circulating tumor DNA was detectable in 20 of 96 (21%) postsurgical samples and was associated with inferior recurrence-free survival (hazard ratio [HR], 3.8; 95% CI, 2.4-21.0; P < .001). Circulating tumor DNA was detectable in 15 of 88 (17%) postchemotherapy samples. The estimated 3-year RFI was 30% when ctDNA was detectable after chemotherapy and 77% when ctDNA was undetectable (HR, 6.8; 95% CI, 11.0-157.0; P < .001). Postsurgical ctDNA status remained independently associated with RFI after adjusting for known clinicopathologic risk factors (HR, 7.5; 95% CI, 3.5-16.1; P < .001). Conclusions and Relevance: Results suggest that ctDNA analysis after surgery is a promising prognostic marker in stage III colon cancer. Postchemotherapy ctDNA analysis may define a patient subset that remains at high risk of recurrence despite completing standard adjuvant treatment. This high-risk population presents a unique opportunity to explore additional therapeutic approaches.
Subjects
Biomarkers, Tumor/genetics; Chemotherapy, Adjuvant/methods; Circulating Tumor DNA/blood; Colonic Neoplasms/drug therapy; Colonic Neoplasms/surgery; Mutation; Adult; Aged; Aged, 80 and over; Australia; Biomarkers, Tumor/blood; Colonic Neoplasms/genetics; Colonic Neoplasms/pathology; Disease-Free Survival; Female; High-Throughput Nucleotide Sequencing; Humans; Male; Middle Aged; Neoplasm Staging; Neoplasm, Residual; Precision Medicine; Prognosis; Risk Factors; Treatment Outcome
ABSTRACT
BACKGROUND: Little is known about hemodynamics in adult out-of-hospital cardiac arrest (OHCA) patients following return of spontaneous circulation (ROSC). A 1994 study, conducted when "high-dose epinephrine" use was common, showed consistently elevated systemic vascular resistance (SVR) lasting ≥6 h in 49 adult patients after ROSC. STUDY AIM: To characterize hemodynamic abnormalities in adult OHCA patients soon after ROSC. Our hypothesis was that, unlike the consistently high SVR values reported when "high-dose" epinephrine was in common use, there would be a more heterogeneous distribution of SVR values with current adrenergic therapy. METHODS: We included adult OHCA patients transported by paramedics to the Emergency Department (ED) post-ROSC. Children, prisoners, pregnant women, and those with ongoing CPR or arrest due to traumatic injury were excluded. Hemodynamics were recorded non-invasively as soon as feasible after ED arrival but were not used to influence therapy, which was guided by the clinical judgment of treating ED physicians. RESULTS: Hemodynamics were recorded on 30 patients 20 [16,25] minutes after ED arrival: 50% had a normal SVR, 30% had a high SVR, and 20% had a low SVR. There was no difference in survival to admission among groups, although there was a difference among groups in survival to discharge. Comparing the low SVR group with the combined normal and high SVR group revealed a trend toward lower survival to hospital discharge: 0/6 (0%) low SVR patients vs. 10/24 (42%) normal or high SVR patients (p = .053). CONCLUSION: A heterogeneous range of hemodynamic states exists post-ROSC rather than consistent vasoconstriction. Adequately powered, randomized clinical trials will be needed to determine whether noninvasively-derived, hemodynamic-directed therapy can play a role in improving neurologically-intact survival following OHCA in adults.
Subjects
Cardiopulmonary Resuscitation/statistics & numerical data; Emergency Service, Hospital/statistics & numerical data; Out-of-Hospital Cardiac Arrest/mortality; Vascular Resistance; Cardiopulmonary Resuscitation/methods; Epinephrine/therapeutic use; Female; Hemodynamics; Humans; Male; Middle Aged; Out-of-Hospital Cardiac Arrest/drug therapy; Out-of-Hospital Cardiac Arrest/physiopathology; Prospective Studies; Time Factors; Time-to-Treatment; Vasoconstrictor Agents/therapeutic use
ABSTRACT
BACKGROUND: Molecular markers serve three important functions in physical map assembly. First, they provide anchor points to genetic maps facilitating functional genomic studies. Second, they reduce the overlap required for BAC contig assembly from 80 to 50 percent. Finally, they validate assemblies based solely on BAC fingerprints. We employed a six-dimensional BAC pooling strategy in combination with a high-throughput PCR-based screening method to anchor the maize genetic and physical maps. RESULTS: A total of 110,592 maize BAC clones (approximately 6x haploid genome equivalents) were pooled into six different matrices, each containing 48 pools of BAC DNA. The quality of the BAC DNA pools and their utility for identifying BACs containing target genomic sequences was tested using 254 PCR-based STS markers. Five types of PCR-based STS markers were screened to assess potential uses for the BAC pools. An average of 4.68 BAC clones were identified per marker analyzed. These results were integrated with BAC fingerprint data generated by the Arizona Genomics Institute (AGI) and the Arizona Genomics Computational Laboratory (AGCoL) to assemble the BAC contigs using the FingerPrinted Contigs (FPC) software and contribute to the construction and anchoring of the physical map. A total of 234 markers (92.5%) anchored BAC contigs to their genetic map positions. The results can be viewed on the integrated map of maize 12. CONCLUSION: This BAC pooling strategy is a rapid, cost effective method for genome assembly and anchoring. The requirement for six replicate positive amplifications makes this a robust method for use in large genomes with high amounts of repetitive DNA such as maize. This strategy can be used to physically map duplicate loci, provide order information for loci in a small genetic interval or with no genetic recombination, and loci with conflicting hybridization-based information.
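The pooling arithmetic can be illustrated with a small decoding sketch. Note that 48 × 48 × 48 = 110,592, the number of clones stated above. Everything else here is an assumption made for illustration and is not the authors' documented layout: clones are placed at coordinates (x, y, z) in a 48-cube, three matrices pool along the axes, and three pool along wrapped diagonals. The names `pools_for`, `screen` and `decode` are hypothetical.

```python
from itertools import product

N = 48  # pools per matrix, as stated in the abstract; N**3 == 110592 clones


def pools_for(clone):
    """Return the six pool indices (one per matrix) for a clone at (x, y, z).

    Assumed layout: three axis pools plus three wrapped diagonal pools.
    """
    x, y, z = clone
    return (x, y, z, (x + y) % N, (y + z) % N, (x + z) % N)


def screen(true_clones):
    """Simulate a PCR screen: collect the positive pools in each of six matrices."""
    positive = [set() for _ in range(6)]
    for clone in true_clones:
        for matrix, pool in enumerate(pools_for(clone)):
            positive[matrix].add(pool)
    return positive


def decode(positive):
    """List the clones whose pools are positive in all six matrices."""
    px, py, pz, pxy, pyz, pxz = positive
    return [
        (x, y, z)
        for x, y, z in product(px, py, pz)
        if (x + y) % N in pxy and (y + z) % N in pyz and (x + z) % N in pxz
    ]


# Two clones carrying the same marker: the diagonal pools remove the
# false candidates that the three axis pools alone would leave.
hits = [(3, 17, 40), (10, 5, 22)]
print(sorted(decode(screen(hits))))
```

With axis pools alone, two positive clones would yield eight candidate coordinates; in this sketch, requiring agreement in all six matrices narrows them back to the two true clones, which is the kind of redundancy the abstract credits for robustness in repetitive genomes.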
Subjects
Chromosomes, Artificial, Bacterial; Genome, Plant; Polymerase Chain Reaction/methods; Repetitive Sequences, Nucleic Acid; Zea mays/genetics; DNA Primers; DNA, Plant/genetics; Genetic Markers; Transcription Factors/genetics
ABSTRACT
There are thousands of maize mutants, which are invaluable resources for plant research. Geneticists use them to study underlying mechanisms of biochemistry, cell biology, cell development, and cell physiology. To streamline the understanding of such complex processes, researchers need the most current versions of genetic and physical maps, tools with the ability to recognize novel phenotypes or classify known phenotypes, and an intimate knowledge of the biochemical processes generating physiological and phenotypic effects. They must also know how all of these factors change and differ among species, diverse alleles, germplasms, and environmental conditions. While there are robust databases, such as MaizeGDB, for some of these types of raw data, other crucial components are missing. Moreover, the management of visually observed mutant phenotypes is still in its infant stage, let alone the complex query methods that can draw upon high-level and aggregated information to answer the questions of geneticists. In this paper, we address the scientific challenge and propose to develop a robust framework for managing the knowledge of visually observed phenotypes, mining the correlation of visual characteristics with genetic maps, and discovering the knowledge relating to cross-species conservation of visual and genetic patterns. The ultimate goal of this research is to allow a geneticist to submit phenotypic and genomic information on a mutant to a knowledge base and ask, "What genes or environmental factors cause this visually observed phenotype?".
Subjects
Mutation; Phenotype; Zea mays/genetics; Computational Biology; Databases, Genetic; Genes, Plant; Image Processing, Computer-Assisted; Knowledge Bases; Zea mays/anatomy & histology
ABSTRACT
The Maize Genetics and Genomics Database (MaizeGDB) team prepared a survey to identify breeders' needs for visualizing pedigrees, diversity data and haplotypes in order to prioritize tool development and curation efforts at MaizeGDB. The survey was distributed to the maize research community on behalf of the Maize Genetics Executive Committee in Summer 2015. The survey garnered 48 responses from maize researchers, of which more than half were self-identified as breeders. The survey showed that the maize researchers considered their top priorities for visualization as: (i) displaying single nucleotide polymorphisms in a given region for a given list of lines, (ii) showing haplotypes for a given list of lines and (iii) presenting pedigree relationships visually. The survey also asked which populations would be most useful to display. The following two populations were on top of the list: (i) 3000 publicly available maize inbred lines used in Romay et al. (Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol, 2013;14:R55) and (ii) maize lines with expired Plant Variety Protection Act (ex-PVP) certificates. Driven by this strong stakeholder input, MaizeGDB staff are currently working in four areas to improve its interface and web-based tools: (i) presenting immediate progenies of currently available stocks at the MaizeGDB Stock pages, (ii) displaying the most recent ex-PVP lines described in the Germplasm Resources Information Network (GRIN) on the MaizeGDB Stock pages, (iii) developing network views of pedigree relationships and (iv) visualizing genotypes from SNP-based diversity datasets. These survey results can help other biological databases to direct their efforts according to user preferences as they serve similar types of data sets for their communities. Database URL: https://www.maizegdb.org.
Subjects
Databases, Genetic; Genetic Variation; Haplotypes; Molecular Sequence Annotation/methods; User-Computer Interface; Web Browser; Zea mays/genetics; Molecular Sequence Annotation/standards
ABSTRACT
BACKGROUND: As metabolic pathway resources become more commonly available, researchers have unprecedented access to information about their organism of interest. Despite efforts to ensure consistency between various resources, information content and quality can vary widely. Two maize metabolic pathway resources for the B73 inbred line, CornCyc 4.0 and MaizeCyc 2.2, are based on the same gene model set and were developed using Pathway Tools software. These resources differ in their initial enzymatic function assignments and in the extent of manual curation. We present an in-depth comparison between CornCyc and MaizeCyc to demonstrate the effect of initial computational enzymatic function assignments on the quality and content of metabolic pathway resources. RESULTS: These two resources are different in their content. MaizeCyc contains GO annotations for over 21,000 genes that CornCyc is missing. CornCyc contains on average 1.6 transcripts per gene, while MaizeCyc contains almost no alternate splicing. MaizeCyc also does not match CornCyc's breadth in representing the metabolic domain; MaizeCyc has fewer compounds, reactions, and pathways than CornCyc. CornCyc's computational predictions are more accurate than those in MaizeCyc when compared to experimentally determined function assignments, demonstrating the relative strength of the enzymatic function assignment pipeline used to generate CornCyc. CONCLUSIONS: Our results show that the quality of initial enzymatic function assignments primarily determines the quality of the final metabolic pathway resource. Therefore, biologists should pay close attention to the methods and information sources used to develop a metabolic pathway resource to gauge the utility of using such functional assignments to construct hypotheses for experimental studies.
Subjects
Computational Biology; Zea mays/metabolism; Molecular Sequence Annotation; Plant Proteins/metabolism; Zea mays/enzymology
ABSTRACT
Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ~10% of relevant evidence sentences and 30% distinct GO terms, while the Results/Experiment section has nearly 60% relevant sentences and >70% GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need to use full-text articles for text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community. Database URL: http://www.biocreative.org/resources/corpora/bc-iv-go-task-corpus/.
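An F1-based inter-annotator agreement of the kind reported above can be sketched as follows. This is a minimal illustration under stated assumptions, not the corpus authors' evaluation script: evidence passages are represented as hypothetical (start, end) character spans, "strict" is taken to mean identical spans, and "relaxed" is taken to mean any overlap. The function names are invented for the example.

```python
def f1_agreement(a, b, match):
    """F1 between two annotators' span lists under a matching predicate.

    One annotator is treated as the reference: precision is the share of
    a's spans matched in b, recall is the share of b's spans matched in a.
    """
    if not a or not b:
        return 0.0
    precision = sum(any(match(x, y) for y in b) for x in a) / len(a)
    recall = sum(any(match(x, y) for x in a) for y in b) / len(b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def strict(x, y):
    """Assumed strict criterion: the two spans are identical."""
    return x == y


def relaxed(x, y):
    """Assumed relaxed criterion: the two half-open spans overlap."""
    return x[0] < y[1] and y[0] < x[1]


# Hypothetical annotations from two curators on the same article.
curator_a = [(0, 10), (20, 30)]
curator_b = [(0, 10), (25, 40), (50, 60)]

print(round(f1_agreement(curator_a, curator_b, strict), 3))
print(round(f1_agreement(curator_a, curator_b, relaxed), 3))
```

In this toy case the relaxed score is twice the strict one, mirroring the direction of the gap between the strict and relaxed figures in the abstract: curators often agree on where the evidence is without agreeing on its exact boundaries.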