1.
Nucleic Acids Res ; 52(D1): D938-D949, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38000386

ABSTRACT

Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.


Subject(s)
Databases, Factual , Disease , Genes , Phenotype , Humans , Internet , Databases, Factual/standards , Software , Genes/genetics , Disease/genetics
2.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37389415

ABSTRACT

MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org.


Subject(s)
Biological Ontologies , COVID-19 , Humans , Pattern Recognition, Automated , Rare Diseases , Machine Learning
3.
Proc Natl Acad Sci U S A ; 106(30): 12273-8, 2009 Jul 28.
Article in English | MEDLINE | ID: mdl-19597147

ABSTRACT

Rice, the primary source of dietary calories for half of humanity, is the first crop plant for which a high-quality reference genome sequence from a single variety was produced. We used resequencing microarrays to interrogate 100 Mb of the unique fraction of the reference genome for 20 diverse varieties and landraces that capture the impressive genotypic and phenotypic diversity of domesticated rice. Here, we report the distribution of 160,000 nonredundant SNPs. Introgression patterns of shared SNPs revealed the breeding history and relationships among the 20 varieties; some introgressed regions are associated with agronomic traits that mark major milestones in rice improvement. These comprehensive SNP data provide a foundation for deep exploration of rice diversity and gene-trait relationships and their use for future rice improvement.


Subject(s)
Genetic Variation , Genome, Plant/genetics , Oryza/genetics , Polymorphism, Single Nucleotide , Chromosome Mapping , Chromosomes, Plant/genetics , Gene Frequency , Genotype , Molecular Sequence Data , Oryza/classification , Phylogeny , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Species Specificity
4.
PLoS One ; 16(3): e0231916, 2021.
Article in English | MEDLINE | ID: mdl-33755673

ABSTRACT

AVAILABILITY: The API and associated software are open source and currently available at https://github.com/NCATS-Tangerine/translator-knowledge-beacon.


Subject(s)
Knowledge , Software , Databases, Factual , Internet
5.
Nucleic Acids Res ; 36(Database issue): D943-6, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17933772

ABSTRACT

The Generation Challenge Programme (GCP; www.generationcp.org) has developed an online resource documenting stress-responsive genes comparatively across plant species. This public resource is a compendium of protein families, phylogenetic trees, multiple sequence alignments (MSA) and associated experimental evidence. The central objective of this resource is to elucidate orthologous and paralogous relationships between plant genes that may be involved in response to environmental stress, mainly abiotic stresses such as water deficit ('drought'). The web-based graphical user interface (GUI) of the resource includes query and visualization tools that allow diverse searches and browsing of the underlying project database. The web interface can be accessed at http://dayhoff.generationcp.org.


Subject(s)
Crops, Agricultural/genetics , Databases, Genetic , Genes, Plant , Crops, Agricultural/metabolism , Dehydration , Environment , Gene Expression Profiling , Internet , Phylogeny , Plant Proteins/chemistry , Plant Proteins/classification , Sequence Alignment , User-Computer Interface
6.
Plant Physiol ; 139(2): 637-42, 2005 Oct.
Article in English | MEDLINE | ID: mdl-16219924

ABSTRACT

Ambiguous germplasm identification; difficulty in tracing pedigree information; and lack of integration between genetic resources, characterization, breeding, evaluation, and utilization data are constraints in developing knowledge-intensive crop improvement programs. To address these constraints, the International Crop Information System (www.icis.cgiar.org), a database system for the management and integration of global information on genetic resources and crop improvement for any crop, was developed by genetic resource specialists, crop scientists, and information technicians associated with the Consultative Group for International Agricultural Research and collaborative partners. The International Rice Information System (www.iris.irri.org) is the rice (Oryza species) implementation of the International Crop Information System. New components are now being added to the International Rice Information System to handle the diversity of rice functional genomics data including genomic sequence data, molecular genetic data, expression data, and proteomic information. Users access information in the database through stand-alone programs and Web interfaces, which offer specialized applications and customized views to researchers with different interests.


Subject(s)
Databases, Genetic , Information Systems , Oryza/genetics , Breeding , Computational Biology , Internet , Management Information Systems , Meta-Analysis as Topic , Software
7.
Bioinformatics ; 20(2): 155-60, 2004 Jan 22.
Article in English | MEDLINE | ID: mdl-14734305

ABSTRACT

MOTIVATION: The high content of repetitive sequences in the genomes of many higher eukaryotes renders the task of annotating them computationally intensive. Presently, the only widely accepted method of searching and annotating transposable elements (TEs) in large genomic sequences is the use of the RepeatMasker program, which identifies new copies of TEs by pairwise sequence comparisons with a library of known TEs. Profile hidden Markov models (HMMs) have been used successfully in discovering distant homologs of known proteins in large protein databases, but this approach has only rarely been applied to known model TE families in genomic DNA. RESULTS: We used a combination of computational approaches to annotate the TEs in the finished genome of Oryza sativa ssp. japonica. In this paper, we discuss the strengths and the weaknesses of the annotation methods used. These approaches included: the default configuration of RepeatMasker using cross_match, an implementation of the Smith-Waterman-Gotoh algorithm; RepeatMasker using WU-BLAST for similarity searching; and the HMMER package, used to search for TEs with profile HMMs. All the results were converted into GFF format and post-processed using a set of Perl scripts. RepeatMasker was used in the case of most TE families. The WU-BLAST implementation of RepeatMasker was found to be manifold faster than cross_match with only a slight loss in sensitivity and was thus used to obtain the final set of data. HMMER was used in the annotation of the Mutator-like element (MULE) superfamily and the miniature inverted-repeat transposable element (MITE) polyphyletic group of families, for which large libraries of elements were available and which could be divided into well-defined families. The HMMER search algorithm was extremely slow for models over 1000 bp in length, so MULE families with members over 1000 bp long were processed with RepeatMasker instead. 
The main disadvantage of HMMER in this application is that, since it was developed with protein sequences in mind, it does not search the negative DNA strand. With the exception of TE families with essentially palindromic sequences, reverse complement models had to be created and run to compensate for this shortcoming. We conclude that a modification of RepeatMasker to incorporate libraries of profile HMMs in searches could improve the ability to detect degenerated copies of TEs. AVAILABILITY: The Perl scripts and TE sequences used in construction of the RepeatMasker library and the profile HMMs are available upon request.
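The strand limitation described above can be illustrated with a minimal sketch. The abstract does not say how the reverse complement models were built; one plausible route, assumed here, is to reverse-complement each TE sequence before building a second profile. The helper names `reverse_complement` and `is_palindromic` are hypothetical and are not taken from the authors' Perl scripts.

```python
# Illustrative sketch only: hypothetical helpers, not the authors' code.
# Assumption: sequences are plain DNA strings over A, C, G, T (N allowed).

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement each base, then reverse to read the opposite strand 5'->3'.
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if a sequence equals its own reverse complement.

    For such sequences a separate reverse-complement model adds nothing,
    which matches the exception the abstract makes for essentially
    palindromic TE families.
    """
    return seq.upper() == reverse_complement(seq).upper()


if __name__ == "__main__":
    print(reverse_complement("ATGC"))   # opposite-strand reading of ATGC
    print(is_palindromic("GAATTC"))     # EcoRI site, a classic palindrome
```

Under this assumption, a search tool that scans only the forward strand would be run once with the original model and once with a model built from the reverse-complemented sequences.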


Subject(s)
Algorithms , DNA Transposable Elements/genetics , Documentation , Gene Expression Profiling/methods , Genome, Plant , Oryza/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Databases, Nucleic Acid , Models, Genetic , Models, Statistical , Software
8.
Bioinformatics ; 19 Suppl 1: i63-5, 2003.
Article in English | MEDLINE | ID: mdl-12855438

ABSTRACT

The International Rice Information System (IRIS, http://www.iris.irri.org) is the rice implementation of the International Crop Information System (ICIS, http://www.icis.cgiar.org), a database system for the management and integration of global information on genetic resources and germplasm improvement for any crop. Building upon the germplasm genealogy and field data components of ICIS, IRIS is being extended to handle diverse rice genomics data including: genetic mapping, genome annotation, genotype, mutant, transcriptome, proteome and metabolomic data. Users can access information in the database through stand-alone programs and WWW interfaces offering specialist views to researchers with different interests.


Subject(s)
Database Management Systems , Databases, Genetic , Information Storage and Retrieval/methods , Oryza/genetics , Oryza/metabolism , Software , User-Computer Interface , Gene Expression Profiling/methods , Genotype , Information Dissemination/methods , Internationality , Oryza/classification , Phenotype , Plant Proteins/classification , Plant Proteins/genetics , Plant Proteins/metabolism , Software Design , Systems Integration