Results 1 - 15 of 15
1.
Nature ; 583(7815): 265-270, 2020 07.
Article in English | MEDLINE | ID: mdl-32581361

ABSTRACT

Cancers arise through the acquisition of oncogenic mutations and grow by clonal expansion [1,2]. Here we reveal that most mutagenic DNA lesions are not resolved into a mutated DNA base pair within a single cell cycle. Instead, DNA lesions segregate, unrepaired, into daughter cells for multiple cell generations, resulting in the chromosome-scale phasing of subsequent mutations. We characterize this process in mutagen-induced mouse liver tumours and show that DNA replication across persisting lesions can produce multiple alternative alleles in successive cell divisions, thereby generating both multiallelic and combinatorial genetic diversity. The phasing of lesions enables accurate measurement of strand-biased repair processes, quantification of oncogenic selection and fine mapping of sister-chromatid-exchange events. Finally, we demonstrate that lesion segregation is a unifying property of exogenous mutagens, including UV light and chemotherapy agents in human cells and tumours, which has profound implications for the evolution and adaptation of cancer genomes.


Subject(s)
Chromosome Segregation/genetics , Evolution, Molecular , Genome/genetics , Neoplasms/genetics , Alleles , Animals , DNA Repair , DNA Replication , ErbB Receptors/metabolism , Humans , Liver Neoplasms/genetics , Liver Neoplasms/pathology , Male , Mice , Mutation , Neoplasms/pathology , Selection, Genetic , Signal Transduction , Sister Chromatid Exchange , Transcription, Genetic , raf Kinases/metabolism , ras Proteins/metabolism
2.
Nucleic Acids Res ; 2024 Oct 22.
Article in English | MEDLINE | ID: mdl-39436012

ABSTRACT

The IPD-MHC Database project (http://www.ebi.ac.uk/ipd/mhc/) serves as a comprehensive and expertly curated repository for major histocompatibility complex (MHC) sequences from non-human species, providing the necessary infrastructure and tools to study the function and evolution of this highly polymorphic genomic region. In its latest version, the IPD-MHC database has expanded both in content and in the tools for data visualization and comparison. The database now hosts over 18 000 MHC alleles from 125 species, organized into eleven taxonomic groups, all manually curated and named by the Comparative MHC Nomenclature Committee. A cetacean section has recently been included, offering researchers valuable data to study the immune system of whales, dolphins, and porpoises, as well as establishing the official nomenclature platform for the Cetacea Leukocyte Antigens (CeLA). In response to user demand and reflecting broader trends in bioinformatics and immunogenetics, IPD-MHC now includes the predicted tertiary structure of over 8000 alleles and allows comparison and visualisation of allele variation within and between species at single-residue resolution. These latest developments maintain the critically important link between official nomenclature of curated alleles and the ability to analyse this complex polymorphism using the most up-to-date methods within a single repository.

3.
Nucleic Acids Res ; 50(D1): D837-D847, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34788826

ABSTRACT

Since 2005, the Pathogen-Host Interactions Database (PHI-base) has manually curated experimentally verified pathogenicity, virulence and effector genes from fungal, bacterial and protist pathogens, which infect animal, plant, fish, insect and/or fungal hosts. PHI-base (www.phi-base.org) is devoted to the identification and presentation of phenotype information on pathogenicity and effector genes and their host interactions. Specific gene alterations that did not alter the in-host interaction phenotype are also presented. PHI-base is invaluable for comparative analyses and for the discovery of candidate targets in medically and agronomically important species for intervention. Version 4.12 (September 2021) contains 4387 references, and provides information on 8411 genes from 279 pathogens, tested on 228 hosts in 18,190 interactions. This provides a 24% increase in gene content since Version 4.8 (September 2019). Bacterial and fungal pathogens represent the majority of the interaction data, with a 54:46 split of entries, whilst protists, protozoa, nematodes and insects represent 3.6% of entries. Host species consist of approximately 54% plants and 46% others of medical, veterinary and/or environmental importance. PHI-base data is disseminated to UniProtKB, FungiDB and Ensembl Genomes. PHI-base will migrate to a new gene-centric version (version 5.0) in early 2022. This major development is briefly described.


Subject(s)
Databases, Factual , Host-Pathogen Interactions/genetics , Phenotype , User-Computer Interface , Animals , Apicomplexa/classification , Apicomplexa/genetics , Apicomplexa/pathogenicity , Bacteria/classification , Bacteria/genetics , Bacteria/pathogenicity , Diplomonadida/classification , Diplomonadida/genetics , Diplomonadida/pathogenicity , Fungi/classification , Fungi/genetics , Fungi/pathogenicity , Insecta/classification , Insecta/genetics , Insecta/pathogenicity , Internet , Nematoda/classification , Nematoda/genetics , Nematoda/pathogenicity , Phylogeny , Plants/microbiology , Plants/parasitology , Virulence
4.
Nucleic Acids Res ; 50(D1): D765-D770, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34634797

ABSTRACT

The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser, making SARS-CoV-2 the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.


Subject(s)
COVID-19/virology , Databases, Genetic , SARS-CoV-2/genetics , Web Browser , Coronaviridae/genetics , Genetic Variation , Genome, Viral , Humans , Molecular Sequence Annotation
5.
Nucleic Acids Res ; 50(D1): D996-D1003, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34791415

ABSTRACT

Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life presenting genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception creating one of the world's most comprehensive genomic resources and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans including updates on our integration with Ensembl, and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.


Subject(s)
Databases, Genetic , Genomics , Internet , Software , Animals , Computational Biology , Genome, Bacterial/genetics , Genome, Fungal/genetics , Genome, Plant/genetics , Plants/classification , Plants/genetics , Vertebrates/classification , Vertebrates/genetics
6.
Hum Mutat ; 43(8): 986-997, 2022 08.
Article in English | MEDLINE | ID: mdl-34816521

ABSTRACT

The Ensembl Variant Effect Predictor (VEP) is a freely available, open-source tool for the annotation and filtering of genomic variants. It predicts variant molecular consequences using the Ensembl/GENCODE or RefSeq gene sets. It also reports phenotype associations from databases such as ClinVar, allele frequencies from studies including gnomAD, and predictions of deleteriousness from tools such as Sorting Intolerant From Tolerant and Combined Annotation Dependent Depletion. Ensembl VEP includes filtering options to customize variant prioritization. It is well supported and updated roughly quarterly to incorporate the latest gene, variant, and phenotype association information. Ensembl VEP analysis can be performed using a highly configurable, extensible command-line tool, a Representational State Transfer application programming interface, and a user-friendly web interface. These access methods are designed to suit different levels of bioinformatics experience and meet different needs in terms of data size, visualization, and flexibility. In this tutorial, we will describe performing variant annotation using the Ensembl VEP web tool, which enables sophisticated analysis through a simple interface.


Subject(s)
Genomics , Software , Computational Biology , Databases, Genetic , Gene Frequency , Humans , Molecular Sequence Annotation , Phenotype
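
The abstract above lists a REST interface as one of the three ways to run Ensembl VEP. As a rough illustration only, the sketch below builds a request for the public Ensembl REST service's `vep/:species/hgvs/:hgvs_notation` endpoint. The server URL, the helper names `build_vep_hgvs_url` and `fetch_vep_annotation`, and the example variant are assumptions made for this sketch and are not taken from the article.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Ensembl REST server; not stated in the abstract.
REST_SERVER = "https://rest.ensembl.org"


def build_vep_hgvs_url(hgvs: str, species: str = "human") -> str:
    """Build the URL for a VEP lookup of one HGVS-notated variant."""
    # Percent-encode characters such as '>' while leaving ':' readable.
    encoded_species = urllib.parse.quote(species, safe="")
    encoded_hgvs = urllib.parse.quote(hgvs, safe=":")
    return f"{REST_SERVER}/vep/{encoded_species}/hgvs/{encoded_hgvs}"


def fetch_vep_annotation(hgvs: str, species: str = "human"):
    """Send the request and return the parsed JSON response (needs network access)."""
    request = urllib.request.Request(
        build_vep_hgvs_url(hgvs, species),
        headers={"Content-Type": "application/json"},  # ask for JSON output
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative variant only; building the URL does not contact the server.
    print(build_vep_hgvs_url("ENST00000366667:c.803C>T"))
```

Keeping URL construction separate from the network call lets the request be inspected or logged before anything is sent; for large batches the command-line tool described in the abstract is likely the better fit.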
7.
Bioinformatics ; 38(1): 299-300, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34260694

ABSTRACT

MOTIVATION: Reference sequences are essential in creating a baseline of knowledge for many common bioinformatics methods, especially those using genomic sequencing. RESULTS: We have created refget, a Global Alliance for Genomics and Health API specification to access reference sequences and sub-sequences using an identifier derived from the sequence itself. We present four reference implementations across in-house and cloud infrastructure, a compliance suite and a web report used to ensure specification conformity across implementations. AVAILABILITY AND IMPLEMENTATION: The refget specification can be found at: https://w3id.org/ga4gh/refget. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Software
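
The refget abstract above describes identifiers "derived from the sequence itself". A minimal sketch of that idea follows, assuming two digest schemes: a hex MD5 checksum, and a SHA-512 digest truncated to 24 bytes and base64url-encoded, as used for GA4GH sequence identifiers. The uppercase normalisation, the `SQ.` prefix and all function names are assumptions of this sketch rather than details given in the abstract.

```python
import base64
import hashlib


def md5_identifier(sequence: str) -> str:
    """Hex MD5 digest of the sequence (assumed to be normalised to uppercase)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes, base64url-encoded (32 characters)."""
    truncated = hashlib.sha512(blob).digest()[:24]
    return base64.urlsafe_b64encode(truncated).decode("ascii")


def ga4gh_sequence_identifier(sequence: str) -> str:
    """Assumed GA4GH-style sequence identifier: 'SQ.' plus the truncated digest."""
    return "SQ." + sha512t24u(sequence.upper().encode("ascii"))


if __name__ == "__main__":
    for seq in ("ACGT", "acgt"):
        print(seq, md5_identifier(seq), ga4gh_sequence_identifier(seq))
```

Because the identifier depends only on the sequence content, two servers holding the same sequence compute the same key without consulting a central naming authority.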
8.
Nucleic Acids Res ; 48(D1): D682-D688, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31691826

ABSTRACT

Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), and data updates are made available four times a year.


Subject(s)
Computational Biology/methods , Databases, Genetic , Epigenome , Molecular Sequence Annotation , Algorithms , Animals , Computer Graphics , Databases, Protein , Genetic Variation , Genome-Wide Association Study , Genomics , Histones/metabolism , Humans , Imaging, Three-Dimensional , Internet , Ligands , Search Engine , Software , Species Specificity , Transcriptome , User-Computer Interface , Web Browser
9.
Hum Mutat ; 39(11): 1686-1689, 2018 11.
Article in English | MEDLINE | ID: mdl-30311379

ABSTRACT

The Clinical Genome Resource (ClinGen)'s work to develop a knowledge base to support the understanding of genes and variants for use in precision medicine and research depends on robust, broadly applicable, and adaptable technical standards for sharing data and information. To forward this goal, ClinGen has joined with the Global Alliance for Genomics and Health (GA4GH) to support the development of open, freely-available technical standards and regulatory frameworks for secure and responsible sharing of genomic and health-related data. In its capacity as one of the 15 inaugural GA4GH "Driver Projects," ClinGen is providing input on the key standards needs of the global genomics community, and has committed to participate on GA4GH Work Streams to support the development of: (1) a standard model for computer-readable variant representation; (2) a data model for linking variant data to annotations; (3) a specification to enable sharing of genomic variant knowledge and associated clinical interpretations; and (4) a set of best practices for use of phenotype and disease ontologies. ClinGen's participation as a GA4GH Driver Project will provide a robust environment to test drive emerging genomic knowledge sharing standards and prove their utility among the community, while accelerating the construction of the ClinGen evidence base.


Subject(s)
Genome, Human/genetics , Information Dissemination/methods , Computational Biology , Databases, Genetic , Genetic Variation , Genomics , Humans , Precision Medicine
11.
Pac Symp Biocomput ; 28: 383-394, 2023.
Article in English | MEDLINE | ID: mdl-36540993

ABSTRACT

As the diversity of genomic variation data increases with our growing understanding of the role of variation in health and disease, it is critical to develop standards for precise inter-system exchange of these data for research and clinical applications. The Global Alliance for Genomics and Health (GA4GH) Variation Representation Specification (VRS) meets this need through a technical terminology and information model for disambiguating and concisely representing variation concepts. Here we discuss the recent Genotype model in VRS, which may be used to represent the allelic composition of a genetic locus. We demonstrate the use of the Genotype model and the constituent Haplotype model for the precise and interoperable representation of pharmacogenomic diplotypes, HGVS variants, and VCF records using VRS and discuss how this can be leveraged to enable interoperable exchange and search operations between assayed variation and genomic knowledgebases.


Subject(s)
Computational Biology , Genetic Variation , Humans , Databases, Genetic , Genomics , Genotype
12.
Nat Genet ; 53(9): 1290-1299, 2021 09.
Article in English | MEDLINE | ID: mdl-34493866

ABSTRACT

Many gene expression quantitative trait locus (eQTL) studies have published their summary statistics, which can be used to gain insight into complex human traits by downstream analyses, such as fine mapping and co-localization. However, technical differences between these datasets are a barrier to their widespread use. Consequently, target genes for most genome-wide association study (GWAS) signals have still not been identified. In the present study, we present the eQTL Catalogue (https://www.ebi.ac.uk/eqtl), a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies. We find that, for matching cell types and tissues, the eQTL effect sizes are highly reproducible between studies. Although most QTLs were shared between most bulk tissues, we identified a greater diversity of cell-type-specific QTLs from purified cell types, a subset of which also manifested as new disease co-localizations. Our summary statistics are freely available to enable the systematic interpretation of human GWAS associations across many cell types and tissues.


Subject(s)
Databases, Genetic , Gene Expression Regulation/genetics , Quantitative Trait Loci/genetics , Quantitative Trait, Heritable , CD4-Positive T-Lymphocytes/cytology , Datasets as Topic , Genome-Wide Association Study , Humans , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics
13.
Cell Genom ; 1(2), 2021 Nov 10.
Article in English | MEDLINE | ID: mdl-35311178

ABSTRACT

Maximizing the personal, public, research, and clinical value of genomic information will require the reliable exchange of genetic variation data. We report here the Variation Representation Specification (VRS, pronounced "verse"), an extensible framework for the computable representation of variation that complements contemporary human-readable and flat file standards for genomic variation representation. VRS provides semantically precise representations of variation and leverages this design to enable federated identification of biomolecular variation with globally consistent and unique computed identifiers. The VRS framework includes a terminology and information model, machine-readable schema, data sharing conventions, and a reference implementation, each of which is intended to be broadly useful and freely available for community use. VRS was developed by a partnership among national information resource providers, public initiatives, and diagnostic testing laboratories under the auspices of the Global Alliance for Genomics and Health (GA4GH).
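
The "globally consistent and unique computed identifiers" mentioned above can be illustrated with a simplified sketch: serialise an object to a canonical JSON form, digest it, and prefix the result with a type code. This is not the full VRS procedure, which has further rules for nested and identifiable objects. The example object, its field names, the `VA` type prefix and the helper names are assumptions made for illustration.

```python
import base64
import hashlib
import json


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes, base64url-encoded (32 characters)."""
    truncated = hashlib.sha512(blob).digest()[:24]
    return base64.urlsafe_b64encode(truncated).decode("ascii")


def canonical_json(obj: dict) -> bytes:
    """Serialise with sorted keys and no extra whitespace so equal objects give equal bytes."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def computed_identifier(obj: dict, type_prefix: str) -> str:
    """Simplified content-derived identifier of the form 'ga4gh:<prefix>.<digest>'."""
    return f"ga4gh:{type_prefix}.{sha512t24u(canonical_json(obj))}"


if __name__ == "__main__":
    # Hypothetical allele-like object; the field names are illustrative only.
    allele = {
        "type": "Allele",
        "location": {"sequence": "SQ.example", "start": 100, "end": 101},
        "state": {"type": "LiteralSequenceExpression", "sequence": "T"},
    }
    print(computed_identifier(allele, "VA"))
```

Since the identifier is a pure function of the object's content, independent systems that represent the same variation identically arrive at the same identifier, which is what makes federated lookup possible without a central registry.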

14.
Front Microbiol ; 10: 2477, 2019.
Article in English | MEDLINE | ID: mdl-31787936

ABSTRACT

Accurate and comprehensive annotation of genomic sequences underpins advances in managing plant disease. However, important plant pathogens still have incomplete and inconsistent gene sets and lack dedicated funding or teams to improve this annotation. This paper describes a collaborative approach to gene curation to address this shortcoming. In the first instance, over 40 members of the Botrytis cinerea community from eight countries, with training and infrastructural support from Ensembl Fungi, used the gene editing tool Apollo to systematically review the entire gene set (11,707 protein coding genes) in 6-7 months. This has subsequently been checked and disseminated. Following this, a similar project for another pathogen, Blumeria graminis f. sp. hordei, also led to a completely redefined gene set. Currently, we are working with the Zymoseptoria tritici community to enable them to achieve the same. While the tangible outcome of these projects is improved gene sets, it is apparent that the inherent agreement and ownership of a single gene set by research teams as they undergo this curation process are consequential to the acceleration of research in the field. With the generation of large data sets increasingly affordable, there is value in unifying both the divergent data sets and their associated research teams, pooling time, expertise, and resources. Community-driven annotation efforts can pave the way for a new kind of collaboration among pathogen research communities to generate well-annotated reference data sets, beneficial not just for the genome being examined but for related species and the refinement of automatic gene prediction tools.
