Results 1 - 20 of 231
1.
Genome Biol ; 25(1): 146, 2024 06 06.
Article in English | MEDLINE | ID: mdl-38844976

ABSTRACT

BACKGROUND: DNA methylation is an important epigenetic modification which has numerous roles in modulating genome function. Its levels are spatially correlated across the genome, typically high in repressed regions but low in transcription factor (TF) binding sites and active regulatory regions. However, the mechanisms establishing genome-wide and TF binding site methylation patterns are still unclear. RESULTS: Here we use a comparative approach to investigate the association of DNA methylation with TF binding evolution in mammals. Specifically, we experimentally profile DNA methylation and combine this with published occupancy profiles of five distinct TFs (CTCF, CEBPA, HNF4A, ONECUT1, FOXA1) in the liver of five mammalian species (human, macaque, mouse, rat, dog). TF binding sites are lowly methylated, but they often also have intermediate methylation levels. Furthermore, binding sites are influenced by the methylation status of CpGs in their wider binding regions even when CpGs are absent from the core binding motif. Employing a classification and clustering approach, we extract distinct and species-conserved patterns of DNA methylation levels at TF binding regions. CEBPA, HNF4A, ONECUT1, and FOXA1 share the same methylation patterns, while CTCF's differ. These patterns characterize alternative functions and chromatin landscapes of TF-bound regions. Leveraging our phylogenetic framework, we find DNA methylation gain upon evolutionary loss of TF occupancy, indicating coordinated evolution. Furthermore, each methylation pattern has its own evolutionary trajectory reflecting its genomic contexts. CONCLUSIONS: Our epigenomic analyses indicate a role for DNA methylation in TF binding changes across species including that specific DNA methylation profiles characterize TF binding and are associated with their regulatory activity, chromatin contexts, and evolutionary trajectories.


Subject(s)
DNA Methylation, Molecular Evolution, Transcription Factors, Animals, Binding Sites, Humans, Transcription Factors/metabolism, Transcription Factors/genetics, Mice, Rats, CpG Islands, Dogs, Hepatocyte Nuclear Factor 3-alpha/metabolism, Hepatocyte Nuclear Factor 3-alpha/genetics, Protein Binding, Liver/metabolism, Hepatocyte Nuclear Factor 4/metabolism, Hepatocyte Nuclear Factor 4/genetics, CCAAT-Enhancer-Binding Proteins/metabolism, CCAAT-Enhancer-Binding Proteins/genetics
2.
Mol Ecol ; : e17257, 2023 Dec 27.
Article in English | MEDLINE | ID: mdl-38149334

ABSTRACT

How local adaptation takes place remains a fundamental question in evolutionary biology. The variation of allele frequencies in genes under selection over environmental gradients remains mainly theoretical and its empirical assessment would help in understanding how adaptation happens over environmental clines. To bring new insights to this issue we set up a broad framework which aimed to compare the adaptive trajectories over environmental clines in two domesticated mammal species co-distributed in diversified landscapes. We sequenced the genomes of 160 sheep and 161 goats extensively managed along environmental gradients, including temperature, rainfall, seasonality and altitude, to identify genes and biological processes shaping local adaptation. Allele frequencies at putatively adaptive loci were rarely found to vary gradually along environmental gradients, but rather displayed a discontinuous shift at the extremities of environmental clines. Of the 430 candidate adaptive genes identified, only 6 were orthologous between sheep and goats and those responded differently to environmental pressures, suggesting different putative mechanisms involved in local adaptation in these two closely related species. Interestingly, the genomes of the 2 species were impacted differently by the environment: genes related to signatures of selection were most related to altitude, slope and rainfall seasonality for sheep, and summer temperature and spring rainfall for goats. The diversity of candidate adaptive pathways may result from a high number of biological functions involved in the adaptations to multiple eco-climatic gradients, and a differential role of climatic drivers on the two species, despite their co-distribution along the same environmental gradients.
This study describes empirical examples of clinal variation in putatively adaptive alleles with different patterns in allele frequency distributions over continuous environmental gradients, thus showing the diversity of genetic responses in adaptive landscapes and opening new horizons for understanding genomics of adaptation in mammalian species and beyond.

3.
bioRxiv ; 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37986808

ABSTRACT

Mapping the functional human genome and impact of genetic variants is often limited to European-descendent population samples. To aid in overcoming this limitation, we measured gene expression using RNA sequencing in lymphoblastoid cell lines (LCLs) from 599 individuals from six African populations to identify novel transcripts including those not represented in the hg38 reference genome. We used whole genomes from the 1000 Genomes Project and 164 Maasai individuals to identify 8,881 expression and 6,949 splicing quantitative trait loci (eQTLs/sQTLs), and 2,611 structural variants associated with gene expression (SV-eQTLs). We further profiled chromatin accessibility using ATAC-Seq in a subset of 100 representative individuals, to identify chromatin accessibility quantitative trait loci (caQTLs) and allele-specific chromatin accessibility, and provide predictions for the functional effect of 78.9 million variants on chromatin accessibility. Using this map of eQTLs and caQTLs we fine-mapped GWAS signals for a range of complex diseases. Combined, this work expands global functional genomic data to identify novel transcripts, functional elements and variants, understand population genetic history of molecular quantitative trait loci, and further resolve the genetic basis of multiple human traits and disease.

4.
Genome Biol ; 24(1): 223, 2023 10 05.
Article in English | MEDLINE | ID: mdl-37798615

ABSTRACT

Crop pangenomes made from individual cultivar assemblies promise easy access to conserved genes, but genome content variability and inconsistent identifiers hamper their exploration. To address this, we define pangenes, which summarize a species coding potential and link back to original annotations. The protocol get_pangenes performs whole genome alignments (WGA) to call syntenic gene models based on coordinate overlaps. A benchmark with small and large plant genomes shows that pangenes recapitulate phylogeny-based orthologies and produce complete soft-core gene sets. Moreover, WGAs support lift-over and help confirm gene presence-absence variation. Source code and documentation: https://github.com/Ensembl/plant-scripts .
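The idea of calling matching gene models "based on coordinate overlaps" can be illustrated with a minimal Python sketch. It is not the get_pangenes implementation: it assumes gene coordinates from one annotation have already been projected onto the other genome (for example via a whole genome alignment), and the `GeneModel` class, the `match_by_overlap` function and the `min_frac` threshold are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlap_bp(a: GeneModel, b: GeneModel) -> int:
    """Number of shared base pairs; 0 if on different sequences or disjoint."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def match_by_overlap(lifted, targets, min_frac=0.5):
    """Pair each projected gene with the target gene it overlaps most.

    A pair is kept only if the overlap covers at least `min_frac` of the
    shorter of the two genes (an illustrative threshold, not a published one).
    """
    pairs = []
    for gene in lifted:
        best, best_bp = None, 0
        for target in targets:
            bp = overlap_bp(gene, target)
            if bp > best_bp:
                best, best_bp = target, bp
        if best is not None:
            shorter = min(gene.end - gene.start + 1, best.end - best.start + 1)
            if best_bp / shorter >= min_frac:
                pairs.append((gene.gene_id, best.gene_id))
    return pairs
```

The nested loop is quadratic in the number of genes; a real pipeline would more likely sort the intervals or use an interval index.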


Subject(s)
Plant Genome, Software
5.
Nature ; 621(7978): 344-354, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications (refs. 1-3). As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished (refs. 4, 5). Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome (ref. 4) and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.


Subject(s)
Human Y Chromosomes, Genomics, DNA Sequence Analysis, Humans, Base Sequence, Human Y Chromosomes/genetics, Satellite DNA/genetics, Genetic Variation/genetics, Population Genetics, Genomics/methods, Genomics/standards, Heterochromatin/genetics, Multigene Family/genetics, Reference Standards, Genomic Segmental Duplications/genetics, DNA Sequence Analysis/standards, Tandem Repeat Sequences/genetics, Telomere/genetics
6.
Front Plant Sci ; 14: 1103035, 2023.
Article in English | MEDLINE | ID: mdl-37521909

ABSTRACT

The DNA Features pipeline is the analysis pipeline at EMBL-EBI that annotates repeat elements, including transposable elements. With Ensembl's goal to stay at the cutting edge of genome annotation, we proved that this pipeline needed an update. We then created a new analysis that allowed the Ensembl database to store the repeat classification from the PGSB repeat classification (Recat). This new dataset was then fetched using Perl scripts and used to prove that the pipeline modification induced a gain in sensitivity. Finally, we performed a comparative analysis of transposable element distribution in all plant species available, raising new questions about transposable elements in certain branches of the taxonomic tree.

7.
Nat Methods ; 20(8): 1159-1169, 2023 08.
Article in English | MEDLINE | ID: mdl-37443337

ABSTRACT

The detection of circular RNA molecules (circRNAs) is typically based on short-read RNA sequencing data processed using computational tools. Numerous such tools have been developed, but a systematic comparison with orthogonal validation is missing. Here, we set up a circRNA detection tool benchmarking study, in which 16 tools detected more than 315,000 unique circRNAs in three deeply sequenced human cell types. Next, 1,516 predicted circRNAs were validated using three orthogonal methods. Generally, tool-specific precision is high and similar (median of 98.8%, 96.3% and 95.5% for qPCR, RNase R and amplicon sequencing, respectively) whereas the sensitivity and number of predicted circRNAs (ranging from 1,372 to 58,032) are the most significant differentiators. Of note, precision values are lower when evaluating low-abundance circRNAs. We also show that the tools can be used complementarily to increase detection sensitivity. Finally, we offer recommendations for future circRNA detection and validation.
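The tool-specific precision and its median across tools, as reported above, can be sketched in Python as follows. This assumes precision means the fraction of a tool's tested circRNA predictions that an orthogonal method confirmed; the function names and any counts passed in are hypothetical, not data from the study.

```python
from statistics import median


def precision(confirmed: int, not_confirmed: int) -> float:
    """Fraction of tested predictions that the validation method confirmed."""
    tested = confirmed + not_confirmed
    if tested == 0:
        raise ValueError("no predictions were tested for this tool")
    return confirmed / tested


def summarize(results):
    """Map tool -> precision and return it with the median across tools.

    `results` maps a tool name to a (confirmed, not_confirmed) pair of counts.
    """
    per_tool = {tool: precision(ok, bad) for tool, (ok, bad) in results.items()}
    return per_tool, median(per_tool.values())
```

Sensitivity cannot be computed the same way from validation counts alone, because it also needs an estimate of the true circRNAs each tool missed.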


Subject(s)
Benchmarking, Circular RNA, Humans, Circular RNA/genetics, RNA/genetics, RNA/metabolism, RNA Sequence Analysis/methods
8.
Nucleic Acids Res ; 51(D1): D1053-D1060, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350643

ABSTRACT

It is 24 years since the IPD-IMGT/HLA Database, http://www.ebi.ac.uk/ipd/imgt/hla/, was first released, providing the HLA community with a searchable repository of highly curated HLA sequences. The database now contains over 35 000 alleles of the human Major Histocompatibility Complex (MHC) named by the WHO Nomenclature Committee for Factors of the HLA System. This complex contains the most polymorphic genes in the human genome and is now considered hyperpolymorphic. The IPD-IMGT/HLA Database provides a stable and user-friendly repository for this information. Uptake of Next Generation Sequencing technology in recent years has driven an increase in the number of alleles and the length of sequences submitted. As the size of the database has grown, the traditional methods of accessing and presenting this data have been challenged. In response, we have developed a suite of tools providing an enhanced user experience to our traditional web-based users while creating new programmatic access for our bioinformatics user base. This suite of tools is powered by the IPD-API, an Application Programming Interface (API), providing scalable and flexible access to the database. The IPD-API provides a stable platform for our future development allowing us to meet the future challenges of the HLA field and needs of the community.


Subject(s)
Genetic Databases, HLA Antigens, Humans, HLA Antigens/genetics, Histocompatibility Antigens/genetics, Major Histocompatibility Complex/genetics, Software, Alleles
9.
Sci Rep ; 12(1): 20791, 2022 12 01.
Article in English | MEDLINE | ID: mdl-36456625

ABSTRACT

We searched a database of single-gene knockout (KO) mice produced by the International Mouse Phenotyping Consortium (IMPC) to identify candidate ciliopathy genes. We first screened for phenotypes in mouse lines with both ocular and renal or reproductive trait abnormalities. The STRING protein interaction tool was used to identify interactions between known cilia gene products and those encoded by the genes in individual knockout mouse strains in order to generate a list of "candidate ciliopathy genes." From this list, 32 genes encoded proteins predicted to interact with known ciliopathy proteins. Of these, 25 had no previously described roles in ciliary pathobiology. Histological and morphological evidence of phenotypes found in ciliopathies in knockout mouse lines is presented as examples (genes Abi2, Wdr62, Ap4e1, Dync1li1, and Prkab1). Phenotyping data and descriptions generated for IMPC mouse lines are useful for mechanistic studies, target discovery, rare disease diagnosis, and preclinical therapeutic development trials. Here we demonstrate the effective use of the IMPC phenotype data to uncover genes with no previous role in ciliary biology, which may be clinically relevant for identification of novel disease genes implicated in ciliopathies.


Subject(s)
Ciliopathies, Mice, Animals, Knockout Mice, Ciliopathies/genetics, Gene Knockout Techniques, Cilia/genetics, Factual Databases, Nerve Tissue Proteins, Cell Cycle Proteins
12.
Nature ; 604(7906): 437-446, 2022 04.
Article in English | MEDLINE | ID: mdl-35444317

ABSTRACT

The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.


Subject(s)
Human Genome, Genomics, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis
13.
Nature ; 604(7905): 310-315, 2022 04.
Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.


Subject(s)
Computational Biology, Genetic Databases, Genomics, Genome, Humans, Information Dissemination, Molecular Sequence Annotation, National Library of Medicine (U.S.), United States
16.
Proc Natl Acad Sci U S A ; 119(4)2022 01 25.
Article in English | MEDLINE | ID: mdl-35042802

ABSTRACT

A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals. Here, we describe some highlights from the proposed standards, and areas where additional challenges will need to be met.


Subject(s)
Base Sequence/genetics, Eukaryota/genetics, Genomics/standards, Animals, Biodiversity, Genomics/methods, Humans, Reference Standards, Reference Values, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards
17.
Methods Mol Biol ; 2443: 27-55, 2022.
Article in English | MEDLINE | ID: mdl-35037199

ABSTRACT

Ensembl Plants ( http://plants.ensembl.org ) offers genome-scale information for plants, with four releases per year. As of release 47 (April 2020) it features 79 species and includes genome sequence, gene models, and functional annotation. Comparative analyses help reconstruct the evolutionary history of gene families, genomes, and components of polyploid genomes. Some species have gene expression baseline reports or variation across genotypes. While the data can be accessed through the Ensembl genome browser, here we review specifically how our plant genomes can be interrogated programmatically and the data downloaded in bulk. These access routes are generally consistent across Ensembl for other non-plant species, including plant pathogens, pests, and pollinators.


Subject(s)
Genetic Databases, Genomics, Plant Genome, Molecular Sequence Annotation, Plants/genetics, Software
18.
Hum Mutat ; 43(8): 986-997, 2022 08.
Article in English | MEDLINE | ID: mdl-34816521

ABSTRACT

The Ensembl Variant Effect Predictor (VEP) is a freely available, open-source tool for the annotation and filtering of genomic variants. It predicts variant molecular consequences using the Ensembl/GENCODE or RefSeq gene sets. It also reports phenotype associations from databases such as ClinVar, allele frequencies from studies including gnomAD, and predictions of deleteriousness from tools such as Sorting Intolerant From Tolerant and Combined Annotation Dependent Depletion. Ensembl VEP includes filtering options to customize variant prioritization. It is well supported and updated roughly quarterly to incorporate the latest gene, variant, and phenotype association information. Ensembl VEP analysis can be performed using a highly configurable, extensible command-line tool, a Representational State Transfer application programming interface, and a user-friendly web interface. These access methods are designed to suit different levels of bioinformatics experience and meet different needs in terms of data size, visualization, and flexibility. In this tutorial, we will describe performing variant annotation using the Ensembl VEP web tool, which enables sophisticated analysis through a simple interface.
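Of the access methods listed above, the REST interface can be sketched with a short Python example. It assumes the public Ensembl REST server at https://rest.ensembl.org and its VEP-by-HGVS endpoint (`/vep/:species/hgvs/:hgvs_notation`); the helper names `vep_hgvs_url` and `fetch_vep` are hypothetical, and the example variant is used only to show the URL shape. Check the current REST documentation before relying on it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

ENSEMBL_REST = "https://rest.ensembl.org"  # assumed public server address


def vep_hgvs_url(hgvs: str, species: str = "human") -> str:
    """Build the request URL for one HGVS-described variant (no network use)."""
    # Percent-encode characters such as '>' while keeping ':' readable.
    path = f"/vep/{quote(species)}/hgvs/{quote(hgvs, safe=':')}"
    return f"{ENSEMBL_REST}{path}?content-type=application/json"


def fetch_vep(hgvs: str, species: str = "human"):
    """Send the request and parse the JSON reply (requires network access)."""
    with urlopen(vep_hgvs_url(hgvs, species)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_vep() to actually query the service.
    print(vep_hgvs_url("ENST00000366667:c.803C>T"))
```

Separating URL construction from the network call keeps the first part easy to check offline; for large variant sets the command-line tool described in the abstract is likely the better fit.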


Subject(s)
Genomics, Software, Computational Biology, Genetic Databases, Gene Frequency, Humans, Molecular Sequence Annotation, Phenotype
19.
Nucleic Acids Res ; 50(D1): D980-D987, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34791407

ABSTRACT

The European Genome-phenome Archive (EGA - https://ega-archive.org/) is a resource for long term secure archiving of all types of potentially identifiable genetic, phenotypic, and clinical data resulting from biomedical research projects. Its mission is to foster hosted data reuse, enable reproducibility, and accelerate biomedical and translational research in line with the FAIR principles. Launched in 2008, the EGA has grown quickly, currently archiving over 4,500 studies from nearly one thousand institutions. The EGA operates a distributed data access model in which requests are made to the data controller, not to the EGA; the submitter therefore keeps control over who has access to the data and under which conditions. Given the size and value of data hosted, the EGA is constantly improving its value chain, that is, how the EGA can contribute to enhancing the value of human health data by facilitating its submission, discovery, access, and distribution, as well as leading the design and implementation of standards and methods necessary to deliver the value chain. The EGA has become a key GA4GH Driver Project, leading multiple development efforts and implementing new standards and tools, and has been appointed as an ELIXIR Core Data Resource.


Subject(s)
Confidentiality/legislation & jurisprudence, Human Genome, Information Dissemination/methods, Phenomics/organization & administration, Translational Biomedical Research/methods, Datasets as Topic, Genotype, 20th Century History, 21st Century History, Humans, Information Dissemination/ethics, Metadata/ethics, Metadata/statistics & numerical data, Phenomics/history, Phenotype
20.
Nucleic Acids Res ; 50(D1): D11-D19, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850134

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAfold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been made to EMBL-EBI's online training offering.


Subject(s)
Computational Biology/education, Computational Biology/methods, Factual Databases, Academies and Institutes, Artificial Intelligence, COVID-19, Factual Databases/economics, Factual Databases/statistics & numerical data, Pharmaceutical Databases, Protein Databases, Europe, Human Genome, Humans, Information Storage and Retrieval, Untranslated RNA/genetics, SARS-CoV-2/genetics