Results 1 - 20 of 182
2.
Genome Biol ; 21(1): 5, 2020 01 07.
Article in English | MEDLINE | ID: mdl-31910870

ABSTRACT

BACKGROUND: CTCF binding contributes to the establishment of a higher-order genome structure by demarcating the boundaries of large-scale topologically associating domains (TADs). However, despite the importance and conservation of TADs, the role of CTCF binding in their evolution and stability remains elusive. RESULTS: We carry out an experimental and computational study that exploits the natural genetic variation across five closely related species to assess how CTCF binding patterns stably fixed by evolution in each species contribute to the establishment and evolutionary dynamics of TAD boundaries. We perform CTCF ChIP-seq in multiple mouse species to create genome-wide binding profiles and associate them with TAD boundaries. Our analyses reveal that CTCF binding is maintained at TAD boundaries by a balance of selective constraints and dynamic evolutionary processes. Regardless of their conservation across species, CTCF binding sites at TAD boundaries are subject to stronger sequence and functional constraints compared to other CTCF sites. TAD boundaries frequently harbor dynamically evolving clusters containing both evolutionarily old and young CTCF sites as a result of the repeated acquisition of new species-specific sites close to conserved ones. The overwhelming majority of clustered CTCF sites colocalize with cohesin and are significantly closer to gene transcription start sites than nonclustered CTCF sites, suggesting that CTCF clusters particularly contribute to cohesin stabilization and transcriptional regulation. CONCLUSIONS: Dynamic conservation of CTCF site clusters is an apparently important feature of CTCF binding evolution that is critical to the functional stability of a higher-order chromatin structure.


Subjects
CCCTC-Binding Factor/genetics; CCCTC-Binding Factor/metabolism; Chromatin/metabolism; Evolution, Molecular; Mice/genetics; Animals; Genome
3.
Nucleic Acids Res ; 48(D1): D948-D955, 2020 Jan 08.
Article in English | MEDLINE | ID: mdl-31667505

ABSTRACT

The IPD-IMGT/HLA Database, http://www.ebi.ac.uk/ipd/imgt/hla/, currently contains over 25 000 allele sequences for 45 genes, which are located within the Major Histocompatibility Complex (MHC) of the human genome. This region is the most polymorphic region of the human genome, and the levels of polymorphism seen exceed those of most other genes. Some of the genes have several thousand variants and are now termed hyperpolymorphic, rather than simply polymorphic. The IPD-IMGT/HLA Database has provided a stable, highly accessible, user-friendly repository for this information, giving the scientific and medical community access to the many variant sequences of this gene system, which are critical for the successful outcome of transplantation. The number of currently known variants, and the dramatic increase in the number of new variants being identified, have necessitated a dedicated resource with custom tools for curation and publication. The challenge for the database is to continue to provide a highly curated database of sequence variants, while supporting the increased number of submissions and complexity of sequences. In order to do this, traditional methods of accessing and presenting data will be challenged, and new methods will need to be utilized to keep pace with new discoveries.

4.
Nucleic Acids Res ; 48(D1): D941-D947, 2020 Jan 08.
Article in English | MEDLINE | ID: mdl-31584097

ABSTRACT

To sustain and develop the largest fully open human genomic resources, the International Genome Sample Resource (IGSR) (https://www.internationalgenome.org) was established. It is built on the foundation of the 1000 Genomes Project, which created the largest openly accessible catalogue of human genomic variation developed from samples spanning five continents. IGSR (i) maintains access to 1000 Genomes Project resources, (ii) updates 1000 Genomes Project resources to the GRCh38 human reference assembly, (iii) adds new data generated on 1000 Genomes Project cell lines, (iv) shares data from samples with a similarly open consent to increase the number of samples and populations represented in the resources and (v) provides support to users of these resources. Among recent updates are the release of variation calls from 1000 Genomes Project data calculated directly on GRCh38 and the addition of high coverage sequence data for the 2504 samples in the 1000 Genomes Project phase three panel. The data portal, which facilitates web-based exploration of the IGSR resources, has been updated to include samples that were not part of the 1000 Genomes Project and now presents a unified view of data and samples across almost 5000 samples from multiple studies. All data is fully open and publicly accessible.

5.
NPJ Genom Med ; 4: 31, 2019.
Article in English | MEDLINE | ID: mdl-31814998

ABSTRACT

The developmental and epileptic encephalopathies (DEE) are a group of rare, severe neurodevelopmental disorders, where even the most thorough sequencing studies leave 60-65% of patients without a molecular diagnosis. Here, we explore the incompleteness of transcript models used for exome and genome analysis as one potential explanation for a lack of current diagnoses. To this end, we have updated the GENCODE gene annotation for 191 epilepsy-associated genes, using human brain-derived transcriptomic libraries and other data to build 3,550 putative transcript models. Our annotations increase the transcriptional 'footprint' of these genes by over 674 kb. Using SCN1A as a case study, due to its close phenotype/genotype correlation with Dravet syndrome, we screened 122 people with Dravet syndrome or a similar phenotype with a panel of exon sequences representing eight established genes and identified two de novo SCN1A variants that, through improved gene annotation, are now found to reside within our exons. These two molecular diagnoses (from 122 screened people, 1.6%) carry significant clinical implications. Furthermore, we identified a previously classified SCN1A intronic Dravet syndrome-associated variant that now lies within a deeply conserved exon. Our findings illustrate the potential gains of thorough gene annotation in improving diagnostic yields for genetic disorders.

6.
Sci Rep ; 9(1): 17716, 2019 Nov 27.
Article in English | MEDLINE | ID: mdl-31776409

ABSTRACT

Atlantic herring (Clupea harengus) is one of the most abundant fish species in the world. It is an important economic and nutritional resource, as well as a crucial part of the North Atlantic ecosystem. In 2016, a draft herring genome assembly was published. Because the species is of such importance, we sought to independently verify and potentially improve the herring genome assembly. We sequenced the herring genome, generating paired-end, mate-pair, linked and long reads. Three assembly versions of the herring genome were generated based on a de novo assembly (A1), which was scaffolded using linked and long reads (A2) and then merged with the previously published assembly (A3). The resulting assemblies were compared using parameters describing the size, fragmentation, correctness, and completeness of the assemblies. Results showed that the A2 assembly was less fragmented, more complete and more correct than A1. A3 showed improvement in fragmentation and correctness compared with A2 and the published assembly but was slightly less complete than the published assembly. Thus, we confirmed the previously published herring assembly and improved it by further scaffolding the assembly, removing low-quality sequences using linked and long reads, and merging assemblies.

7.
Genome Res ; 29(11): 1919-1928, 2019 11.
Article in English | MEDLINE | ID: mdl-31649060

ABSTRACT

The Atlantic herring is a model species for exploring the genetic basis for ecological adaptation, due to its huge population size and extremely low genetic differentiation at selectively neutral loci. However, such studies have so far been hampered by a highly fragmented genome assembly. Here, we deliver a chromosome-level genome assembly based on a hybrid approach combining a de novo Pacific Biosciences (PacBio) assembly with Hi-C-supported scaffolding. The assembly comprises 26 autosomes with sizes ranging from 12.4 to 33.1 Mb and a total size, in chromosomes, of 726 Mb, which has been corroborated by a high-resolution linkage map. A comparison of the herring genome assembly with other high-quality assemblies from bony fishes revealed few inter-chromosomal but frequent intra-chromosomal rearrangements. The improved assembly facilitates analysis of previously intractable large-scale structural variation, allowing, for example, the detection of a 7.8-Mb inversion on Chromosome 12 underlying ecological adaptation. This supergene shows strong genetic differentiation between populations. The chromosome-based assembly also markedly improves the interpretation of previously detected signals of selection, allowing us to reveal hundreds of independent loci associated with ecological adaptation.

9.
Front Genet ; 10: 709, 2019.
Article in English | MEDLINE | ID: mdl-31475029

ABSTRACT

The advent of second-generation sequencing and its application to RNA sequencing have revolutionized the field of genomics by allowing quantification of gene expression, as well as the definition of transcription start/end sites, exons, splice sites and RNA editing sites. However, due to the sequencing of fragments of cDNAs, these methods have not given a reliable picture of complete RNA isoforms. Third-generation sequencing has filled this gap and allows end-to-end sequencing of entire RNA/cDNA molecules. This approach to transcriptomics has been a "niche" technology for a couple of years but now is becoming mainstream with many different applications. Here, we review the background and progress made to date in this rapidly growing field. We start by reviewing the progressive realization that alternative splicing is omnipresent. We then focus on long-noncoding RNA isoforms and the distinct combination patterns of exons in noncoding and coding genes. We consider the implications of the recent technologies of direct RNA sequencing and single-cell isoform RNA sequencing. Finally, we discuss the parameters that define the success of long-read RNA sequencing experiments and strategies commonly used to make the most of such data.

10.
Nat Rev Genet ; 20(11): 693-701, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31455890

ABSTRACT

Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.

11.
Mol Ecol Resour ; 19(6): 1497-1515, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31359622

ABSTRACT

Whole genome sequences (WGS) greatly increase our ability to precisely infer population genetic parameters, demographic processes, and selection signatures. However, WGS may still not be affordable for a representative number of individuals/populations. In this context, our goal was to assess the efficiency of several SNP genotyping strategies by testing their ability to accurately estimate parameters describing neutral diversity and to detect signatures of selection. We analysed 110 WGS at 12× coverage for four different species, i.e., sheep, goats and their wild counterparts. From these data we generated 946 data sets corresponding to random panels of 1K to 5M variants, commercial SNP chips and exome capture, for sample sizes of five to 48 individuals. We also extracted low-coverage genome resequencing of 1×, 2× and 5× by randomly subsampling reads from the 12× resequencing data. Overall, 5K to 10K random variants were enough for an accurate estimation of genome diversity. Conversely, commercial panels and exome capture displayed strong ascertainment biases. Beyond the characterization of neutral diversity, the detection of signatures of selection and the accurate estimation of linkage disequilibrium (LD) required high-density panels of at least 1M variants. Finally, genotype likelihoods increased the quality of variant calling from low coverage resequencing but proportions of incorrect genotypes remained substantial, especially for heterozygote sites. Whole genome resequencing coverage of at least 5× appeared to be necessary for accurate assessment of genomic variations. These results have implications for studies seeking to deploy low-density SNP collections or genome scans across genetically diverse populations/species showing similar genetic characteristics and patterns of LD decay for a wide variety of purposes.
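
The random subsampling of reads described in this abstract can be illustrated with a minimal sketch. It assumes that each read is kept independently with probability equal to the ratio of target to source coverage; this is one common way to downsample, not necessarily the procedure the authors used. The function name `subsample_reads` and the toy read identifiers are hypothetical.

```python
import random


def subsample_reads(reads, source_coverage, target_coverage, seed=0):
    """Keep each read independently with probability target/source.

    Hypothetical helper for illustration only; real pipelines usually
    downsample alignment files with dedicated tools.
    """
    if not 0 < target_coverage <= source_coverage:
        raise ValueError("target coverage must be in (0, source_coverage]")
    keep_fraction = target_coverage / source_coverage
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return [read for read in reads if rng.random() < keep_fraction]


# Toy stand-in for reads sequenced at 12x coverage.
reads = [f"read_{i}" for i in range(12000)]

for target in (1, 2, 5):
    subset = subsample_reads(reads, source_coverage=12, target_coverage=target)
    print(f"{target}x: kept {len(subset)} of {len(reads)} reads")
```

Because every read is kept or dropped independently, the achieved coverage fluctuates slightly around the target, which is usually acceptable for this kind of comparison.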

13.
Nat Biotechnol ; 37(4): 480, 2019 04.
Article in English | MEDLINE | ID: mdl-30894680

ABSTRACT

In the version of this article initially published, Lena Dolman's second affiliation was given as Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK. The correct second affiliation is Ontario Institute for Cancer Research, Toronto, Ontario, Canada. The error has been corrected in the HTML and PDF versions of the article.

15.
Cell Rep ; 26(4): 1059-1069.e6, 2019 01 22.
Article in English | MEDLINE | ID: mdl-30673601

ABSTRACT

Global investigation of histone marks in acute myeloid leukemia (AML) remains limited. Analyses of 38 AML samples through integrated transcriptional and chromatin mark analysis expose 2 major subtypes. One subtype is dominated by patients with NPM1 mutations or MLL-fusion genes, shows activation of the regulatory pathways involving HOX-family genes as targets, and displays high self-renewal capacity and stemness. The second subtype is enriched for RUNX1 or spliceosome mutations, suggesting potential interplay between the 2 aberrations, and mainly depends on IRF family regulators. Cellular consequences in prognosis predict a relatively worse outcome for the first subtype. Our integrated profiling establishes a rich resource to probe AML subtypes on the basis of expression and chromatin data.

16.
Nucleic Acids Res ; 47(D1): D1005-D1012, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30445434

ABSTRACT

The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 10^9 individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.

17.
Genome Biol Evol ; 11(1): 220-231, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30496401

ABSTRACT

The evolution of antifreeze glycoproteins has enabled notothenioid fish to flourish in the freezing waters of the Southern Ocean. Although these stenothermal animals have adapted successfully to life in the cold at the biodiversity level, paradoxically at the cellular level they have problems producing, folding, and degrading proteins at their ambient temperatures of -1.86 °C. In this first multi-species transcriptome comparison of the amino acid composition of notothenioid proteins with temperate teleost proteins, we show that, unlike psychrophilic bacteria, Antarctic fish provide little evidence for the mass alteration of protein amino acid composition to enhance protein folding and reduce protein denaturation in the cold. The exception was the significant overrepresentation of positions where leucine in temperate fish proteins was replaced by methionine in the notothenioid orthologues. We hypothesize that these extra methionines have been preferentially assimilated into the genome to act as redox sensors in the highly oxygenated waters of the Southern Ocean. This redox hypothesis is supported by analyses of notothenioids showing enrichment of genes associated with responses to environmental stress, particularly reactive oxygen species. Overall, although notothenioid fish show cold-associated problems with protein homeostasis, they may have modified only a selected number of biochemical pathways to work efficiently below 0 °C. Even a slight warming of the Southern Ocean might disrupt the critical functions of this handful of key pathways with considerable impacts for the functioning of this ecosystem in the future.


Subjects
Acclimatization; Fish Proteins/metabolism; Freezing; Methionine/metabolism; Perciformes/metabolism; Animals; Antarctic Regions; Evolution, Molecular; Fish Proteins/genetics; Perciformes/genetics; Protein Folding; Transcriptome
18.
Database (Oxford) ; 2018, 2018 01 01.
Article in English | MEDLINE | ID: mdl-30576484

ABSTRACT

The major goal of sequencing humans and many other species is to understand the link between genomic variation, phenotype and disease. There are numerous valuable and well-established variation resources, but collating and making sense of non-homogeneous, often large-scale data sets from disparate sources remains a challenge. Without a systematic catalogue of these data and appropriate query and annotation tools, understanding the genome sequence of an individual and assessing their disease risk is impossible. In Ensembl, we substantially solve this problem: we develop methods to facilitate data integration and broad access; aggregate information in a consistent manner and make it available in a variety of standard formats, both visually and programmatically; build analysis pipelines to compare variants to comprehensive genomic annotation sets; and make all tools and data publicly available.


Subjects
Database Management Systems; Databases, Genetic; Genomics/methods; Molecular Sequence Annotation/methods; Algorithms; Humans; Sequence Analysis, DNA; User-Computer Interface
19.
Nat Commun ; 9(1): 4128, 2018 10 08.
Article in English | MEDLINE | ID: mdl-30297836

ABSTRACT

Selecting the most appropriate protein sequences is critical for precision drug design. Here we describe Haplosaurus, a bioinformatic tool for computation of protein haplotypes. Haplosaurus computes protein haplotypes from pre-existing chromosomally-phased genomic variation data. Integration into the Ensembl resource provides rapid and detailed protein haplotype retrieval. Using Haplosaurus, we build a database of unique protein haplotypes from the 1000 Genomes dataset reflecting real-world protein sequence variability and their prevalence. For one in seven genes, the most common protein haplotype differs from the reference sequence, and for a similar number the most common haplotype differs between human populations. Three case studies show how knowledge of the range of commonly encountered protein forms predicted in populations leads to insights into therapeutic efficacy. Haplosaurus and its associated database are expected to find broad applications in many disciplines using protein sequences and to be particularly impactful for therapeutics design.


Subjects
Computational Biology/methods; Drug Design; Haplotypes; Precision Medicine/methods; Proteins/genetics; Computer-Aided Design; Genome, Human/genetics; Genomics/methods; Humans; Proteome/genetics; Reproducibility of Results; Software
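
The idea of deriving protein haplotypes from phased variation, as summarized in the Haplosaurus record above, can be sketched in a few lines. This is not the Haplosaurus implementation: it is a toy example that assumes single-nucleotide substitutions only, 0-based positions relative to the coding sequence, `0|1`-style phased genotypes, and the standard genetic code. The names `apply_phased_variants` and `translate`, and the example sequence and variants, are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_phased_variants(ref_cds, variants):
    """Return the two haplotype coding sequences for a diploid sample.

    Each variant is (position, ref_base, alt_base, genotype), where the
    genotype is a phased string such as "0|1" (assumed input format).
    """
    haplotypes = [list(ref_cds), list(ref_cds)]
    for pos, ref, alt, genotype in variants:
        if ref_cds[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        for hap_index, allele in enumerate(genotype.split("|")):
            if allele == "1":  # this haplotype carries the alternate base
                haplotypes[hap_index][pos] = alt
    return ["".join(h) for h in haplotypes]


def translate(cds):
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


ref_cds = "ATGCTGGAA"  # toy coding sequence
variants = [
    (3, "C", "A", "0|1"),  # only the second haplotype carries this change
    (8, "A", "G", "1|1"),  # both haplotypes carry this change
]

hap_cds = apply_phased_variants(ref_cds, variants)
hap_proteins = [translate(cds) for cds in hap_cds]
print(hap_cds, hap_proteins)
```

Phasing matters because the two variants are combined per chromosome copy before translation; with unphased genotypes the protein sequence of each copy could not be determined when several heterozygous variants fall in the same gene.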
20.
Nat Genet ; 50(11): 1574-1583, 2018 11.
Article in English | MEDLINE | ID: mdl-30275530

ABSTRACT

We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.


Subjects
Chromosome Mapping; Genetic Loci; Genome; Haplotypes; Mice, Inbred Strains/genetics; Animals; Animals, Laboratory; Chromosome Mapping/veterinary; Haplotypes/genetics; Mice; Mice, Inbred BALB C/genetics; Mice, Inbred C3H/genetics; Mice, Inbred C57BL/genetics; Mice, Inbred CBA/genetics; Mice, Inbred DBA/genetics; Mice, Inbred NOD/genetics; Mice, Inbred Strains/classification; Molecular Sequence Annotation; Phylogeny; Polymorphism, Single Nucleotide; Species Specificity