Search | VHL Search Portal

1.

A joint NCBI and EMBL-EBI transcript set for clinical genomics and research.

Morales, Joannella; Pujar, Shashikant; Loveland, Jane E; Astashyn, Alex; Bennett, Ruth; Berry, Andrew; Cox, Eric; Davidson, Claire; Ermolaeva, Olga; Farrell, Catherine M; Fatima, Reham; Gil, Laurent; Goldfarb, Tamara; Gonzalez, Jose M; Haddad, Diana; Hardy, Matthew; Hunt, Toby; Jackson, John; Joardar, Vinita S; Kay, Michael; Kodali, Vamsi K; McGarvey, Kelly M; McMahon, Aoife; Mudge, Jonathan M; Murphy, Daniel N; Murphy, Michael R; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Thibaud-Nissen, Françoise; Threadgold, Glen; Vatsan, Anjana R; Wallin, Craig; Webb, David; Flicek, Paul; Birney, Ewan; Pruitt, Kim D; Frankish, Adam; Cunningham, Fiona; Murphy, Terence D.

Nature ; 604(7905): 310-315, 2022 04.

Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.

Subject(s)

Computational Biology , Databases, Genetic , Genomics , Genome , Humans , Information Dissemination , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States

2.

A transcription factor collective defines cardiac cell fate and reflects lineage history.

Junion, Guillaume; Spivakov, Mikhail; Girardot, Charles; Braun, Martina; Gustafson, E Hilary; Birney, Ewan; Furlong, Eileen E M.

Cell ; 148(3): 473-86, 2012 Feb 03.

Article in English | MEDLINE | ID: mdl-22304916

ABSTRACT

Cell fate decisions are driven through the integration of inductive signals and tissue-specific transcription factors (TFs), although the details on how this information converges in cis remain unclear. Here, we demonstrate that the five genetic components essential for cardiac specification in Drosophila, including the effectors of Wg and Dpp signaling, act as a collective unit to cooperatively regulate heart enhancer activity, both in vivo and in vitro. Their combinatorial binding does not require any specific motif orientation or spacing, suggesting an alternative mode of enhancer function whereby cooperative activity occurs with extensive motif flexibility. A fraction of enhancers co-occupied by cardiogenic TFs had unexpected activity in the neighboring visceral mesoderm but could be rendered active in heart through single-site mutations. Given that cardiac and visceral cells are both derived from the dorsal mesoderm, this "dormant" TF binding signature may represent a molecular footprint of these cells' developmental lineage.

Subject(s)

Drosophila melanogaster/cytology , Gene Regulatory Networks , Animals , Drosophila Proteins/metabolism , Drosophila melanogaster/embryology , Drosophila melanogaster/metabolism , Enhancer Elements, Genetic , Gene Expression Regulation, Developmental , Mesoderm/cytology , Mesoderm/metabolism , Myocardium/cytology , Myocardium/metabolism , Transcription Factors/metabolism

3.

Genomic reconstruction of the SARS-CoV-2 epidemic in England.

Vöhringer, Harald S; Sanderson, Theo; Sinnott, Matthew; De Maio, Nicola; Nguyen, Thuy; Goater, Richard; Schwach, Frank; Harrison, Ian; Hellewell, Joel; Ariani, Cristina V; Gonçalves, Sonia; Jackson, David K; Johnston, Ian; Jung, Alexander W; Saint, Callum; Sillitoe, John; Suciu, Maria; Goldman, Nick; Panovska-Griffiths, Jasmina; Birney, Ewan; Volz, Erik; Funk, Sebastian; Kwiatkowski, Dominic; Chand, Meera; Martincorena, Inigo; Barrett, Jeffrey C; Gerstung, Moritz.

Nature ; 600(7889): 506-511, 2021 12.

Article in English | MEDLINE | ID: mdl-34649268

ABSTRACT

The evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus leads to new variants that warrant timely epidemiological characterization. Here we use the dense genomic surveillance data generated by the COVID-19 Genomics UK Consortium to reconstruct the dynamics of 71 different lineages in each of 315 English local authorities between September 2020 and June 2021. This analysis reveals a series of subepidemics that peaked in early autumn 2020, followed by a jump in transmissibility of the B.1.1.7/Alpha lineage. The Alpha variant grew when other lineages declined during the second national lockdown and regionally tiered restrictions between November and December 2020. A third, more stringent national lockdown suppressed the Alpha variant and eliminated nearly all other lineages in early 2021. Yet a series of variants (most of which contained the spike E484K mutation) defied these trends and persisted at moderately increasing proportions. However, by accounting for sustained introductions, we found that the transmissibility of these variants is unlikely to have exceeded the transmissibility of the Alpha variant. Finally, B.1.617.2/Delta was repeatedly introduced in England and grew rapidly in early summer 2021, constituting approximately 98% of sampled SARS-CoV-2 genomes on 26 June 2021.

Subject(s)

COVID-19/epidemiology , COVID-19/virology , Genome, Viral/genetics , Genomics , SARS-CoV-2/genetics , Amino Acid Substitution , COVID-19/transmission , England/epidemiology , Epidemiological Monitoring , Humans , Molecular Epidemiology , Mutation , Quarantine/statistics & numerical data , SARS-CoV-2/classification , Spatio-Temporal Analysis , Spike Glycoprotein, Coronavirus/genetics

4.

Highly accurate protein structure prediction for the human proteome.

Tunyasuvunakool, Kathryn; Adler, Jonas; Wu, Zachary; Green, Tim; Zielinski, Michal; Zídek, Augustin; Bridgland, Alex; Cowie, Andrew; Meyer, Clemens; Laydon, Agata; Velankar, Sameer; Kleywegt, Gerard J; Bateman, Alex; Evans, Richard; Pritzel, Alexander; Figurnov, Michael; Ronneberger, Olaf; Bates, Russ; Kohl, Simon A A; Potapenko, Anna; Ballard, Andrew J; Romera-Paredes, Bernardino; Nikolov, Stanislav; Jain, Rishub; Clancy, Ellen; Reiman, David; Petersen, Stig; Senior, Andrew W; Kavukcuoglu, Koray; Birney, Ewan; Kohli, Pushmeet; Jumper, John; Hassabis, Demis.

Nature ; 596(7873): 590-596, 2021 08.

Article in English | MEDLINE | ID: mdl-34293799

ABSTRACT

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure (ref. 1). Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) has very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.

Subject(s)

Computational Biology/standards , Deep Learning/standards , Models, Molecular , Protein Conformation , Proteome/chemistry , Datasets as Topic/standards , Diacylglycerol O-Acyltransferase/chemistry , Glucose-6-Phosphatase/chemistry , Humans , Membrane Proteins/chemistry , Protein Folding , Reproducibility of Results

5.

Benchmarking of computational methods for m6A profiling with Nanopore direct RNA sequencing.

Maestri, Simone; Furlan, Mattia; Mulroney, Logan; Coscujuela Tarrero, Lucia; Ugolini, Camilla; Dalla Pozza, Fabio; Leonardi, Tommaso; Birney, Ewan; Nicassio, Francesco; Pelizzola, Mattia.

Brief Bioinform ; 25(2), 2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38279646

ABSTRACT

N6-methyladenosine (m6A) is the most abundant internal eukaryotic mRNA modification and is involved in the regulation of various biological processes. Direct Nanopore sequencing of native RNA (dRNA-seq) has emerged as a leading approach for its identification. Several software tools have been published for m6A detection, and there is a strong need for independent studies benchmarking their performance on data from different species and against various reference datasets. Moreover, a computational workflow is needed to streamline the execution of tools whose installation and execution remain complicated. We developed NanOlympicsMod, a Nextflow pipeline exploiting containerized technology for comparing 14 tools for m6A detection on dRNA-seq data. NanOlympicsMod was tested on dRNA-seq data generated from in vitro (un)modified synthetic oligos. The m6A hits returned by each tool were compared to the m6A positions known by design of the oligos. In addition, NanOlympicsMod was used on dRNA-seq datasets from wild-type and m6A-depleted yeast, mouse and human, and each tool's hits were compared to reference m6A sets generated by leading orthogonal methods. The performance of the tools markedly differed across datasets, and methods adopting different approaches showed different preferences in terms of precision and recall. Changing the stringency cut-offs allowed the precision-recall trade-off to be tuned towards user preferences. Finally, we determined that the precision and recall of the tools are markedly influenced by sequencing depth, and that additional sequencing would likely reveal additional m6A sites. Because novel tools can be readily included, NanOlympicsMod will streamline the benchmarking of m6A detection tools on dRNA-seq data, improving future RNA modification characterization.

Subject(s)

Adenine/analogs & derivatives , Nanopore Sequencing , Nanopores , Humans , Animals , Mice , RNA/genetics , Benchmarking , Sequence Analysis, RNA/methods

6.

Genetic and functional insights into the fractal structure of the heart.

Meyer, Hannah V; Dawes, Timothy J W; Serrani, Marta; Bai, Wenjia; Tokarczuk, Pawel; Cai, Jiashen; de Marvao, Antonio; Henry, Albert; Lumbers, R Thomas; Gierten, Jakob; Thumberger, Thomas; Wittbrodt, Joachim; Ware, James S; Rueckert, Daniel; Matthews, Paul M; Prasad, Sanjay K; Costantino, Maria L; Cook, Stuart A; Birney, Ewan; O'Regan, Declan P.

Nature ; 584(7822): 589-594, 2020 08.

Article in English | MEDLINE | ID: mdl-32814899

ABSTRACT

The inner surfaces of the human heart are covered by a complex network of muscular strands that is thought to be a remnant of embryonic development (refs 1,2). The function of these trabeculae in adults and their genetic architecture are unknown. Here we performed a genome-wide association study to investigate image-derived phenotypes of trabeculae using the fractal analysis of trabecular morphology in 18,096 participants of the UK Biobank. We identified 16 significant loci that contain genes associated with haemodynamic phenotypes and regulation of cytoskeletal arborization (refs 3,4). Using biomechanical simulations and observational data from human participants, we demonstrate that trabecular morphology is an important determinant of cardiac performance. Through genetic association studies with cardiac disease phenotypes and Mendelian randomization, we find a causal relationship between trabecular morphology and risk of cardiovascular disease. These findings suggest a previously unknown role for myocardial trabeculae in the function of the adult heart, identify conserved pathways that regulate structural complexity and reveal the influence of the myocardial trabeculae on susceptibility to cardiovascular disease.

Subject(s)

Cardiovascular Diseases/genetics , Fractals , Genetic Predisposition to Disease , Heart/anatomy & histology , Heart/physiology , Myocardium/metabolism , Adult , Aged , Animals , Cardiovascular Diseases/physiopathology , Cytoskeleton/genetics , Cytoskeleton/physiology , Gene Knockout Techniques , Genetic Loci/genetics , Genome-Wide Association Study , Heart/embryology , Hemodynamics , Humans , Middle Aged , Myocardium/cytology , Oryzias/embryology , Oryzias/genetics , Phenotype

7.

AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences.

Varadi, Mihaly; Bertoni, Damian; Magana, Paulyna; Paramval, Urmila; Pidruchna, Ivanna; Radhakrishnan, Malarvizhi; Tsenkov, Maxim; Nair, Sreenath; Mirdita, Milot; Yeo, Jingi; Kovalevskiy, Oleg; Tunyasuvunakool, Kathryn; Laydon, Agata; Zídek, Augustin; Tomlinson, Hamish; Hariharan, Dhavanthi; Abrahamson, Josh; Green, Tim; Jumper, John; Birney, Ewan; Steinegger, Martin; Hassabis, Demis; Velankar, Sameer.

Nucleic Acids Res ; 52(D1): D368-D375, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37933859

ABSTRACT

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.

The AlphaFold Protein Structure Database (AlphaFold DB) is a massive digital library of predicted protein structures, with over 214 million entries, marking a more than 500-fold expansion since its initial release in 2021. The structures are predicted using Google DeepMind's AlphaFold 2 artificial intelligence (AI) system. Our new report highlights the latest updates we have made to this database. We have added more data on specific organisms and proteins related to global health and expanded to cover almost the complete UniProt database, a primary data resource of protein sequences. We also made it easier for our users to access the data by directly downloading files or using advanced cloud-based tools. Finally, we have also improved how users view and search through these protein structures, making the user experience smoother and more informative. In short, AlphaFold DB has been growing rapidly and has become more user-friendly and robust to support the broader scientific community.

Subject(s)

Artificial Intelligence , Protein Structure, Secondary , Proteome , Amino Acid Sequence , Databases, Protein , Search Engine , Proteins/chemistry

8.

Sub-cellular level resolution of common genetic variation in the photoreceptor layer identifies continuum between rare disease and common variation.

Currant, Hannah; Fitzgerald, Tomas W; Patel, Praveen J; Khawaja, Anthony P; Webster, Andrew R; Mahroo, Omar A; Birney, Ewan.

PLoS Genet ; 19(2): e1010587, 2023 02.

Article in English | MEDLINE | ID: mdl-36848389

ABSTRACT

Photoreceptor cells (PRCs) are the light-detecting cells of the retina. Such cells can be imaged non-invasively using optical coherence tomography (OCT), which is used in clinical settings to diagnose and monitor ocular diseases. Here we present the largest genome-wide association study of PRC morphology to date, utilising quantitative phenotypes extracted from OCT images within the UK Biobank. We discovered 111 loci associated with the thickness of one or more of the PRC layers; many of these had prior associations with ocular phenotypes and pathologies, while 27 had no prior associations. We further identified 10 genes associated with PRC thickness through gene burden testing using exome data. In both cases there was a significant enrichment for genes involved in rare eye pathologies, in particular retinitis pigmentosa. There was evidence for an interaction effect between common genetic variants in VSX2, which is involved in eye development, and PRPH2, which is known to be involved in retinal dystrophies. We further identified a number of genetic variants with a differential effect across the macular spatial field. Our results suggest a continuum between common and rare variation which impacts retinal structure, sometimes leading to disease.

Subject(s)

Genome-Wide Association Study , Rare Diseases , Humans , Rare Diseases/pathology , Retina/pathology , Photoreceptor Cells , Genetic Variation

9.

Leveraging European infrastructures to access 1 million human genomes by 2022.

Saunders, Gary; Baudis, Michael; Becker, Regina; Beltran, Sergi; Béroud, Christophe; Birney, Ewan; Brooksbank, Cath; Brunak, Søren; Van den Bulcke, Marc; Drysdale, Rachel; Capella-Gutierrez, Salvador; Flicek, Paul; Florindi, Francesco; Goodhand, Peter; Gut, Ivo; Heringa, Jaap; Holub, Petr; Hooyberghs, Jef; Juty, Nick; Keane, Thomas M; Korbel, Jan O; Lappalainen, Ilkka; Leskosek, Brane; Matthijs, Gert; Mayrhofer, Michaela Th; Metspalu, Andres; Navarro, Arcadi; Newhouse, Steven; Nyrönen, Tommi; Page, Angela; Persson, Bengt; Palotie, Aarno; Parkinson, Helen; Rambla, Jordi; Salgado, David; Steinfelder, Erik; Swertz, Morris A; Valencia, Alfonso; Varma, Susheel; Blomberg, Niklas; Scollen, Serena.

Nat Rev Genet ; 20(11): 693-701, 2019 11.

Article in English | MEDLINE | ID: mdl-31455890

ABSTRACT

Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition of this challenge, 21 European countries recently signed a declaration to transnationally share data on at least 1 million human genomes by 2022. In this Roadmap, we identify the challenges of data sharing across borders and demonstrate that European research infrastructures are well-positioned to support the rapid implementation of widespread genomic data access.

Subject(s)

Biomedical Research , Genome, Human , Human Genome Project , Europe , Humans

10.

Author Correction: Leveraging European infrastructures to access 1 million human genomes by 2022.

Saunders, Gary; Baudis, Michael; Becker, Regina; Beltran, Sergi; Béroud, Christophe; Birney, Ewan; Brooksbank, Cath; Brunak, Søren; Van den Bulcke, Marc; Drysdale, Rachel; Capella-Gutierrez, Salvador; Flicek, Paul; Florindi, Francesco; Goodhand, Peter; Gut, Ivo; Heringa, Jaap; Holub, Petr; Hooyberghs, Jef; Juty, Nick; Keane, Thomas M; Korbel, Jan O; Lappalainen, Ilkka; Leskosek, Brane; Matthijs, Gert; Mayrhofer, Michaela Th; Metspalu, Andres; Navarro, Arcadi; Newhouse, Steven; Nyrönen, Tommi; Page, Angela; Persson, Bengt; Palotie, Aarno; Parkinson, Helen; Rambla, Jordi; Salgado, David; Steinfelder, Erik; Swertz, Morris A; Valencia, Alfonso; Varma, Susheel; Blomberg, Niklas; Scollen, Serena.

Nat Rev Genet ; 20(11): 702, 2019 Nov.

Article in English | MEDLINE | ID: mdl-31520075

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

11.

A multilayered approach to the analysis of genetic data from individuals with suspected albinism.

Sergouniotis, Panagiotis I; Michaud, Vincent; Lasseaux, Eulalie; Campbell, Christopher; Plaisant, Claudio; Javerzat, Sophie; Birney, Ewan; Ramsden, Simon C; Black, Graeme C; Arveiler, Benoit.

J Med Genet ; 60(12): 1245-1249, 2023 Nov 27.

Article in English | MEDLINE | ID: mdl-37460203

ABSTRACT

Albinism is a clinically and genetically heterogeneous group of conditions characterised by visual abnormalities and variable degrees of hypopigmentation. Multiple studies have demonstrated the clinical utility of genetic investigations in individuals with suspected albinism. Despite this, the variation in the provision of genetic testing for albinism remains significant. One key issue is the lack of a standardised approach to the analysis of genomic data from affected individuals. For example, there is variation in how different clinical genetic laboratories approach genotypes that involve incompletely penetrant alleles, including the common, 'hypomorphic' TYR c.1205G>A (p.Arg402Gln) [rs1126809] variant. Here, we discuss the value of genetic testing as a frontline diagnostic tool in individuals with features of albinism and propose a practice pattern for the analysis of genomic data from affected families.

Subject(s)

Albinism, Oculocutaneous , Albinism , Humans , Albinism/genetics , Albinism/diagnosis , Albinism, Oculocutaneous/diagnosis , Albinism, Oculocutaneous/genetics , Genetic Testing , Genotype , Alleles

12.

The European Bioinformatics Institute (EMBL-EBI) in 2021.

Cantelli, Gaia; Bateman, Alex; Brooksbank, Cath; Petrov, Anton I; Malik-Sheriff, Rahuman S; Ide-Smith, Michele; Hermjakob, Henning; Flicek, Paul; Apweiler, Rolf; Birney, Ewan; McEntyre, Johanna.

Nucleic Acids Res ; 50(D1): D11-D19, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34850134

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAFold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been developed to EMBL-EBI's online training offering.

Subject(s)

Computational Biology/education , Computational Biology/methods , Databases, Factual , Academies and Institutes , Artificial Intelligence , COVID-19 , Databases, Factual/economics , Databases, Factual/statistics & numerical data , Databases, Pharmaceutical , Databases, Protein , Europe , Genome, Human , Humans , Information Storage and Retrieval , RNA, Untranslated/genetics , SARS-CoV-2/genetics

13.

Nanopore ReCappable sequencing maps SARS-CoV-2 5' capping sites and provides new insights into the structure of sgRNAs.

Ugolini, Camilla; Mulroney, Logan; Leger, Adrien; Castelli, Matteo; Criscuolo, Elena; Williamson, Maia Kavanagh; Davidson, Andrew D; Almuqrin, Abdulaziz; Giambruno, Roberto; Jain, Miten; Frigè, Gianmaria; Olsen, Hugh; Tzertzinis, George; Schildkraut, Ira; Wulf, Madalee G; Corrêa, Ivan R; Ettwiller, Laurence; Clementi, Nicola; Clementi, Massimo; Mancini, Nicasio; Birney, Ewan; Akeson, Mark; Nicassio, Francesco; Matthews, David A; Leonardi, Tommaso.

Nucleic Acids Res ; 50(6): 3475-3489, 2022 04 08.

Article in English | MEDLINE | ID: mdl-35244721

ABSTRACT

The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5' cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously un-annotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.

Subject(s)

COVID-19 , Nanopores , RNA, Guide, Kinetoplastida/chemistry , COVID-19/genetics , Genome, Viral/genetics , Humans , RNA Caps , RNA, Viral/genetics , RNA, Viral/metabolism , SARS-CoV-2/genetics

14.

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.

Varadi, Mihaly; Anyango, Stephen; Deshpande, Mandar; Nair, Sreenath; Natassia, Cindy; Yordanova, Galabina; Yuan, David; Stroe, Oana; Wood, Gemma; Laydon, Agata; Zídek, Augustin; Green, Tim; Tunyasuvunakool, Kathryn; Petersen, Stig; Jumper, John; Clancy, Ellen; Green, Richard; Vora, Ankur; Lutfi, Mira; Figurnov, Michael; Cowie, Andrew; Hobbs, Nicole; Kohli, Pushmeet; Kleywegt, Gerard; Birney, Ewan; Hassabis, Demis; Velankar, Sameer.

Nucleic Acids Res ; 50(D1): D439-D444, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34791371

ABSTRACT

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model-confidence estimates and predicted aligned errors. The initial release of AlphaFold DB contains over 360,000 predicted structures across 21 model-organism proteomes, which will soon be expanded to cover most of the (over 100 million) representative sequences from the UniRef90 data set.

Subject(s)

Databases, Protein , Protein Folding , Proteins/chemistry , Software , Amino Acid Sequence , Animals , Bacteria/genetics , Bacteria/metabolism , Datasets as Topic , Dictyostelium/genetics , Dictyostelium/metabolism , Fungi/genetics , Fungi/metabolism , Humans , Internet , Models, Molecular , Plants/genetics , Plants/metabolism , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Proteins/genetics , Proteins/metabolism , Trypanosoma cruzi/genetics , Trypanosoma cruzi/metabolism

15.

Correction: Genetic variation affects morphological retinal phenotypes extracted from UK Biobank optical coherence tomography images.

Currant, Hannah; Hysi, Pirro; Fitzgerald, Tomas W; Gharahkhani, Puya; Bonnemaijer, Pieter W M; Senabouth, Anne; Hewitt, Alex W; Atan, Denize; Aung, Tin; Charng, Jason; Choquet, Hélène; Craig, Jamie; Khaw, Peng T; Klaver, Caroline C W; Kubo, Michiaki; Ong, Jue-Sheng; Pasquale, Louis R; Reisman, Charles A; Daniszewski, Maciej; Powell, Joseph E; Pébay, Alice; Simcoe, Mark J; Thiadens, Alberta A H J; van Duijn, Cornelia M; Yazar, Seyhan; Jorgenson, Eric; MacGregor, Stuart; Hammond, Chris J; Mackey, David A; Wiggs, Janey L; Foster, Paul J; Patel, Praveen J; Birney, Ewan; Khawaja, Anthony P.

PLoS Genet ; 17(10): e1009858, 2021 Oct.

Article in English | MEDLINE | ID: mdl-34662343

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pgen.1009497.].

16.

Genetic variation affects morphological retinal phenotypes extracted from UK Biobank optical coherence tomography images.

Currant, Hannah; Hysi, Pirro; Fitzgerald, Tomas W; Gharahkhani, Puya; Bonnemaijer, Pieter W M; Senabouth, Anne; Hewitt, Alex W; Atan, Denize; Aung, Tin; Charng, Jason; Choquet, Hélène; Craig, Jamie; Khaw, Peng T; Klaver, Caroline C W; Kubo, Michiaki; Ong, Jue-Sheng; Pasquale, Louis R; Reisman, Charles A; Daniszewski, Maciej; Powell, Joseph E; Pébay, Alice; Simcoe, Mark J; Thiadens, Alberta A H J; van Duijn, Cornelia M; Yazar, Seyhan; Jorgenson, Eric; MacGregor, Stuart; Hammond, Chris J; Mackey, David A; Wiggs, Janey L; Foster, Paul J; Patel, Praveen J; Birney, Ewan; Khawaja, Anthony P.

PLoS Genet ; 17(5): e1009497, 2021 05.

Article in English | MEDLINE | ID: mdl-33979322

ABSTRACT

Optical Coherence Tomography (OCT) enables non-invasive imaging of the retina and is used to diagnose and manage ophthalmic diseases including glaucoma. We present the first large-scale genome-wide association study of inner retinal morphology using phenotypes derived from OCT images of 31,434 UK Biobank participants. We identify 46 loci associated with thickness of the retinal nerve fibre layer or ganglion cell inner plexiform layer. Only one of these loci has been associated with glaucoma, and despite the clear role of inner retinal thickness as a biomarker for the disease, Mendelian randomisation does not support it being on the same genetic causal pathway as glaucoma. We also extracted overall retinal thickness at the fovea, a measure representative of foveal hypoplasia, and found that three of the 46 SNPs were associated with it. We additionally associate these three loci with visual acuity. In contrast to the Mendelian causes of severe foveal hypoplasia, our results suggest a spectrum of foveal hypoplasia, in part genetically determined, with consequences for visual function.

Subject(s)

Biological Specimen Banks , Genetic Variation , Phenotype , Retina/metabolism , Tomography, Optical Coherence , Female , Genotype , Glaucoma/genetics , Glaucoma/pathology , Hair Color/genetics , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide/genetics , Quality Control , Retina/pathology , United Kingdom , Vision Disorders , Visual Acuity/genetics

17.

The International Human Genome Project.

Birney, Ewan.

Hum Mol Genet ; 30(R2): R161-R163, 2021 10 01.

Article in English | MEDLINE | ID: mdl-34264324

ABSTRACT

The Human Genome Project was conceived and executed as an international project for both pragmatic and principled reasons. This internationality has served the project well, with the resulting human genome being freely available to all researchers in all countries. Over time the reference human genome will likely have to evolve into a graph genome and tap into more diverse sequences worldwide. A similar international mindset underpins data analysis for the interpretation of the human genome, from basic to clinical research.

Subject(s)

Genome, Human , Human Genome Project , Animals , Databases, Genetic , Genetics, Medical/trends , Humans , Internationality , Research/trends

18.

Genetic variants regulating expression levels and isoform diversity during embryogenesis.

Cannavò, Enrico; Koelling, Nils; Harnett, Dermot; Garfield, David; Casale, Francesco P; Ciglar, Lucia; Gustafson, Hilary E; Viales, Rebecca R; Marco-Ferreres, Raquel; Degner, Jacob F; Zhao, Bingqing; Stegle, Oliver; Birney, Ewan; Furlong, Eileen E M.

Nature ; 541(7637): 402-406, 2017 01 19.

Article in English | MEDLINE | ID: mdl-28024300

ABSTRACT

Embryonic development is driven by tightly regulated patterns of gene expression, despite extensive genetic variation among individuals. Studies of expression quantitative trait loci (eQTL) indicate that genetic variation frequently alters gene expression in cell-culture models and differentiated tissues. However, the extent and types of genetic variation impacting embryonic gene expression, and their interactions with developmental programs, remain largely unknown. Here we assessed the effect of genetic variation on transcriptional (expression levels) and post-transcriptional (3' RNA processing) regulation across multiple stages of metazoan development, using 80 inbred Drosophila wild isolates, identifying thousands of developmental-stage-specific and shared QTL. Given the small blocks of linkage disequilibrium in Drosophila, we obtain near base-pair resolution, resolving causal mutations in developmental enhancers, validated transcription-factor-binding sites and RNA motifs. This fine-grain mapping uncovered extensive allelic interactions within enhancers that have opposite effects, thereby buffering their impact on enhancer activity. QTL affecting 3' RNA processing identify new functional motifs leading to transcript isoform diversity and changes in the lengths of 3' untranslated regions. These results highlight how developmental stage influences the effects of genetic variation and uncover multiple mechanisms that regulate and buffer expression variation during embryogenesis.

Subject(s)

Drosophila melanogaster/embryology , Drosophila melanogaster/genetics , Embryonic Development/genetics , Gene Expression Regulation, Developmental , Genetic Variation , 3' Untranslated Regions/genetics , Alleles , Animals , Binding Sites , Enhancer Elements, Genetic , Linkage Disequilibrium , Mutation , Quantitative Trait Loci , RNA 3' End Processing , Transcription Factors/metabolism , Transcription, Genetic

19.

Publisher Correction: Genomic reconstruction of the SARS-CoV-2 epidemic in England.

Vöhringer, Harald S; Sanderson, Theo; Sinnott, Matthew; De Maio, Nicola; Nguyen, Thuy; Goater, Richard; Schwach, Frank; Harrison, Ian; Hellewell, Joel; Ariani, Cristina V; Gonçalves, Sonia; Jackson, David K; Johnston, Ian; Jung, Alexander W; Saint, Callum; Sillitoe, John; Suciu, Maria; Goldman, Nick; Panovska-Griffiths, Jasmina; Birney, Ewan; Volz, Erik; Funk, Sebastian; Kwiatkowski, Dominic; Chand, Meera; Martincorena, Inigo; Barrett, Jeffrey C; Gerstung, Moritz.

Nature ; 606(7915): E18, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35701578

20.

Common genetic variation drives molecular heterogeneity in human iPSCs.

Kilpinen, Helena; Goncalves, Angela; Leha, Andreas; Afzal, Vackar; Alasoo, Kaur; Ashford, Sofie; Bala, Sendu; Bensaddek, Dalila; Casale, Francesco Paolo; Culley, Oliver J; Danecek, Petr; Faulconbridge, Adam; Harrison, Peter W; Kathuria, Annie; McCarthy, Davis; McCarthy, Shane A; Meleckyte, Ruta; Memari, Yasin; Moens, Nathalie; Soares, Filipa; Mann, Alice; Streeter, Ian; Agu, Chukwuma A; Alderton, Alex; Nelson, Rachel; Harper, Sarah; Patel, Minal; White, Alistair; Patel, Sharad R; Clarke, Laura; Halai, Reena; Kirton, Christopher M; Kolb-Kokocinski, Anja; Beales, Philip; Birney, Ewan; Danovi, Davide; Lamond, Angus I; Ouwehand, Willem H; Vallier, Ludovic; Watt, Fiona M; Durbin, Richard; Stegle, Oliver; Gaffney, Daniel J.

Nature ; 546(7658): 370-375, 2017 06 15.

Article in English | MEDLINE | ID: mdl-28489815

ABSTRACT

Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. In addition, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.

Subject(s)

Genetic Variation/genetics , Induced Pluripotent Stem Cells/metabolism , Cells, Cultured , Cellular Reprogramming/genetics , DNA Copy Number Variations/genetics , Gene Expression Regulation/genetics , Genotype , Humans , Organ Specificity , Phenotype , Quality Control , Quantitative Trait Loci/genetics , Transcriptome/genetics
