Results 1-20 of 527
1.
Cell ; 186(6): 1279-1294.e19, 2023 03 16.
Article in English | MEDLINE | ID: mdl-36868220

ABSTRACT

Antarctic krill (Euphausia superba) is Earth's most abundant wild animal, and its enormous biomass is vital to the Southern Ocean ecosystem. Here, we report a 48.01-Gb chromosome-level Antarctic krill genome, whose large genome size appears to have resulted from intergenic transposable element expansions. Our assembly reveals the molecular architecture of the Antarctic krill circadian clock and uncovers expanded gene families associated with molting and energy metabolism, providing insights into adaptations to the cold and highly seasonal Antarctic environment. Population-level genome re-sequencing from four geographical sites around the Antarctic continent reveals no clear population structure but highlights natural selection associated with environmental variables. An apparent drastic reduction in krill population size 10 mya and a subsequent rebound 100 thousand years ago coincide with climate change events. Our findings uncover the genomic basis of Antarctic krill adaptations to the Southern Ocean and provide valuable resources for future Antarctic research.


Subject(s)
Euphausiacea , Genome , Animals , Circadian Clocks/genetics , Ecosystem , Euphausiacea/genetics , Euphausiacea/physiology , Genomics , Sequence Analysis, DNA , DNA Transposable Elements , Biological Evolution , Adaptation, Physiological
2.
Cell ; 185(10): 1646-1660.e18, 2022 05 12.
Article in English | MEDLINE | ID: mdl-35447073

ABSTRACT

Incomplete lineage sorting (ILS) makes ancestral genetic polymorphisms persist during rapid speciation events, inducing incongruences between gene trees and species trees. ILS has complicated phylogenetic inference in many lineages, including hominids. However, we lack empirical evidence that ILS leads to incongruent phenotypic variation. Here, we performed phylogenomic analyses to show that the South American monito del monte is the sister lineage of all Australian marsupials, although over 31% of its genome is closer to the Diprotodontia than to other Australian groups due to ILS during ancient radiation. Pervasive conflicting phylogenetic signals across the whole genome are consistent with some of the morphological variation among extant marsupials. We detected hundreds of genes that experienced stochastic fixation during ILS, encoding the same amino acids in non-sister species. Using functional experiments, we show how ILS may have directly contributed to hemiplasy in morphological traits that were established during rapid marsupial speciation ca. 60 mya.
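Illustrative sketch (not the authors' pipeline): a genome-wide figure such as "over 31% closer to the Diprotodontia" is essentially a tally over per-window gene trees. Assuming each window's gene tree has already been reduced to a label naming the lineage the focal species groups with (the labels and the function name below are hypothetical), the fractions can be computed like this:

```python
from collections import Counter


def topology_fractions(window_labels):
    """Fraction of genomic windows whose gene tree supports each grouping.

    window_labels: one label per window naming the lineage the focal species
    groups with in that window's gene tree (hypothetical encoding; inferring
    the gene trees themselves is not shown).
    """
    total = len(window_labels)
    if total == 0:
        return {}
    return {label: n / total for label, n in Counter(window_labels).items()}


# Toy input: 10 windows with illustrative labels, not real marsupial data.
windows = ["all_Australian"] * 6 + ["Diprotodontia"] * 3 + ["Dasyuromorphia"]
print(topology_fractions(windows)["Diprotodontia"])  # 0.3
```

Windows whose label disagrees with the species tree are the candidates for ILS; distinguishing ILS from introgression or gene-tree error needs further tests not shown here.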


Subject(s)
Marsupialia , Animals , Australia , Evolution, Molecular , Genetic Speciation , Genome , Marsupialia/genetics , Phenotype , Phylogeny
3.
Cell ; 184(5): 1377-1391.e14, 2021 03 04.
Article in English | MEDLINE | ID: mdl-33545088

ABSTRACT

Rich fossil evidence suggests that many traits and functions related to terrestrial evolution were present long before the ancestor of lobe- and ray-finned fishes. Here, we present genome sequences of the bichir, paddlefish, bowfin, and alligator gar, covering all major early divergent lineages of ray-finned fishes. Our analyses show that these species exhibit many mosaic genomic features of lobe- and ray-finned fishes. In particular, many regulatory elements for limb development are present in these fishes, supporting the hypothesis that the relevant ancestral regulation networks emerged before the origin of tetrapods. Transcriptome analyses confirm the homology between the lung and swim bladder and reveal the presence of functional lung-related genes in early ray-finned fishes. Furthermore, we functionally validate the essential role of a highly conserved jawed-vertebrate element in cardiovascular development. Our results imply that the ancestors of jawed vertebrates already had the potential gene networks for cardio-respiratory systems supporting air breathing.


Subject(s)
Biological Evolution , Fishes/genetics , Animal Fins/physiology , Animals , Cardiovascular Physiological Phenomena , Cardiovascular System/anatomy & histology , Extremities/physiology , Fishes/classification , Genome , Lung/anatomy & histology , Lung/physiology , Phylogeny , Receptors, Odorant/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Transcriptome , Vertebrates/classification , Vertebrates/genetics
4.
Cell ; 184(2): 404-421.e16, 2021 01 21.
Article in English | MEDLINE | ID: mdl-33357445

ABSTRACT

Hepatocellular carcinoma (HCC) has high relapse and low 5-year survival rates. Single-cell profiling in relapsed HCC may aid in the design of effective anticancer therapies, including immunotherapies. We profiled the transcriptomes of ∼17,000 cells from 18 primary or early-relapse HCC cases. Early-relapse tumors have reduced levels of regulatory T cells, increased dendritic cells (DCs), and increased infiltrated CD8+ T cells, compared with primary tumors, in two independent cohorts. Remarkably, CD8+ T cells in recurrent tumors overexpressed KLRB1 (CD161) and displayed an innate-like low cytotoxic state, with low clonal expansion, unlike the classical exhausted state observed in primary HCC. The enrichment of these cells was associated with a worse prognosis. Differential gene expression and interaction analyses revealed potential immune evasion mechanisms in recurrent tumor cells that dampen DC antigen presentation and recruit innate-like CD8+ T cells. Our comprehensive picture of the HCC ecosystem provides deeper insights into immune evasion mechanisms associated with tumor relapse.


Subject(s)
Carcinoma, Hepatocellular/pathology , Liver Neoplasms/pathology , Neoplasm Recurrence, Local/pathology , Single-Cell Analysis , CD8-Positive T-Lymphocytes/immunology , Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/immunology , Gene Expression Regulation, Neoplastic , Humans , Killer Cells, Natural/immunology , Liver Neoplasms/genetics , Liver Neoplasms/immunology , Myeloid Cells/metabolism , Neoplasm Recurrence, Local/genetics , Neoplasm Recurrence, Local/immunology , Phenotype , RNA-Seq , Tumor Microenvironment
5.
Cell ; 179(5): 1057-1067.e14, 2019 11 14.
Article in English | MEDLINE | ID: mdl-31730849

ABSTRACT

The transition to a terrestrial environment, termed terrestrialization, is generally regarded as a pivotal event in the evolution and diversification of the land plant flora that changed the surface of our planet. Through phylogenomic studies, a group of streptophyte algae, the Zygnematophyceae, has recently been recognized as the likely sister group to land plants (embryophytes). Here, we report genome sequences and analyses of two early diverging Zygnematophyceae (Spirogloea muscicola gen. nov. and Mesotaenium endlicherianum) that share the same subaerial/terrestrial habitat with the earliest-diverging embryophytes, the bryophytes. We provide evidence that genes (e.g., GRAS and PYR/PYL/RCAR) that increase resistance to biotic and abiotic stresses in land plants, in particular desiccation, originated or expanded in the common ancestor of Zygnematophyceae and embryophytes, and were gained by horizontal gene transfer (HGT) from soil bacteria. These two Zygnematophyceae genomes represent a cornerstone for future studies to understand the underlying molecular mechanism and process of plant terrestrialization.


Subject(s)
Biological Evolution , Embryophyta/genetics , Genome, Plant , Streptophyta/genetics , Abscisic Acid/pharmacology , Amino Acid Sequence , Multigene Family , Phylogeny , Plant Proteins/chemistry , Protein Domains , Streptophyta/classification , Symbiosis/genetics , Synteny/genetics
6.
Cell ; 175(2): 347-359.e14, 2018 10 04.
Article in English | MEDLINE | ID: mdl-30290141

ABSTRACT

We analyze whole-genome sequencing data from 141,431 Chinese women generated for non-invasive prenatal testing (NIPT). We use these data to characterize the population genetic structure and to investigate genetic associations with maternal and infectious traits. We show that the present-day distribution of alleles is a function of both ancient migration and very recent population movements. We reveal novel phenotype-genotype associations, including several replicated associations with height and BMI, an association between maternal age and EMB, and between twin pregnancy and NRG1. Finally, we identify a unique pattern of circulating viral DNA in plasma with high prevalence of hepatitis B and other clinically relevant maternal infections. A GWAS for viral infections identifies an exceptionally strong association between integrated herpesvirus 6 and MOV10L1, which affects piwi-interacting RNA (piRNA) processing and PIWI protein function. These findings demonstrate the great value and potential of accumulating NIPT data for worldwide medical and genetic analyses.


Subject(s)
Asian People/genetics , Prenatal Diagnosis/methods , Adult , Alleles , China , DNA/genetics , Ethnicity/genetics , Female , Gene Frequency/genetics , Genetic Testing , Genetic Variation/genetics , Genetics, Population/methods , Genome-Wide Association Study/methods , Genomics/methods , Human Migration , Humans , Pregnancy , Sequence Analysis, DNA
7.
Nature ; 633(8029): 371-379, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39232160

ABSTRACT

The past two decades have witnessed a remarkable increase in the number of microbial genomes retrieved from marine systems [1,2]. However, it has remained challenging to translate this marine genomic diversity into biotechnological and biomedical applications [3,4]. Here we recovered 43,191 bacterial and archaeal genomes from publicly available marine metagenomes, encompassing a wide range of diversity with 138 distinct phyla, redefining the upper limit of marine bacterial genome size and revealing complex trade-offs between the occurrence of CRISPR-Cas systems and antibiotic resistance genes. In silico bioprospecting of these marine genomes led to the discovery of a novel CRISPR-Cas9 system, ten antimicrobial peptides, and three enzymes that degrade polyethylene terephthalate. In vitro experiments confirmed their effectiveness. This work provides evidence that global-scale sequencing initiatives advance our understanding of how microbial diversity has evolved and is maintained in the oceans, and demonstrates how such initiatives can be sustainably exploited to advance biotechnology and biomedicine.


Subject(s)
Aquatic Organisms , Biodiversity , Bioprospecting , CRISPR-Cas Systems , CRISPR-Cas Systems/genetics , Aquatic Organisms/genetics , Bacteria/genetics , Bacteria/classification , Archaea/genetics , Archaea/classification , Genome, Bacterial/genetics , Metagenome , Genome, Archaeal/genetics , Seawater/microbiology , Phylogeny , Oceans and Seas
8.
Cell ; 148(5): 873-85, 2012 Mar 02.
Article in English | MEDLINE | ID: mdl-22385957

ABSTRACT

Tumor heterogeneity presents a challenge for inferring clonal evolution and identifying driver genes. Here, we describe a method for analyzing the cancer genome of individual cells at single-nucleotide resolution. To perform our analyses, we first devised and validated a high-throughput whole-genome single-cell sequencing method using two single cells from a lymphoblastoid cell line. We then carried out whole-exome single-cell sequencing of 90 cells from a JAK2-negative myeloproliferative neoplasm patient. The sequencing data from 58 cells passed our quality control criteria, and these data indicated that this neoplasm represented a monoclonal evolution. We further identified candidate mutations related to essential thrombocythemia (ET), in genes such as SESN2 and NTRK1, which may be involved in neoplasm progression. This pilot study allowed the initial characterization of the disease-related genetic architecture at single-nucleotide resolution in single cells. Further, we established a single-cell sequencing method that opens the way for detailed analyses of a variety of tumor types, including those with high genetic complexity between patients.


Subject(s)
Clonal Evolution , Gene Expression Profiling , High-Throughput Nucleotide Sequencing/methods , Janus Kinase 2/genetics , Myeloproliferative Disorders/genetics , Myeloproliferative Disorders/pathology , Single-Cell Analysis/methods , Thrombocythemia, Essential/genetics , Exome , Genome, Human , Humans , Male , Middle Aged , Mutation
9.
Cell ; 148(5): 886-95, 2012 Mar 02.
Article in English | MEDLINE | ID: mdl-22385958

ABSTRACT

Clear cell renal cell carcinoma (ccRCC) is the most common kidney cancer and has very few mutations that are shared between different patients. To better understand the intratumoral genetics of ccRCC, we carried out single-cell exome sequencing on a ccRCC tumor and its adjacent kidney tissue. Our data indicate that this tumor was unlikely to have resulted from mutations in VHL and PBRM1. Quantitative population genetic analysis indicated that the tumor did not contain any significant clonal subpopulations and also showed that mutations with different allele frequencies within the population had different mutation spectra. Analyses of these data allowed us to delineate a detailed intratumoral genetic landscape at the single-cell level. Our pilot study demonstrates that ccRCC may be more genetically complex than previously thought and provides information that can lead to new ways to investigate individual tumors, with the aim of developing more effective cellular targeted therapies.


Subject(s)
Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/pathology , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , Single-Cell Analysis/methods , DNA-Binding Proteins , Exome , Gene Frequency , Humans , Male , Middle Aged , Mutation , Nuclear Proteins/genetics , Phylogeny , Pilot Projects , Principal Component Analysis , Transcription Factors/genetics , Von Hippel-Lindau Tumor Suppressor Protein/genetics
10.
Nature ; 594(7862): 227-233, 2021 06.
Article in English | MEDLINE | ID: mdl-33910227

ABSTRACT

The accurate and complete assembly of both haplotype sequences of a diploid organism is essential to understanding the role of variation in genome functions, phenotypes and diseases [1]. Here, using a trio-binning approach, we present a high-quality, diploid reference genome, with both haplotypes assembled independently at the chromosome level, for the common marmoset (Callithrix jacchus), a primate model system that is widely used in biomedical research [2,3]. The full spectrum of heterozygosity between the two haplotypes involves 1.36% of the genome, much higher than the 0.13% indicated by the standard estimation based on single-nucleotide heterozygosity alone. The de novo mutation rate is 0.43 × 10⁻⁸ per site per generation, and the paternally inherited genome acquired twice as many mutations as the maternally inherited one. Our diploid assembly enabled us to discover a recent expansion of the sex-differentiation region and unique evolutionary changes in the marmoset Y chromosome. In addition, we identified many genes with signatures of positive selection that might have contributed to the evolution of Callithrix biological features. Brain-related genes were highly conserved between marmosets and humans, although several genes experienced lineage-specific copy number variations or diversifying selection, with implications for the use of marmosets as a model system.
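A minimal sketch of the heterozygosity arithmetic, assuming per-class counts of bases that differ between the two assembled haplotypes are available (the input format and toy numbers are hypothetical, not marmoset data). SNV-only heterozygosity counts differing single-nucleotide sites, whereas a full-spectrum estimate also counts bases affected by indels and structural variants, which is why it can be much larger.

```python
def heterozygosity(diff_bases, genome_length, classes=None):
    """Fraction of the genome differing between two assembled haplotypes.

    diff_bases: dict mapping a variant class ("SNV", "indel", "SV") to the
    number of bases it affects (hypothetical input format).
    classes: the classes to include; None includes all of them.
    """
    selected = diff_bases if classes is None else {c: diff_bases[c] for c in classes}
    return sum(selected.values()) / genome_length


# Toy counts for a 1-Mb region (illustrative only).
diffs = {"SNV": 1_000, "indel": 2_000, "SV": 7_000}
snv_only = heterozygosity(diffs, 1_000_000, classes=["SNV"])
full_spectrum = heterozygosity(diffs, 1_000_000)
print(f"{snv_only:.2%} vs {full_spectrum:.2%}")  # 0.10% vs 1.00%
```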


Subject(s)
Callithrix/genetics , Diploidy , Evolution, Molecular , Genome/genetics , Genomics/standards , Animals , Biomedical Research , DNA Copy Number Variations , Female , Germ-Line Mutation/genetics , Haplotypes/genetics , Heterozygote , Humans , INDEL Mutation/genetics , Male , Reference Standards , Selection, Genetic , Sex Differentiation/genetics , Y Chromosome/genetics
11.
Nature ; 592(7856): 756-762, 2021 04.
Article in English | MEDLINE | ID: mdl-33408411

ABSTRACT

Egg-laying mammals (monotremes) are the only extant mammalian outgroup to therians (marsupial and eutherian animals) and provide key insights into mammalian evolution [1,2]. Here we generate and analyse reference genomes of the platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus), which represent the only two extant monotreme lineages. The nearly complete platypus genome assembly has anchored almost the entire genome onto chromosomes, markedly improving the genome continuity and gene annotation. Together with our echidna sequence, the genomes of the two species allow us to detect the ancestral and lineage-specific genomic changes that shape both monotreme and mammalian evolution. We provide evidence that the monotreme sex chromosome complex originated from an ancestral chromosome ring configuration. The formation of such a unique chromosome complex may have been facilitated by the unusually extensive interactions between the multi-X and multi-Y chromosomes that are shared by the autosomal homologues in humans. Further comparative genomic analyses unravel marked differences between monotremes and therians in haptoglobin genes, lactation genes and chemosensory receptor genes for smell and taste that underlie the ecological adaptation of monotremes.


Subject(s)
Biological Evolution , Genome , Platypus/genetics , Tachyglossidae/genetics , Animals , Female , Male , Mammals/genetics , Phylogeny , Sex Chromosomes/genetics
12.
Nature ; 599(7886): 622-627, 2021 11.
Article in English | MEDLINE | ID: mdl-34759320

ABSTRACT

Zero hunger and good health could be realized by 2030 through effective conservation, characterization and utilization of germplasm resources [1]. So far, few chickpea (Cicer arietinum) germplasm accessions have been characterized at the genome sequence level [2]. Here we present a detailed map of variation in 3,171 cultivated and 195 wild accessions to provide publicly available resources for chickpea genomics research and breeding. We constructed a chickpea pan-genome to describe genomic diversity across cultivated chickpea and its wild progenitor accessions. A divergence tree using genes present in around 80% of individuals in one species allowed us to estimate the divergence of Cicer over the last 21 million years. Our analysis found chromosomal segments and genes that show signatures of selection during domestication, migration and improvement. The chromosomal locations of deleterious mutations responsible for limited genetic diversity and decreased fitness were identified in elite germplasm. We identified superior haplotypes for improvement-related traits in landraces that can be introgressed into elite breeding lines through haplotype-based breeding, and found targets for purging deleterious alleles through genomics-assisted breeding and/or gene editing. Finally, we propose three crop breeding strategies based on genomic prediction to enhance crop productivity for 16 traits while avoiding the erosion of genetic diversity through optimal contribution selection (OCS)-based pre-breeding. The predicted performance for 100-seed weight, an important yield-related trait, increased by up to 23% and 12% with OCS- and haplotype-based genomic approaches, respectively.
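The divergence tree built from "genes present in around 80% of individuals" implies a presence-frequency filter over a pan-genome presence/absence matrix. A minimal sketch, assuming such a matrix is available as a dictionary of 0/1 flags per accession (gene names, the threshold default, and the core/dispensable labelling are illustrative, not the study's exact definitions):

```python
def presence_fraction(presence):
    """presence: dict gene -> list of 0/1 flags, one flag per accession."""
    return {gene: sum(flags) / len(flags) for gene, flags in presence.items()}


def select_genes(presence, min_fraction=0.8):
    """Genes present in at least min_fraction of accessions (threshold assumed)."""
    return sorted(g for g, f in presence_fraction(presence).items() if f >= min_fraction)


def classify(presence):
    """Label genes 'core' (in every accession) or 'dispensable' (absent from some)."""
    return {g: ("core" if f == 1.0 else "dispensable")
            for g, f in presence_fraction(presence).items()}


# Toy matrix with five accessions; gene names are hypothetical.
presence = {
    "geneA": [1, 1, 1, 1, 1],
    "geneB": [1, 1, 1, 1, 0],
    "geneC": [1, 0, 0, 1, 0],
}
print(select_genes(presence))  # ['geneA', 'geneB']
print(classify(presence))
```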


Subject(s)
Cicer/genetics , Genetic Variation , Genome, Plant/genetics , Sequence Analysis, DNA , Crops, Agricultural/genetics , Haplotypes/genetics , Plant Breeding , Polymorphism, Single Nucleotide/genetics
13.
Nature ; 578(7793): 129-136, 2020 02.
Article in English | MEDLINE | ID: mdl-32025019

ABSTRACT

Transcript alterations often result from somatic changes in cancer genomes [1]. Various forms of RNA alterations have been described in cancer, including overexpression [2], altered splicing [3] and gene fusions [4]; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [5]. Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed 'bridged' fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.


Subject(s)
Gene Expression Regulation, Neoplastic , Neoplasms/genetics , RNA/genetics , DNA Copy Number Variations , DNA, Neoplasm , Genome, Human , Genomics , Humans , Transcriptome
14.
Genome Res ; 32(2): 228-241, 2022 02.
Article in English | MEDLINE | ID: mdl-35064006

ABSTRACT

The pathogenesis of COVID-19 is still elusive, which impedes disease progression prediction, differential diagnosis, and targeted therapy. Plasma cell-free RNAs (cfRNAs) carry unique information from human tissues and thus could help elucidate pathogenesis and host-pathogen interactions. Here, we performed a comparative analysis of cfRNA profiles between COVID-19 patients and healthy donors using serial plasma samples. We analyzed the cfRNA landscape, potential gene regulatory mechanisms, dynamic changes in tRNA pools upon infection, and microbial communities. A total of 380 cfRNA molecules were up-regulated in all COVID-19 patients, of which seven could serve as potential biomarkers (AUC > 0.85) with high sensitivity and specificity. Antiviral (NFKB1A, IFITM3, and IFI27) and neutrophil activation-related (S100A8, CD68, and CD63) genes exhibited decreased expression levels during treatment in COVID-19 patients, in accordance with the dynamically enhanced inflammatory response in COVID-19 patients. Noncoding RNAs, including some microRNAs (let-7 family) and long noncoding RNAs (GJA9-MYCBP) targeting interleukin-6 signaling (IL6/IL6R), were differentially expressed between COVID-19 patients and healthy donors, which may account for a potential core mechanism of cytokine storm syndromes. The tRNA pools changed significantly between the COVID-19 and healthy groups, leading to the accumulation of SARS-CoV-2-biased codons, which facilitate SARS-CoV-2 replication. Finally, several pneumonia-related microorganisms were detected in the plasma of COVID-19 patients, raising the possibility of simultaneously monitoring immune response regulation and microbial communities using cfRNA analysis. This study fills a knowledge gap in the plasma cfRNA landscape of COVID-19 patients and offers insight into potential cfRNA-related mechanisms of COVID-19 pathogenesis.
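The AUC > 0.85 criterion can be read as the probability that a randomly chosen patient has a higher level of a given cfRNA than a randomly chosen healthy donor. The sketch below computes AUC that way, which is equivalent to the Mann-Whitney U statistic divided by the number of patient-donor pairs. The expression values are made up, and the study's actual model and validation scheme are not described in the abstract.

```python
def auc(case_scores, control_scores):
    """ROC AUC as the probability that a random case outscores a random control.

    Equals the Mann-Whitney U statistic divided by n_case * n_control;
    ties count as half a win. O(n * m), adequate for small cohorts.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical normalized levels of one cfRNA in patients vs healthy donors.
patients = [8.1, 7.4, 9.0, 6.2, 7.9]
donors = [5.0, 6.5, 4.8, 5.9, 7.0]
print(auc(patients, donors))  # 0.92
```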


Subject(s)
COVID-19 , Cell-Free Nucleic Acids , RNA/blood , COVID-19/blood , COVID-19/genetics , Cell-Free Nucleic Acids/blood , Cytokine Release Syndrome , Humans , SARS-CoV-2
15.
Nucleic Acids Res ; 51(21): 11770-11782, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-37870428

ABSTRACT

Precision medicine depends on high-accuracy individual-level genotype data. However, whole-genome sequencing (WGS) is still too costly for very large studies. It is therefore particularly important to construct a highly accurate haplotype reference panel for genotype imputation. In this study, we used 10 000 samples with medium-depth WGS to construct a reference panel that we named the CKB reference panel. Imputation of microarray datasets showed that the CKB panel outperformed the panels it was compared against in terms of both the number of well-imputed variants and imputation accuracy. In addition, we have completed the imputation of 100 706 microarrays with the CKB panel, and the resulting imputed data constitute the largest whole-genome dataset of the Chinese population to date. Furthermore, in a GWAS of height, the number of tested SNPs tripled and the number of significant SNPs doubled after imputation. Finally, we developed an online server offering a free genotype imputation service based on the CKB reference panel (https://db.cngb.org/imputation/). We believe that the CKB panel is of great value for imputing microarray or low-coverage genotype data of the Chinese population, and potentially of mixed populations. The 100 706 imputed microarray datasets are a large and valuable resource for population genetic studies of complex traits and diseases.
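One common way to quantify imputation accuracy is the squared Pearson correlation (r²) between imputed allele dosages and genotypes observed directly, for example by WGS. The abstract does not state which metric was used, so this is an assumed metric; the function names, the 0.8 "well-imputed" threshold, and the toy genotypes are hypothetical.

```python
def dosage_r2(truth, dosage):
    """Squared Pearson correlation between true allele counts and imputed dosages.

    truth: observed genotypes coded 0/1/2 (copies of the alternative allele).
    dosage: imputed expected allele counts in [0, 2] for the same samples.
    Returns NaN when either vector is constant (r² is undefined there).
    """
    n = len(truth)
    mean_t = sum(truth) / n
    mean_d = sum(dosage) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, dosage))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        return float("nan")
    return cov * cov / (var_t * var_d)


def count_well_imputed(variants, threshold=0.8):
    """Count variants whose r² meets the (assumed) well-imputed threshold."""
    return sum(1 for truth, dosage in variants if dosage_r2(truth, dosage) >= threshold)


# Toy data: two variants typed in six samples (hypothetical values).
variants = [
    ([0, 1, 2, 1, 0, 2], [0.1, 0.9, 1.8, 1.2, 0.0, 1.9]),  # imputed well
    ([0, 1, 2, 1, 0, 2], [1.0, 0.8, 1.1, 0.9, 1.2, 1.0]),  # imputed poorly
]
print(count_well_imputed(variants))  # 1
```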


Subject(s)
Biological Specimen Banks , Genome , Humans , Haplotypes , Genotype , Genome-Wide Association Study , Polymorphism, Single Nucleotide , China
16.
PLoS Pathog ; 18(2): e1010288, 2022 02.
Article in English | MEDLINE | ID: mdl-35167626

ABSTRACT

Urogenital schistosomiasis is caused by the blood fluke Schistosoma haematobium and is one of the most neglected tropical diseases worldwide, afflicting > 100 million people. It is characterised by granulomata, fibrosis and calcification in urogenital tissues, and can lead to increased susceptibility to HIV/AIDS and squamous cell carcinoma of the bladder. To complement available treatment programs and break the transmission of disease, sound knowledge and understanding of the biology and ecology of S. haematobium is required. Hybridisation/introgression events and molecular variation among members of the S. haematobium-group might affect important biological and/or disease traits as well as the morbidity of disease and the effectiveness of control programs including mass drug administration. Here we report the first chromosome-contiguous genome for a well-defined laboratory line of this blood fluke. An exploration of this genome using transcriptomic data for all key developmental stages allowed us to refine gene models (including non-coding elements) and annotations, and to discover 'new' genes and transcription profiles for these stages, likely linked to development and/or pathogenesis. Molecular variation within S. haematobium among some geographical locations in Africa revealed unique genomic 'signatures' that matched species other than S. haematobium, indicating the occurrence of introgression events. The present reference genome (designated Shae.V3) and the findings from this study solidly underpin future functional genomic and molecular investigations of S. haematobium and accelerate systematic, large-scale population genomics investigations, with a focus on improved and sustained control of urogenital schistosomiasis.


Subject(s)
Genetic Variation , Genome, Protozoan , Schistosoma haematobium/genetics , Schistosomiasis haematobia/parasitology , Transcriptome , Animals , Chromosomes/parasitology , Genes, Protozoan , Genome , Genome-Wide Association Study , Sequence Analysis, DNA
18.
Nucleic Acids Res ; 50(14): e81, 2022 08 12.
Article in English | MEDLINE | ID: mdl-35536244

ABSTRACT

Interpretation of the non-coding genome remains an unsolved challenge in human genetics due to the impracticality of exhaustively annotating biochemically active elements in all conditions. Deep learning-based computational approaches have recently emerged to help interpret non-coding regions. Here, we present LOGO (Language of Genome), a self-attention-based, contextualized pre-trained language model with a substantially lighter architecture, containing only two self-attention layers and 1 million parameters, that applies self-supervision techniques to learn bidirectional representations of the unlabelled human reference genome. LOGO is then fine-tuned for sequence-labelling tasks, and further extended to a variant prioritization task via a special input encoding scheme for alternative alleles followed by an added convolutional module. Experiments show that LOGO achieves a 15% absolute improvement for promoter identification and up to a 4.5% absolute improvement for enhancer-promoter interaction prediction. LOGO exhibits state-of-the-art multi-task predictive power on thousands of chromatin features with only 3% of the parameters of the fully supervised model DeepSEA and 1% of those of a recent BERT-based DNA language model. For allelic-effect prediction, the locality introduced by one-dimensional convolution improves sensitivity and specificity for prioritizing non-coding variants associated with human diseases. In addition, we apply LOGO to interpret type 2 diabetes (T2D) GWAS signals and infer underlying regulatory mechanisms. We draw a conceptual analogy between natural language and the human genome and demonstrate that LOGO is an accurate, fast, scalable, and robust framework for interpreting non-coding regions, both for global sequence labeling and for variant prioritization at base resolution.
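LOGO is built on self-attention. As a generic illustration of that building block only (not LOGO's code), the sketch below implements single-head scaled dot-product self-attention in plain Python, assuming identity query/key/value projections for brevity; real models learn separate projection matrices, use multiple heads, and add positional information. The toy vectors stand in for hypothetical DNA token embeddings.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    x: list of token vectors (lists of floats), all of dimension d.
    Returns (outputs, weights). Each output is a weighted average of all
    token vectors, so every position attends to both its left and right
    context, which is what makes the representation bidirectional.
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in x:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in x]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d)])
    return outputs, weights


# Toy 2-d embeddings for four hypothetical tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out, attn = self_attention(tokens)
```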


Subject(s)
Diabetes Mellitus, Type 2 , Genome, Human , Humans , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA/methods
19.
Proc Natl Acad Sci U S A ; 118(37)2021 09 14.
Article in English | MEDLINE | ID: mdl-34503999

ABSTRACT

The ancestors of marine mammals once roamed the land and independently committed to an aquatic lifestyle. These macroevolutionary transitions have intrigued scientists for centuries. Here, we generated high-quality genome assemblies of 17 marine mammals (11 cetaceans and six pinnipeds), including eight assemblies at the chromosome level. Incorporating previously published data, we reconstructed the marine mammal phylogeny and population histories and identified numerous idiosyncratic and convergent genomic variations that possibly contributed to the transition from land to water in marine mammal lineages. Genes associated with the formation of blubber (NFIA), vascular development (SEMA3E), and heat production by brown adipose tissue (UCP1) had unique changes that may contribute to marine mammal thermoregulation. We also observed many lineage-specific changes in the marine mammals, including genes associated with deep diving and navigation. Our study advances understanding of the timing, pattern, and molecular changes associated with the evolution of mammalian lineages adapting to aquatic life.


Subject(s)
Adaptation, Physiological , Evolution, Molecular , Genome , Genomics , Mammals/physiology , Phylogeny , Thermogenesis/genetics , Animals , NFI Transcription Factors/genetics , NFI Transcription Factors/metabolism , Selection, Genetic , Semaphorins/genetics , Semaphorins/metabolism , Uncoupling Protein 1/genetics , Uncoupling Protein 1/metabolism
20.
Mol Genet Genomics ; 298(4): 823-836, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37059908

ABSTRACT

Coronavirus disease 2019 (COVID-19) is a complex disease that affects billions of people worldwide. Currently, effective etiological treatment of COVID-19 is still lacking; COVID-19 also damages various organs, which affects the treatment and mortality of patients. Monitoring treatment responses and assessing organ injury in COVID-19 patients are therefore of high clinical value. In this study, we investigated the characteristic fragmentation patterns of plasma cell-free DNA (cfDNA) in COVID-19 patients and explored its potential for tissue injury assessment. By recruiting 37 COVID-19 patients and 32 controls and analyzing 208 blood samples collected upon diagnosis and during treatment, we report gross abnormalities in the cfDNA of COVID-19 patients, including elevated GC content and altered molecule size and end motif patterns. More importantly, these cfDNA fragmentation characteristics reflect patient-specific physiological changes during treatment. Further analysis of cfDNA tissue-of-origin tracing reveals frequent tissue injuries in COVID-19 patients, which is supported by clinical diagnoses. Hence, our work demonstrates and extends the translational merit of cfDNA fragmentation patterns as a valuable analyte for effective treatment monitoring, as well as tissue injury assessment in COVID-19.
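The three fragmentation features named above (GC content, molecule size, end motifs) can be summarized per sample roughly as follows. This sketch assumes each cfDNA fragment is available as its full uppercase sequence and takes the 5' end motif as the first four bases; production pipelines instead derive fragments from aligned paired-end reads and reference coordinates, and the toy fragments here are far shorter than real cfDNA.

```python
from collections import Counter
from statistics import mean, median


def fragment_features(fragments, motif_len=4):
    """Summarize cfDNA fragments given as full uppercase sequences (assumed input).

    Returns the mean GC fraction, the median fragment length, and the relative
    frequency of each 5' end motif (the first motif_len bases of a fragment).
    """
    gc_fractions = [sum(base in "GC" for base in frag) / len(frag) for frag in fragments]
    motifs = Counter(frag[:motif_len] for frag in fragments if len(frag) >= motif_len)
    total_motifs = sum(motifs.values())
    return {
        "mean_gc": mean(gc_fractions),
        "median_length": median(len(frag) for frag in fragments),
        "end_motifs": {m: c / total_motifs for m, c in motifs.items()},
    }


# Toy fragments (hypothetical sequences).
fragments = ["CCCAGT", "CCCAAT", "ATGCGC", "GGTA"]
features = fragment_features(fragments)
print(features["end_motifs"]["CCCA"])  # 0.5
```

Comparing these summaries between patient and control samples, or across time points for one patient, is the kind of contrast the abstract describes.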


Subject(s)
COVID-19 , Cell-Free Nucleic Acids , Humans , COVID-19/diagnosis , Cell-Free Nucleic Acids/genetics