1.
Cell ; 184(20): 5179-5188.e8, 2021 09 30.
Article in English | MEDLINE | ID: mdl-34499854

ABSTRACT

We present evidence for multiple independent origins of recombinant SARS-CoV-2 viruses sampled from late 2020 and early 2021 in the United Kingdom. Their genomes carry single-nucleotide polymorphisms and deletions that are characteristic of the B.1.1.7 variant of concern but lack the full complement of lineage-defining mutations. Instead, the remainder of their genomes share contiguous genetic variation with non-B.1.1.7 viruses circulating in the same geographic area at the same time as the recombinants. In four instances, there was evidence for onward transmission of a recombinant-origin virus, including one transmission cluster of 45 sequenced cases over the course of 2 months. The inferred genomic locations of recombination breakpoints suggest that every community-transmitted recombinant virus inherited its spike region from a B.1.1.7 parental virus, consistent with a transmission advantage for B.1.1.7's set of mutations.


Subject(s)
COVID-19/epidemiology , COVID-19/transmission , Pandemics , Recombination, Genetic , SARS-CoV-2/genetics , Base Sequence/genetics , COVID-19/virology , Computational Biology/methods , Gene Frequency , Genome, Viral , Genotype , Humans , Mutation , Phylogeny , Polymorphism, Single Nucleotide , United Kingdom/epidemiology , Whole Genome Sequencing/methods
2.
Cell ; 181(6): 1218-1231.e27, 2020 06 11.
Article in English | MEDLINE | ID: mdl-32492404

ABSTRACT

The discovery of the 2,000-year-old Dead Sea Scrolls had an incomparable impact on the historical understanding of Judaism and Christianity. "Piecing together" scroll fragments is like solving jigsaw puzzles with an unknown number of missing parts. We used the fact that most scrolls are made from animal skins to "fingerprint" pieces based on DNA sequences. Genetic sorting of the scrolls illuminates their textual relationship and historical significance. Disambiguating the contested relationship between Jeremiah fragments supplies evidence that some scrolls were brought to the Qumran caves from elsewhere; significantly, they demonstrate that divergent versions of Jeremiah circulated in parallel throughout Israel (ancient Judea). Similarly, patterns discovered in non-biblical scrolls, particularly the Songs of the Sabbath Sacrifice, suggest that the Qumran scrolls represent the broader cultural milieu of the period. Finally, genetic analysis divorces debated fragments from the Qumran scrolls. Our study demonstrates that interdisciplinary approaches enrich the scholar's toolkit.


Subject(s)
Base Sequence/genetics , Genetics/history , Skin/metabolism , Animals , Christianity/history , History, Ancient , Humans , Israel , Judaism/history
3.
Cell ; 181(5): 1062-1079.e30, 2020 05 28.
Article in English | MEDLINE | ID: mdl-32386547

ABSTRACT

Expansions of amino acid repeats occur in >20 inherited human disorders, and many occur in intrinsically disordered regions (IDRs) of transcription factors (TFs). Such diseases are associated with protein aggregation, but the contribution of aggregates to pathology has been controversial. Here, we report that alanine repeat expansions in the HOXD13 TF, which cause hereditary synpolydactyly in humans, alter its phase separation capacity and its capacity to co-condense with transcriptional co-activators. HOXD13 repeat expansions perturb the composition of HOXD13-containing condensates in vitro and in vivo and alter the transcriptional program in a cell-specific manner in a mouse model of synpolydactyly. Disease-associated repeat expansions in other TFs (HOXA13, RUNX2, and TBP) were similarly found to alter their phase separation. These results suggest that unblending of transcriptional condensates may underlie human pathologies. We present a molecular classification of TF IDRs, which provides a framework to dissect TF function in diseases associated with transcriptional dysregulation.


Subject(s)
DNA Repeat Expansion/genetics , Homeodomain Proteins/genetics , Transcription Factors/genetics , Alanine/genetics , Animals , Base Sequence/genetics , DNA Repeat Expansion/physiology , Disease Models, Animal , Homeodomain Proteins/metabolism , Humans , Male , Mice , Mutation/genetics , Pedigree , Syndactyly/genetics , Transcription Factors/metabolism
4.
Cell ; 180(2): 248-262.e21, 2020 01 23.
Article in English | MEDLINE | ID: mdl-31978344

ABSTRACT

The testis expresses the largest number of genes of any mammalian organ, a finding that has long puzzled molecular biologists. Our single-cell transcriptomic data of human and mouse spermatogenesis provide evidence that this widespread transcription maintains DNA sequence integrity in the male germline by correcting DNA damage through a mechanism we term transcriptional scanning. We find that genes expressed during spermatogenesis display lower mutation rates on the transcribed strand and have low diversity in the population. Moreover, this effect is fine-tuned by the level of gene expression during spermatogenesis. The unexpressed genes, which in our model do not benefit from transcriptional scanning, diverge faster over evolutionary timescales and are enriched for sensory and immune-defense functions. Collectively, we propose that transcriptional scanning shapes germline mutation signatures and modulates mutation rates in a gene-specific manner, maintaining DNA sequence integrity for the bulk of genes but allowing for faster evolution in a specific subset.


Subject(s)
Gene Expression/genetics , Germ-Line Mutation/genetics , Spermatogenesis/genetics , Adult , Animals , Base Sequence/genetics , Gene Expression Profiling/methods , Germ Cells/metabolism , Humans , Male , Mice , Mice, Inbred C57BL , Middle Aged , Mutation Rate , Testis/metabolism , Transcription, Genetic/genetics , Transcriptome/genetics
5.
Cell ; 181(2): 442-459.e29, 2020 04 16.
Article in English | MEDLINE | ID: mdl-32302573

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a powerful tool for defining cellular diversity in tumors, but its application toward dissecting mechanisms underlying immune-modulating therapies is scarce. We performed scRNA-seq analyses on immune and stromal populations from colorectal cancer patients, identifying specific macrophage and conventional dendritic cell (cDC) subsets as key mediators of cellular cross-talk in the tumor microenvironment. Defining comparable myeloid populations in mouse tumors enabled characterization of their response to myeloid-targeted immunotherapy. Treatment with anti-CSF1R preferentially depleted macrophages with an inflammatory signature but spared macrophage populations that in mouse and human express pro-angiogenic/tumorigenic genes. Treatment with a CD40 agonist antibody preferentially activated a cDC population and increased Bhlhe40+ Th1-like cells and CD8+ memory T cells. Our comprehensive analysis of key myeloid subsets in human and mouse identifies critical cellular interactions regulating tumor immunity and defines mechanisms underlying myeloid-targeted immunotherapies currently undergoing clinical testing.


Subject(s)
Colonic Neoplasms/pathology , Myeloid Cells/metabolism , Single-Cell Analysis/methods , Adult , Aged , Aged, 80 and over , Animals , Base Sequence/genetics , CD8-Positive T-Lymphocytes/immunology , China , Colonic Neoplasms/therapy , Colorectal Neoplasms/pathology , Dendritic Cells/immunology , Female , Humans , Immunotherapy , Macrophages/immunology , Male , Mice , Middle Aged , Sequence Analysis, RNA/methods , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology
6.
Cell ; 178(4): 887-900.e14, 2019 08 08.
Article in English | MEDLINE | ID: mdl-31398342

ABSTRACT

Variable, glutamine-encoding, CAA interruptions indicate that a property of the uninterrupted HTT CAG repeat sequence, distinct from the length of huntingtin's polyglutamine segment, dictates the rate at which Huntington's disease (HD) develops. The timing of onset shows no significant association with HTT cis-eQTLs but is influenced, sometimes in a sex-specific manner, by polymorphic variation at multiple DNA maintenance genes, suggesting that the special onset-determining property of the uninterrupted CAG repeat is a propensity for length instability that leads to its somatic expansion. Additional naturally occurring genetic modifier loci, defined by GWAS, may influence HD pathogenesis through other mechanisms. These findings have profound implications for the pathogenesis of HD and other repeat diseases and question the fundamental premise that polyglutamine length determines the rate of pathogenesis in the "polyglutamine disorders."


Subject(s)
Huntingtin Protein/genetics , Huntington Disease/genetics , Peptides/genetics , Trinucleotide Repeat Expansion/genetics , Adult , Age of Onset , Aged , Aged, 80 and over , Alleles , Base Sequence/genetics , Female , Genetic Loci , Genome-Wide Association Study , Haplotypes/genetics , Humans , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , Young Adult
7.
Cell ; 178(1): 91-106.e23, 2019 06 27.
Article in English | MEDLINE | ID: mdl-31178116

ABSTRACT

Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over 3 million APA reporters. APARENT's predictions are highly accurate when tasked with inferring APA in synthetic and human 3'UTRs. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of 3' end processing, and integrates these features into a comprehensive, interpretable, cis-regulatory code. We apply APARENT to forward engineer functional polyadenylation signals with precisely defined cleavage position and isoform usage and validate predictions experimentally. Finally, we use APARENT to quantify the impact of genetic variants on APA. Our approach detects pathogenic variants in a wide range of disease contexts, expanding our understanding of the genetic origins of disease.


Subject(s)
Deep Learning , Models, Genetic , Polyadenylation/genetics , 3' Untranslated Regions/genetics , Base Sequence/genetics , Databases, Genetic , Gene Expression/genetics , HEK293 Cells , Humans , Mutagenesis/genetics , RNA Cleavage/genetics , RNA, Messenger/genetics , RNA-Seq , Synthetic Biology , Transcriptome
8.
Cell ; 176(6): 1265-1281.e24, 2019 03 07.
Article in English | MEDLINE | ID: mdl-30827681

ABSTRACT

Acute myeloid leukemia (AML) is a heterogeneous disease that resides within a complex microenvironment, complicating efforts to understand how different cell types contribute to disease progression. We combined single-cell RNA sequencing and genotyping to profile 38,410 cells from 40 bone marrow aspirates, including 16 AML patients and five healthy donors. We then applied a machine learning classifier to distinguish a spectrum of malignant cell types whose abundances varied between patients and between subclones in the same tumor. Cell type compositions correlated with prototypic genetic lesions, including an association of FLT3-ITD with abundant progenitor-like cells. Primitive AML cells exhibited dysregulated transcriptional programs with co-expression of stemness and myeloid priming genes and had prognostic significance. Differentiated monocyte-like AML cells expressed diverse immunomodulatory genes and suppressed T cell activity in vitro. In conclusion, we provide single-cell technologies and an atlas of AML cell states, regulators, and markers with implications for precision medicine and immune therapies.


Subject(s)
Leukemia, Myeloid, Acute/genetics , Transcriptome/genetics , Adult , Base Sequence/genetics , Bone Marrow , Bone Marrow Cells/cytology , Cell Line, Tumor , Disease Progression , Female , Genotype , Humans , Leukemia, Myeloid, Acute/immunology , Leukemia, Myeloid, Acute/physiopathology , Machine Learning , Male , Middle Aged , Mutation , Prognosis , RNA , Signal Transduction , Single-Cell Analysis/methods , Tumor Microenvironment , Exome Sequencing/methods
9.
Annu Rev Genet ; 54: 309-336, 2020 11 23.
Article in English | MEDLINE | ID: mdl-32870730

ABSTRACT

Recent advances in pseudouridine detection reveal a complex pseudouridine landscape that includes messenger RNA and diverse classes of noncoding RNA in human cells. The known molecular functions of pseudouridine, which include stabilizing RNA conformations and destabilizing interactions with varied RNA-binding proteins, suggest that RNA pseudouridylation could have widespread effects on RNA metabolism and gene expression. Here, we emphasize how much remains to be learned about the RNA targets of human pseudouridine synthases, their basis for recognizing distinct RNA sequences, and the mechanisms responsible for regulated RNA pseudouridylation. We also examine the roles of noncoding RNA pseudouridylation in splicing and translation and point out the potential effects of mRNA pseudouridylation on protein production, including in the context of therapeutic mRNAs.


Subject(s)
Pseudouridine/genetics , RNA/genetics , Animals , Base Sequence/genetics , Humans , Intramolecular Transferases/genetics , RNA Splicing/genetics , RNA, Messenger/genetics
10.
Mol Cell ; 80(3): 541-553.e5, 2020 11 05.
Article in English | MEDLINE | ID: mdl-33068522

ABSTRACT

To address how genetic variation alters gene expression in complex cell mixtures, we developed direct nuclear tagmentation and RNA sequencing (DNTR-seq), which enables whole-genome and mRNA sequencing jointly in single cells. DNTR-seq readily identified minor subclones within leukemia patients. In a large-scale DNA damage screen, DNTR-seq was used to detect regions under purifying selection and identified genes where mRNA abundance was resistant to copy-number alteration, suggesting strong genetic compensation. mRNA sequencing (mRNA-seq) quality equals RNA-only methods, and the low positional bias of genomic libraries allowed detection of sub-megabase aberrations at ultra-low coverage. Each cell library is individually addressable and can be re-sequenced at increased depth, allowing multi-tiered study designs. Additionally, the direct tagmentation protocol enables coverage-independent estimation of ploidy, which can be used to identify cell singlets. Thus, DNTR-seq directly links each cell's state to its corresponding genome at scale, enabling routine analysis of heterogeneous tumors and other complex tissues.


Subject(s)
Gene Expression Profiling/methods , Single-Cell Analysis/methods , Whole Genome Sequencing/methods , Animals , Base Sequence/genetics , Cell Line, Tumor , Gene Library , High-Throughput Nucleotide Sequencing/methods , Humans , RNA/genetics , RNA, Messenger/genetics , Sequence Analysis, DNA/methods
11.
Genes Dev ; 34(9-10): 663-677, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32217666

ABSTRACT

Cell type-specific transcriptional programs that drive differentiation of specialized cell types are key players in development and tissue regeneration. One of the most dramatic changes in the transcription program in Drosophila occurs with the transition from proliferating spermatogonia to differentiating spermatocytes, with >3000 genes either newly expressed or expressed from new alternative promoters in spermatocytes. Here we show that opening of these promoters from their closed state in precursor cells requires function of the spermatocyte-specific tMAC complex, localized at the promoters. The spermatocyte-specific promoters lack the previously identified canonical core promoter elements except for the Inr. Instead, these promoters are enriched for the binding site for the TALE-class homeodomain transcription factors Achi/Vis and for a motif originally identified under tMAC ChIP-seq peaks. The tMAC motif resembles part of the previously identified 14-bp β2UE1 element critical for spermatocyte-specific expression. Analysis of downstream sequences relative to transcription start site usage suggested that ACA and CNAAATT motifs at specific positions can help promote efficient transcription initiation. Our results reveal how promoter-proximal sequence elements that recruit and are acted upon by cell type-specific chromatin binding complexes help establish a robust, cell type-specific transcription program for terminal differentiation.


Subject(s)
Drosophila Proteins/genetics , Drosophila melanogaster/growth & development , Drosophila melanogaster/genetics , Spermatogenesis/genetics , Amino Acid Motifs/genetics , Animals , Base Sequence/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/cytology , Male , Promoter Regions, Genetic/genetics , Spermatocytes/cytology , Spermatocytes/metabolism , Transcription Initiation Site , Transcriptome/genetics
13.
Mol Cell ; 75(3): 549-561.e7, 2019 08 08.
Article in English | MEDLINE | ID: mdl-31398323

ABSTRACT

Enhancers are DNA elements that are bound by transcription factors (TFs), which recruit coactivators and the transcriptional machinery to genes. Phase-separated condensates of TFs and coactivators have been implicated in assembling the transcription machinery at particular enhancers, yet the role of DNA sequence in this process has not been explored. We show that DNA sequences encoding TF binding site number, density, and affinity above sharply defined thresholds drive condensation of TFs and coactivators. A combination of specific structured (TF-DNA) and weak multivalent (TF-coactivator) interactions allows for condensates to form at particular genomic loci determined by the DNA sequence and the complement of expressed TFs. DNA features found to drive condensation promote enhancer activity and transcription in cells. Our study provides a framework to understand how the genome can scaffold transcriptional condensates at specific loci and how the universal phenomenon of phase separation might regulate this process.


Subject(s)
Chromatin/genetics , Enhancer Elements, Genetic , Transcription Factors/genetics , Transcription, Genetic , Animals , Base Sequence/genetics , Binding Sites/genetics , DNA/genetics , DNA-Binding Proteins/genetics , Gene Expression Regulation , Genomics , Mice , Mouse Embryonic Stem Cells
14.
Mol Cell ; 74(3): 584-597.e9, 2019 05 02.
Article in English | MEDLINE | ID: mdl-30905508

ABSTRACT

V(D)J recombination is essential to generate antigen receptor diversity but is also a potent cause of genome instability. Many chromosome alterations that result from aberrant V(D)J recombination involve breaks at single recombination signal sequences (RSSs). A long-standing question, however, is how such breaks occur. Here, we show that the genomic DNA that is excised during recombination, the excised signal circle (ESC), forms a complex with the recombinase proteins to efficiently catalyze breaks at single RSSs both in vitro and in vivo. Following cutting, the RSS is released while the ESC-recombinase complex remains intact to potentially trigger breaks at further RSSs. Consistent with this, chromosome breaks at RSSs increase markedly in the presence of the ESC. Notably, these breaks co-localize with those found in acute lymphoblastic leukemia patients and occur at key cancer driver genes. We have named this reaction "cut-and-run" and suggest that it could be a significant cause of lymphocyte genome instability.


Subject(s)
Genomic Instability/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Translocation, Genetic/genetics , V(D)J Recombination/genetics , Animals , Base Sequence/genetics , COS Cells , Chlorocebus aethiops , Chromosomes/genetics , DNA/genetics , DNA Breaks, Double-Stranded , HEK293 Cells , Homeodomain Proteins/genetics , Humans , Mice , NIH 3T3 Cells , Precursor Cell Lymphoblastic Leukemia-Lymphoma/pathology , Recombinases/genetics
15.
Mol Cell ; 74(2): 245-253.e6, 2019 04 18.
Article in English | MEDLINE | ID: mdl-30826165

ABSTRACT

Transcription factors (TFs) control gene expression by binding DNA recognition sites in genomic regulatory regions. Although most forkhead TFs recognize a canonical forkhead (FKH) motif, RYAAAYA, some forkheads recognize a completely different (FHL) motif, GACGC. Bispecific forkhead proteins recognize both motifs, but the molecular basis for bispecific DNA recognition is not understood. We present co-crystal structures of the FoxN3 DNA binding domain bound to the FKH and FHL sites, respectively. FoxN3 adopts a similar conformation to recognize both motifs, making contacts with different DNA bases using the same amino acids. However, the DNA structure is different in the two complexes. These structures reveal how a single TF binds two unrelated DNA sequences and the importance of DNA shape in the mechanism of bispecific recognition.


Subject(s)
Cell Cycle Proteins/chemistry , DNA-Binding Proteins/chemistry , DNA/chemistry , Nucleic Acid Conformation , Repressor Proteins/chemistry , Amino Acid Sequence/genetics , Base Sequence/genetics , Binding Sites/genetics , Cell Cycle Proteins/genetics , Crystallography, X-Ray , DNA/genetics , DNA-Binding Proteins/genetics , Forkhead Transcription Factors , Gene Expression Regulation/genetics , Humans , Multiprotein Complexes/chemistry , Multiprotein Complexes/genetics , Nucleotide Motifs/genetics , Regulatory Sequences, Nucleic Acid/genetics , Repressor Proteins/genetics
16.
Mol Cell ; 72(4): 700-714.e8, 2018 11 15.
Article in English | MEDLINE | ID: mdl-30344094

ABSTRACT

Prokaryotic CRISPR-Cas systems provide adaptive immunity by integrating portions of foreign nucleic acids (spacers) into genomic CRISPR arrays. Cas6 proteins then process CRISPR array transcripts into spacer-derived RNAs (CRISPR RNAs; crRNAs) that target Cas nucleases to matching invaders. We find that a Marinomonas mediterranea fusion protein combines three enzymatic domains (Cas6, reverse transcriptase [RT], and Cas1), which function in both crRNA biogenesis and spacer acquisition from RNA and DNA. We report a crystal structure of this divergent Cas6, identify amino acids required for Cas6 activity, show that the Cas6 domain is required for RT activity and RNA spacer acquisition, and demonstrate that CRISPR-repeat binding to Cas6 regulates RT activity. Co-evolution of putative interacting surfaces suggests a specific structural interaction between the Cas6 and RT domains, and phylogenetic analysis reveals repeated, stable association of free-standing Cas6s with CRISPR RTs in multiple microbial lineages, indicating that a functional interaction between these proteins preceded evolution of the fusion.


Subject(s)
CRISPR-Associated Proteins/physiology , Clustered Regularly Interspaced Short Palindromic Repeats/physiology , RNA-Directed DNA Polymerase/physiology , Base Sequence/genetics , CRISPR-Cas Systems/physiology , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , DNA , Endonucleases , Marinomonas/genetics , Marinomonas/metabolism , Phylogeny , RNA/biosynthesis , Substrate Specificity
17.
Mol Cell ; 69(3): 412-425.e6, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29395063

ABSTRACT

Mutations in several general pre-mRNA splicing factors have been linked to myelodysplastic syndromes (MDSs) and solid tumors. These mutations have generally been assumed to cause disease by the resultant splicing defects, but different mutations appear to induce distinct splicing defects, raising the possibility that an alternative common mechanism is involved. Here we report a chain of events triggered by multiple splicing factor mutations, especially high-risk alleles in SRSF2 and U2AF1, including elevated R-loops, replication stress, and activation of the ataxia telangiectasia and Rad3-related protein (ATR)-Chk1 pathway. We further demonstrate that enhanced R-loops, opposite to the expectation from gained RNA binding with mutant SRSF2, result from impaired transcription pause release because the mutant protein loses its ability to extract the RNA polymerase II (Pol II) C-terminal domain (CTD) kinase, the positive transcription elongation factor complex (P-TEFb), from the 7SK complex. Enhanced R-loops are linked to compromised proliferation of bone-marrow-derived blood progenitors, which can be partially rescued by RNase H overexpression, suggesting a direct contribution of augmented R-loops to the MDS phenotype.


Subject(s)
Base Sequence/genetics , Myelodysplastic Syndromes/genetics , RNA Splicing Factors/genetics , Cell Cycle Checkpoints/genetics , HEK293 Cells , Humans , Mutation , Nuclear Proteins/genetics , Phosphoproteins/genetics , RNA Splicing/genetics , RNA Splicing Factors/metabolism , Ribonucleoproteins/genetics , Serine-Arginine Splicing Factors/genetics , Splicing Factor U2AF/genetics
18.
EMBO J ; 40(10): e105464, 2021 05 17.
Article in English | MEDLINE | ID: mdl-33792944

ABSTRACT

Eukaryotic transcription factors recognize specific DNA sequence motifs, but are also endowed with generic, non-specific DNA-binding activity. How these binding modes are integrated to determine select transcriptional outputs remains unresolved. We addressed this question by site-directed mutagenesis of the Myc transcription factor. Impairment of non-specific DNA backbone contacts caused pervasive loss of genome interactions and gene regulation, associated with increased intra-nuclear mobility of the Myc protein in murine cells. In contrast, a mutant lacking base-specific contacts retained DNA-binding and mobility profiles comparable to those of the wild-type protein, but failed to recognize its consensus binding motif (E-box) and could not activate Myc-target genes. Incidentally, this mutant gained weak affinity for an alternative motif, driving aberrant activation of different genes. Altogether, our data show that non-specific DNA binding is required to engage onto genomic regulatory regions; sequence recognition in turn contributes to transcriptional activation, acting at distinct levels: stabilization and positioning of Myc onto DNA, and, unexpectedly, promotion of its transcriptional activity. Hence, seemingly pervasive genome interaction profiles, as detected by ChIP-seq, actually encompass diverse DNA-binding modalities, driving defined, sequence-dependent transcriptional responses.


Subject(s)
DNA/metabolism , Proto-Oncogene Proteins c-myc/metabolism , Transcription Factors/metabolism , Base Sequence/genetics , Base Sequence/physiology , Binding Sites , DNA/genetics , Gene Expression Regulation/genetics , Gene Expression Regulation/physiology , Protein Stability , Proto-Oncogene Proteins c-myc/genetics , Transcription Factors/genetics
19.
Proc Natl Acad Sci U S A ; 119(4)2022 01 25.
Article in English | MEDLINE | ID: mdl-35042801

ABSTRACT

Life on Earth has evolved from initial simplicity to the astounding complexity we experience today. Bacteria and archaea have largely excelled in metabolic diversification, but eukaryotes additionally display abundant morphological innovation. How have these innovations come about and what constraints are there on the origins of novelty and the continuing maintenance of biodiversity on Earth? The history of life and the code for the working parts of cells and systems are written in the genome. The Earth BioGenome Project has proposed that the genomes of all extant, named eukaryotes (about 2 million species) should be sequenced to high quality to produce a digital library of life on Earth, beginning with strategic phylogenetic, ecological, and high-impact priorities. Here we discuss why we should sequence all eukaryotic species, not just a representative few scattered across the many branches of the tree of life. We suggest that many questions of evolutionary and ecological significance will only be addressable when whole-genome data representing divergences at all of the branchings in the tree of life or all species in natural ecosystems are available. We envisage that a genomic tree of life will foster understanding of the ongoing processes of speciation, adaptation, and organismal dependencies within entire ecosystems. These explorations will resolve long-standing problems in phylogenetics, evolution, ecology, conservation, agriculture, bioindustry, and medicine.


Subject(s)
Base Sequence/genetics , Eukaryota/genetics , Genomics/ethics , Animals , Biodiversity , Biological Evolution , Ecology , Ecosystem , Genome , Genomics/methods , Humans , Phylogeny
20.
Proc Natl Acad Sci U S A ; 119(4)2022 01 25.
Article in English | MEDLINE | ID: mdl-35042802

ABSTRACT

A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals. Here, we describe some highlights from the proposed standards, and areas where additional challenges will need to be met.


Subject(s)
Base Sequence/genetics , Eukaryota/genetics , Genomics/standards , Animals , Biodiversity , Genomics/methods , Humans , Reference Standards , Reference Values , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards