ABSTRACT
The UCSC Genome Browser (https://genome.ucsc.edu) is a widely utilized web-based tool for visualization and analysis of genomic data, encompassing over 4000 assemblies from diverse organisms. Since its release in 2001, it has become an essential resource for genomics and bioinformatics research. Annotation data available on the Genome Browser includes both internally created and maintained tracks as well as custom tracks and track hubs provided by the research community. This past year's updates include over 25 new annotation tracks such as the gnomAD 4.1 track on the human GRCh38/hg38 assembly, the addition of three new public hubs, and significant expansions to the Genome Archive (GenArk) system for interacting with the enormous variety of assemblies. We have also made improvements to our interface, including updates to the browser graphic page, such as a new popup dialog feature that now displays item details without requiring navigation away from the main Genome Browser page. GenePred tracks have been upgraded with right-click options for zooming and precise navigation, along with enhanced mouseOver functions. Additional improvements include a new grouping feature for track hubs and hub description info links. A new tutorial focusing on Clinical Genetics has also been added to the UCSC Genome Browser.
ABSTRACT
Vocal production learning ("vocal learning") is a convergently evolved trait in vertebrates. To identify brain genomic elements associated with mammalian vocal learning, we integrated genomic, anatomical, and neurophysiological data from the Egyptian fruit bat (Rousettus aegyptiacus) with analyses of the genomes of 215 placental mammals. First, we identified a set of proteins evolving more slowly in vocal learners. Then, we discovered a vocal motor cortical region in the Egyptian fruit bat, an emergent vocal learner, and leveraged that knowledge to identify active cis-regulatory elements in the motor cortex of vocal learners. Machine learning methods applied to motor cortex open chromatin revealed 50 enhancers robustly associated with vocal learning whose activity tended to be lower in vocal learners. Our research implicates convergent losses of motor cortex regulatory elements in mammalian vocal learning evolution.
Subjects
Genetic Enhancer Elements , Eutheria , Molecular Evolution , Gene Expression Regulation , Motor Cortex , Motor Neurons , Proteins , Animal Vocalization , Animals , Chiroptera/genetics , Chiroptera/physiology , Animal Vocalization/physiology , Motor Cortex/cytology , Motor Cortex/physiology , Chromatin/metabolism , Motor Neurons/physiology , Larynx/physiology , Genetic Epigenesis , Genome , Proteins/genetics , Proteins/metabolism , Amino Acid Sequence , Eutheria/genetics , Eutheria/physiology , Machine Learning
ABSTRACT
The UCSC Genome Browser (https://genome.ucsc.edu) is a web-based genomic visualization and analysis tool that serves data to over 7,000 distinct users per day worldwide. It provides annotation data on thousands of genome assemblies, ranging from human to SARS-CoV-2. This year, we have introduced new data from the Human Pangenome Reference Consortium and on viral genomes including SARS-CoV-2. We have added 1,200 new genomes to our GenArk genome system, increasing the overall diversity of our genomic representation. We have added support for nine new user-contributed track hubs to our public hub system. Additionally, we have released 29 new tracks on the human genome and 11 new tracks on the mouse genome. Collectively, these new features expand both the breadth and depth of the genomic knowledge that we share publicly with users worldwide.
Subjects
Genetic Databases , Genomics , Viral RNA , Animals , Humans , Mice , Human Genome , Viral Genome , Internet , Molecular Sequence Annotation , Software
ABSTRACT
Noncoding DNA is central to our understanding of human gene regulation and complex diseases [1,2], and measuring the evolutionary sequence constraint can establish the functional relevance of putative regulatory elements in the human genome [3-9]. Identifying the genomic elements that have become constrained specifically in primates has been hampered by the faster evolution of noncoding DNA compared to protein-coding DNA [10], the relatively short timescales separating primate species [11], and the previously limited availability of whole-genome sequences [12]. Here we construct a whole-genome alignment of 239 species, representing nearly half of all extant species in the primate order. Using this resource, we identified human regulatory elements that are under selective constraint across primates and other mammals at a 5% false discovery rate. We detected 111,318 DNase I hypersensitivity sites and 267,410 transcription factor binding sites that are constrained specifically in primates but not across other placental mammals and validate their cis-regulatory effects on gene expression. These regulatory elements are enriched for human genetic variants that affect gene expression and complex traits and diseases. Our results highlight the important role of recent evolution in regulatory sequence elements differentiating primates, including humans, from other placental mammals.
Subjects
Conserved Sequence , Molecular Evolution , Genome , Primates , Animals , Female , Humans , Pregnancy , Conserved Sequence/genetics , Deoxyribonuclease I/metabolism , DNA/genetics , DNA/metabolism , Genome/genetics , Mammals/classification , Mammals/genetics , Placenta , Primates/classification , Primates/genetics , Nucleic Acid Regulatory Sequences/genetics , Reproducibility of Results , Transcription Factors/metabolism , Proteins/genetics , Gene Expression Regulation/genetics
ABSTRACT
Interactive graphical genome browsers are essential tools in genomics, but they do not contain all the recent genome assemblies. We created the Genome Archive (GenArk) collection of UCSC Genome Browsers from NCBI assemblies. Built on our established track hub system, this collection enables fast visualization of annotations. Assemblies come with gene models, repeat masks, BLAT, and in silico PCR. Users can add annotations via track hubs and custom tracks. We can bulk-import third-party resources, as demonstrated with TOGA and Ensembl gene models for hundreds of assemblies. A total of 3,269 GenArk assemblies are listed at https://hgdownload.soe.ucsc.edu/hubs/ and can be searched for on the Genome Browser gateway page.
Subjects
Genome , Software , Genomics , Archives , Nucleic Acid Amplification Techniques , Genetic Databases , Internet
ABSTRACT
Interactive graphical genome browsers are essential tools for biologists working with DNA sequences. Although tens of thousands of new genome assemblies have become available over the last decade, accessibility is limited by the work involved in manually creating browsers and curating annotations. The results can push the limits of data storage infrastructure. To facilitate managing this increasing number of genome assemblies, we created the Genome Archive (GenArk) collection of UCSC Genome Browsers from assemblies hosted at NCBI(1). Built on our established assembly hub system, this collection enables fast, on-demand visualization of chromosome regions without requiring a database server. Available annotations include gene models, some mapped through whole-genome alignments, repeat masks, GC content, and others. We also modified our popular BLAT(2) aligner and in-silico PCR to support a large number of genomes using limited RAM. Users can upload additional annotations themselves via track hubs(3) and custom tracks. We can import more annotations in bulk from third-party resources, demonstrated here with TOGA(4) gene models. A total of 2,430 GenArk assemblies are listed at https://hgdownload.soe.ucsc.edu/hubs/ and can be found by searching on the main UCSC gateway page. We will continue to add high-quality human assemblies; for other organisms, we look forward to receiving requests from the research community for more browsers and whole-genome alignments via http://genome.ucsc.edu/assemblyRequest.html.
ABSTRACT
The UCSC Genome Browser (https://genome.ucsc.edu) is an omics data consolidator, graphical viewer, and general bioinformatics resource that continues to serve the community as it enters its 23rd year. This year has seen an emphasis on clinical data, with new tracks and an expanded Recommended Track Sets feature on hg38 as well as the addition of a single cell track group. SARS-CoV-2 remains a focus, with regular annotation updates to the browser and continued curation of our phylogenetic sequence placing tool, hgPhyloPlace, whose tree has now reached over 12M sequences. Our GenArk resource has also grown, offering over 2500 hubs and a system for users to request any absent assemblies. We have expanded our bigBarChart display type and created new ways to visualize data via bigRmsk and dynseq display. Displaying custom annotations is now easier due to our chromAlias system, which eliminates the requirement for renaming sequence names to the UCSC standard. Users involved in data generation may also be interested in our new tools and trackDb settings, which facilitate the creation and display of their custom annotations.
Subjects
Genetic Databases , Genomics , Humans , COVID-19/epidemiology , COVID-19/genetics , Genomics/methods , Internet , Phylogeny , SARS-CoV-2/genetics , Software , Web Browser
ABSTRACT
Euarchontoglires, once described as Supraprimates, comprise primates, colugos, tree shrews, rodents, and lagomorphs in a clade that evolved about 90 million years ago (mya) from a shared ancestor with Laurasiatheria. The rapid speciation of groups within Euarchontoglires, and the resulting incomplete fixation of markers in ancestral lineages, have complicated attempts at phylogenetic reconstruction, particularly for the phylogenetic position of tree shrews. To resolve this conundrum, we sampled genome-wide presence/absence patterns of transposed elements (TEs) from all representatives of Euarchontoglires. This specific marker system has the advantage that phylogenetic diagnostic characters can be extracted in a nearly unbiased fashion genome-wide from reference genomes. Their insertions are virtually free of homoplasy. We simultaneously employed two computational tools, the genome presence/absence compiler (GPAC) and 2-n-way, to find a maximum of diagnostic insertions from more than 3 million TE positions. From 361 extracted diagnostic TEs, 132 provide significant support for the current resolution of Primatomorpha (Primates plus Dermoptera), 94 support the union of Euarchonta (Primates, Dermoptera, plus Scandentia), and 135 marker insertion patterns support a variety of alternative phylogenetic scenarios. Thus, whole genome-level analysis and a virtually homoplasy-free marker system offer an opportunity to finally resolve the notorious phylogenetic challenges that nature produces in rapidly diversifying groups.
Subjects
Chiroptera , Primates , Animals , Chiroptera/genetics , Genome/genetics , Phylogeny , Primates/genetics , Tupaiidae/genetics
ABSTRACT
The UCSC Genome Browser has been an important tool for genomics and clinical genetics since the sequence of the human genome was first released in 2000. As it has grown in scope to display more types of data, it has also grown more complicated. The data, which are dispersed at many locations worldwide, are collected into one view on the Browser, where the graphical interface presents them together. This supports the expertise of the researcher in interpreting variants in the genome. Because the analysis of single nucleotide variants and copy number variants requires interpretation of data at very different genomic scales, different data resources are required. We present here several Recommended Track Sets designed to facilitate the interpretation of variants in the clinic, offering quick access to datasets relevant to the appropriate scale.
Subjects
Genetic Databases , Software , DNA Copy Number Variations , Human Genome/genetics , Genomics , Humans , Internet
ABSTRACT
The UCSC Genome Browser, https://genome.ucsc.edu, is a graphical viewer for exploring genome annotations. The website provides integrated tools for visualizing, comparing, analyzing, and sharing both publicly available and user-generated genomic datasets. Data highlights this year include a collection of easily accessible public hub assemblies on new organisms, now featuring BLAT alignment and PCR capabilities, and new and updated clinical tracks (gnomAD, DECIPHER, CADD, REVEL). We introduced a new Track Sets feature and enhanced variant displays to aid in the interpretation of clinical data. We also added a tool to rapidly place new SARS-CoV-2 genomes in a global phylogenetic tree enabling researchers to view the context of emerging mutations in our SARS-CoV-2 Genome Browser. Other new software focuses on usability features, including more informative mouseover displays and new fonts.
Subjects
Genetic Databases , Web Browser , Animals , Human Genome , Humans , Phylogeny , Polymerase Chain Reaction , SARS-CoV-2/genetics , User-Computer Interface , Exome Sequencing
ABSTRACT
For more than two decades, the UCSC Genome Browser database (https://genome.ucsc.edu) has provided high-quality genomics data visualization and genome annotations to the research community. As the field of genomics grows and more data become available, new modes of display are required to accommodate new technologies. New features released this past year include a Hi-C heatmap display, a phased family trio display for VCF files, and various track visualization improvements. Striving to keep data up-to-date, new updates to gene annotations include GENCODE Genes, NCBI RefSeq Genes, and Ensembl Genes. New data tracks added for human and mouse genomes include the ENCODE registry of candidate cis-regulatory elements, promoters from the Eukaryotic Promoter Database, and NCBI RefSeq Select and Matched Annotation from NCBI and EMBL-EBI (MANE). Within weeks of learning about the outbreak of coronavirus, UCSC released a genome browser, with detailed annotation tracks, for the SARS-CoV-2 RNA reference assembly.
Subjects
COVID-19/prevention & control , Computational Biology/methods , Genetic Databases , Genome/genetics , Genomics/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Epidemics , Humans , Internet , Mice , Molecular Sequence Annotation/methods , SARS-CoV-2/physiology , Software
ABSTRACT
BACKGROUND: Nearly half the human genome consists of repeat elements, most of which are retrotransposons, and many of which play important biological roles. However, repeat elements pose several unique challenges to current bioinformatic analyses and visualization tools, as short repeat sequences can map to multiple genomic loci, resulting in their misclassification and misinterpretation. In fact, sequence data mapping to repeat elements are often discarded from analysis pipelines. Therefore, there is a continued need for standardized tools and techniques to interpret genomic data of repeats. RESULTS: We present the UCSC Repeat Browser, which consists of a complete set of human repeat reference sequences derived from annotations made by the commonly used program RepeatMasker. The UCSC Repeat Browser also provides an alignment from the human genome to these references, uses it to map the standard human genome annotation tracks, and presents all of them as a comprehensive interface to facilitate work with repetitive elements. It also provides processed tracks of multiple publicly available datasets of particular interest to the repeat community, including ChIP-seq datasets for KRAB Zinc Finger Proteins (KZNFs), a family of proteins known to bind and repress certain classes of repeats. We used the UCSC Repeat Browser in combination with these datasets, as well as RepeatMasker annotations in several non-human primates, to trace the independent trajectories of species-specific evolutionary battles between LINE 1 retroelements and their repressors. Furthermore, we document at https://repeatbrowser.ucsc.edu how researchers can map their own human genome annotations to these reference repeat sequences. CONCLUSIONS: The UCSC Repeat Browser allows easy and intuitive visualization of genomic data on consensus repeat elements, circumventing the problem of multi-mapping, in which sequencing reads of repeat elements map to multiple locations on the human genome. By developing a reference consensus, multiple datasets and annotation tracks can easily be overlaid to reveal complex evolutionary histories of repeats in a single interactive window. Specifically, we use this approach to retrace the history of several primate-specific LINE-1 families across apes, and discover several species-specific routes of evolution that correlate with the emergence and binding of KZNFs.
ABSTRACT
The University of California Santa Cruz Genome Browser website (https://genome.ucsc.edu) enters its 20th year of providing high-quality genomics data visualization and genome annotations to the research community. In the past year, we have added a new option to our web BLAT tool that allows search against all genomes, a single-cell expression viewer (https://cells.ucsc.edu), a 'lollipop' plot display mode for high-density variation data, a RESTful API for data extraction and a custom-track backup feature. New datasets include Tabula Muris single-cell expression data, GeneHancer regulatory annotations, The Cancer Genome Atlas Pan-Cancer variants, Genome Reference Consortium Patch sequences, new ENCODE transcription factor binding site peaks and clusters, the Database of Genomic Variants Gold Standard Variants, Genomenon Mastermind variants and three new multi-species alignment tracks.
Subjects
Genetic Databases , Human Genome , Software , Genomics , Humans , Internet
ABSTRACT
How reliable are the presence/absence insertion patterns of the supposedly homoplasy-free retrotransposons, which were randomly inserted in the quasi infinite genomic space? To systematically examine this question in an up-to-date, multigenome comparison, we screened millions of primate transposed Alu SINE elements for incidences of homoplasious precise insertions and deletions. In genome-wide analyses, we identified and manually verified nine cases of precise parallel Alu insertions of apparently identical elements at orthologous positions in two ape lineages and twelve incidences of precise deletions of previously established SINEs. Correspondingly, eight precise parallel insertions and no exact deletions were detected in a comparison of lemuriform primate and human insertions spanning the range of primate diversity. With an overall frequency of homoplasious Alu insertions of only 0.01% (for human-chimpanzee-rhesus macaque) and 0.02-0.04% (for human-bushbaby-lemurs) and precise Alu deletions of 0.001-0.002% (for human-chimpanzee-rhesus macaque), real homoplasy is not considered to be a quantitatively relevant source of evolutionary noise. Thus, presence/absence patterns of Alu retrotransposons and, presumably, all LINE1-mobilized elements represent indeed the virtually homoplasy-free markers they are considered to be. Therefore, ancestral incomplete lineage sorting and hybridization remain the only serious sources of conflicting presence/absence patterns of retrotransposon insertions, and as such are detectable and quantifiable. [Homoplasy; precise deletions; precise parallel insertions; primates; retrotransposons.].
Subjects
Alu Elements/genetics , Insertional Mutagenesis/genetics , Primates/genetics , Retroelements/genetics , Animals , Molecular Evolution , Genetic Variation , Humans , Phylogeny , Primates/classification
ABSTRACT
The UCSC Genome Browser (https://genome.ucsc.edu) is a graphical viewer for exploring genome annotations. For almost two decades, the Browser has provided visualization tools for genetics and molecular biology and continues to add new data and features. This year, we added a new tool that lets users interactively arrange existing graphing tracks into new groups. Other software additions include new formats for chromosome interactions, a ChIP-Seq peak display for track hubs and improved support for HGVS. On the annotation side, we have added gnomAD, TCGA expression, RefSeq Functional elements, GTEx eQTLs, CRISPR Guides, SNPpedia and created a 30-way primate alignment on the human genome. Nine assemblies now have RefSeq-mapped gene models.
Subjects
Genetic Databases , Genome/genetics , Genomics , Software , Animals , Chromosome Mapping , Human Genome/genetics , Humans , Molecular Sequence Annotation , Web Browser
ABSTRACT
The UCSC Genome Browser (https://genome.ucsc.edu) provides a web interface for exploring annotated genome assemblies. The assemblies and annotation tracks are updated on an ongoing basis: 12 assemblies and more than 28 tracks were added in the past year. Two recent additions are a display of CRISPR/Cas9 guide sequences and an interactive navigator for gene interactions. Other upgrades from the past year include a command-line version of the Variant Annotation Integrator, support for Human Genome Variation Society variant nomenclature input and output, and a revised highlighting tool that now supports multiple simultaneous regions and colors.
Subjects
Genetic Databases , Genome , Web Browser , CRISPR-Cas Systems , Data Display , Gene Regulatory Networks , Human Genome , Humans , Molecular Sequence Annotation , Terminology as Topic , User-Computer Interface
ABSTRACT
Rapid species radiation due to adaptive changes or occupation of new ecospaces challenges our understanding of ancestral speciation and the relationships of modern species. At the molecular level, rapid radiation with successive speciations over short time periods (too short to fix polymorphic alleles) is described as incomplete lineage sorting. Incomplete lineage sorting leads to random fixation of genetic markers and hence, random signals of relationships in phylogenetic reconstructions. The situation is further complicated by the fact that the genome is a mosaic of ancestral and modern incompletely sorted sequence blocks, which leads to reconstructed affiliations to one or the other relative, depending on the fixation of their shared ancestral polymorphic alleles. The laurasiatherian relationships among Chiroptera, Perissodactyla, Cetartiodactyla, and Carnivora present a prime example of such enigmatic affiliations. We performed whole-genome screenings for phylogenetically diagnostic retrotransposon insertions involving the representatives bat (Chiroptera), horse (Perissodactyla), cow (Cetartiodactyla), and dog (Carnivora), and extracted among 162,000 preselected cases 102 virtually homoplasy-free, phylogenetically informative retroelements to draw a complete picture of the highly complex evolutionary relations within Laurasiatheria. All possible evolutionary scenarios received considerable retrotransposon support, leaving us with a network of affiliations. However, the Cetartiodactyla-Carnivora relationship as well as the basal position of Chiroptera and an ancestral laurasiatherian hybridization process did exhibit some very clear, distinct signals. The significant accordance of retrotransposon presence/absence patterns and flanking nucleotide changes suggests an important influence of mosaic genome structures in the reconstruction of species histories.
Subjects
Chiroptera/genetics , Genetic Speciation , Genome , Horses/genetics , Phylogeny , Retroelements , Animals , Cattle , Chiroptera/classification , Chromosome Mapping , Dogs , Genetic Markers , Horses/classification , Genetic Hybridization , Insertional Mutagenesis , DNA Sequence Analysis , Software
ABSTRACT
Since its 2001 debut, the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) team has provided continuous support to the international genomics and biomedical communities through a web-based, open source platform designed for the fast, scalable display of sequence alignments and annotations landscaped against a vast collection of quality reference genome assemblies. The browser's publicly accessible databases are the backbone of a rich, integrated bioinformatics tool suite that includes a graphical interface for data queries and downloads, alignment programs, command-line utilities and more. This year's highlights include newly designed home and gateway pages; a new 'multi-region' track display configuration for exon-only, gene-only and custom regions visualization; new genome browsers for three species (brown kiwi, crab-eating macaque and Malayan flying lemur); eight updated genome assemblies; extended support for new data types such as CRAM, RNA-seq expression data and long-range chromatin interaction pairs; and the unveiling of a new supported mirror site in Japan.
Subjects
Genetic Databases , Search Engine , Web Browser , Animals , Computational Biology/methods , Genome , Genomics/methods , Humans , Molecular Sequence Annotation , Software
ABSTRACT
Tarsiers are phylogenetically located between the most basal strepsirrhines and the most derived anthropoid primates. While they share morphological features with both groups, they also possess uncommon primate characteristics, rendering their evolutionary history somewhat obscure. To investigate the molecular basis of such attributes, we present here a new genome assembly of the Philippine tarsier (Tarsius syrichta), and provide extended analyses of the genome and detailed history of transposable element insertion events. We describe the silencing of Alu monomers on the lineage leading to anthropoids, and recognize an unexpected abundance of long terminal repeat-derived and LINE1-mobilized transposed elements (Tarsius interspersed elements; TINEs). For the first time in mammals, we identify a complete mitochondrial genome insertion within the nuclear genome, then reveal tarsier-specific, positive gene selection and posit population size changes over time. The genomic resources and analyses presented here will aid efforts to more fully understand the ancient characteristics of primate genomes.