
1.

Identification of constrained sequence elements across 239 primate genomes.

Kuderna, Lukas F K; Ulirsch, Jacob C; Rashid, Sabrina; Ameen, Mohamed; Sundaram, Laksshman; Hickey, Glenn; Cox, Anthony J; Gao, Hong; Kumar, Arvind; Aguet, Francois; Christmas, Matthew J; Clawson, Hiram; Haeussler, Maximilian; Janiak, Mareike C; Kuhlwilm, Martin; Orkin, Joseph D; Bataillon, Thomas; Manu, Shivakumara; Valenzuela, Alejandro; Bergman, Juraj; Rouselle, Marjolaine; Silva, Felipe Ennes; Agueda, Lidia; Blanc, Julie; Gut, Marta; de Vries, Dorien; Goodhead, Ian; Harris, R Alan; Raveendran, Muthuswamy; Jensen, Axel; Chuma, Idriss S; Horvath, Julie E; Hvilsom, Christina; Juan, David; Frandsen, Peter; Schraiber, Joshua G; de Melo, Fabiano R; Bertuol, Fabrício; Byrne, Hazel; Sampaio, Iracilda; Farias, Izeni; Valsecchi, João; Messias, Malu; da Silva, Maria N F; Trivedi, Mihir; Rossi, Rogerio; Hrbek, Tomas; Andriaholinirina, Nicole; Rabarivola, Clément J; Zaramody, Alphonse.

Nature ; 625(7996): 735-742, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38030727

ABSTRACT

Noncoding DNA is central to our understanding of human gene regulation and complex diseases [1,2], and measuring the evolutionary sequence constraint can establish the functional relevance of putative regulatory elements in the human genome [3-9]. Identifying the genomic elements that have become constrained specifically in primates has been hampered by the faster evolution of noncoding DNA compared to protein-coding DNA [10], the relatively short timescales separating primate species [11], and the previously limited availability of whole-genome sequences [12]. Here we construct a whole-genome alignment of 239 species, representing nearly half of all extant species in the primate order. Using this resource, we identified human regulatory elements that are under selective constraint across primates and other mammals at a 5% false discovery rate. We detected 111,318 DNase I hypersensitivity sites and 267,410 transcription factor binding sites that are constrained specifically in primates but not across other placental mammals and validate their cis-regulatory effects on gene expression. These regulatory elements are enriched for human genetic variants that affect gene expression and complex traits and diseases. Our results highlight the important role of recent evolution in regulatory sequence elements differentiating primates, including humans, from other placental mammals.
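
For readers unfamiliar with the "5% false discovery rate" threshold mentioned in this abstract, the following is a minimal sketch of one common way to control FDR, the Benjamini-Hochberg step-up procedure. The abstract does not say which procedure the authors used, so the choice of method, the function name `benjamini_hochberg`, and the example p-values are all assumptions made purely for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    under the Benjamini-Hochberg step-up procedure at level alpha.

    Illustrative only: not taken from the paper summarized above.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank

    # Reject every hypothesis whose rank is at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for eight candidate elements.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(example, alpha=0.05))
```

With the hypothetical p-values above, only the two smallest fall under their rank-scaled thresholds, so only those two would be called significant at a 5% FDR.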

Subject(s)

Conserved Sequence , Evolution, Molecular , Genome , Primates , Animals , Female , Humans , Pregnancy , Conserved Sequence/genetics , Deoxyribonuclease I/metabolism , DNA/genetics , DNA/metabolism , Genome/genetics , Mammals/classification , Mammals/genetics , Placenta , Primates/classification , Primates/genetics , Regulatory Sequences, Nucleic Acid/genetics , Reproducibility of Results , Transcription Factors/metabolism , Proteins/genetics , Gene Expression Regulation/genetics

2.

The UCSC Genome Browser database: 2024 update.

Raney, Brian J; Barber, Galt P; Benet-Pagès, Anna; Casper, Jonathan; Clawson, Hiram; Cline, Melissa S; Diekhans, Mark; Fischer, Clayton; Navarro Gonzalez, Jairo; Hickey, Glenn; Hinrichs, Angie S; Kuhn, Robert M; Lee, Brian T; Lee, Christopher M; Le Mercier, Phillipe; Miga, Karen H; Nassar, Luis R; Nejad, Parisa; Paten, Benedict; Perez, Gerardo; Schmelter, Daniel; Speir, Matthew L; Wick, Brittney D; Zweig, Ann S; Haussler, David; Kent, W James; Haeussler, Maximilian.

Nucleic Acids Res ; 52(D1): D1082-D1088, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37953330

ABSTRACT

The UCSC Genome Browser (https://genome.ucsc.edu) is a web-based genomic visualization and analysis tool that serves data to over 7,000 distinct users per day worldwide. It provides annotation data on thousands of genome assemblies, ranging from human to SARS-CoV2. This year, we have introduced new data from the Human Pangenome Reference Consortium and on viral genomes including SARS-CoV2. We have added 1,200 new genomes to our GenArk genome system, increasing the overall diversity of our genomic representation. We have added support for nine new user-contributed track hubs to our public hub system. Additionally, we have released 29 new tracks on the human genome and 11 new tracks on the mouse genome. Collectively, these new features expand both the breadth and depth of the genomic knowledge that we share publicly with users worldwide.

Subject(s)

Databases, Genetic , Genomics , RNA, Viral , Animals , Humans , Mice , Genome, Human , Genome, Viral , Internet , Molecular Sequence Annotation , Software

3.

The UCSC Genome Browser database: 2023 update.

Nassar, Luis R; Barber, Galt P; Benet-Pagès, Anna; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Fischer, Clay; Gonzalez, Jairo Navarro; Hinrichs, Angie S; Lee, Brian T; Lee, Christopher M; Muthuraman, Pranav; Nguy, Beagan; Pereira, Tiana; Nejad, Parisa; Perez, Gerardo; Raney, Brian J; Schmelter, Daniel; Speir, Matthew L; Wick, Brittney D; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Haeussler, Maximilian; Kent, W James.

Nucleic Acids Res ; 51(D1): D1188-D1195, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36420891

ABSTRACT

The UCSC Genome Browser (https://genome.ucsc.edu) is an omics data consolidator, graphical viewer, and general bioinformatics resource that continues to serve the community as it enters its 23rd year. This year has seen an emphasis in clinical data, with new tracks and an expanded Recommended Track Sets feature on hg38 as well as the addition of a single cell track group. SARS-CoV-2 continues to remain a focus, with regular annotation updates to the browser and continued curation of our phylogenetic sequence placing tool, hgPhyloPlace, whose tree has now reached over 12M sequences. Our GenArk resource has also grown, offering over 2500 hubs and a system for users to request any absent assemblies. We have expanded our bigBarChart display type and created new ways to visualize data via bigRmsk and dynseq display. Displaying custom annotations is now easier due to our chromAlias system which eliminates the requirement for renaming sequence names to the UCSC standard. Users involved in data generation may also be interested in our new tools and trackDb settings which facilitate the creation and display of their custom annotations.

Subject(s)

Databases, Genetic , Genomics , Humans , COVID-19/epidemiology , COVID-19/genetics , Genomics/methods , Internet , Phylogeny , SARS-CoV-2/genetics , Software , Web Browser

4.

The UCSC Genome Browser database: 2022 update.

Lee, Brian T; Barber, Galt P; Benet-Pagès, Anna; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Fischer, Clay; Gonzalez, Jairo Navarro; Hinrichs, Angie S; Lee, Christopher M; Muthuraman, Pranav; Nassar, Luis R; Nguy, Beagan; Pereira, Tiana; Perez, Gerardo; Raney, Brian J; Rosenbloom, Kate R; Schmelter, Daniel; Speir, Matthew L; Wick, Brittney D; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Haeussler, Maximilian; Kent, W James.

Nucleic Acids Res ; 50(D1): D1115-D1122, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34718705

ABSTRACT

The UCSC Genome Browser, https://genome.ucsc.edu, is a graphical viewer for exploring genome annotations. The website provides integrated tools for visualizing, comparing, analyzing, and sharing both publicly available and user-generated genomic datasets. Data highlights this year include a collection of easily accessible public hub assemblies on new organisms, now featuring BLAT alignment and PCR capabilities, and new and updated clinical tracks (gnomAD, DECIPHER, CADD, REVEL). We introduced a new Track Sets feature and enhanced variant displays to aid in the interpretation of clinical data. We also added a tool to rapidly place new SARS-CoV-2 genomes in a global phylogenetic tree enabling researchers to view the context of emerging mutations in our SARS-CoV-2 Genome Browser. Other new software focuses on usability features, including more informative mouseover displays and new fonts.

Subject(s)

Databases, Genetic , Web Browser , Animals , Genome, Human , Humans , Phylogeny , Polymerase Chain Reaction , SARS-CoV-2/genetics , User-Computer Interface , Exome Sequencing

5.

The UCSC Genome Browser database: 2021 update.

Navarro Gonzalez, Jairo; Zweig, Ann S; Speir, Matthew L; Schmelter, Daniel; Rosenbloom, Kate R; Raney, Brian J; Powell, Conner C; Nassar, Luis R; Maulding, Nathan D; Lee, Christopher M; Lee, Brian T; Hinrichs, Angie S; Fyfe, Alastair C; Fernandes, Jason D; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Benet-Pagès, Anna; Barber, Galt P; Haussler, David; Kuhn, Robert M; Haeussler, Maximilian; Kent, W James.

Nucleic Acids Res ; 49(D1): D1046-D1057, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33221922

ABSTRACT

For more than two decades, the UCSC Genome Browser database (https://genome.ucsc.edu) has provided high-quality genomics data visualization and genome annotations to the research community. As the field of genomics grows and more data become available, new modes of display are required to accommodate new technologies. New features released this past year include a Hi-C heatmap display, a phased family trio display for VCF files, and various track visualization improvements. Striving to keep data up-to-date, new updates to gene annotations include GENCODE Genes, NCBI RefSeq Genes, and Ensembl Genes. New data tracks added for human and mouse genomes include the ENCODE registry of candidate cis-regulatory elements, promoters from the Eukaryotic Promoter Database, and NCBI RefSeq Select and Matched Annotation from NCBI and EMBL-EBI (MANE). Within weeks of learning about the outbreak of coronavirus, UCSC released a genome browser, with detailed annotation tracks, for the SARS-CoV-2 RNA reference assembly.

Subject(s)

COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genome/genetics , Genomics/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Data Curation/methods , Epidemics , Humans , Internet , Mice , Molecular Sequence Annotation/methods , SARS-CoV-2/physiology , Software

6.

Variant interpretation: UCSC Genome Browser Recommended Track Sets.

Benet-Pagès, Anna; Rosenbloom, Kate R; Nassar, Luis R; Lee, Christopher M; Raney, Brian J; Clawson, Hiram; Schmelter, Daniel; Casper, Jonathan; Gonzalez, Jairo Navarro; Perez, Gerardo; Lee, Brian T; Zweig, Ann S; Kent, W James; Haeussler, Maximilian; Kuhn, Robert M.

Hum Mutat ; 43(8): 998-1011, 2022 08.

Article in English | MEDLINE | ID: mdl-35088925

ABSTRACT

The UCSC Genome Browser has been an important tool for genomics and clinical genetics since the sequence of the human genome was first released in 2000. As it has grown in scope to display more types of data it has also grown more complicated. The data, which are dispersed at many locations worldwide, are collected into one view on the Browser, where the graphical interface presents the data in one location. This supports the expertise of the researcher to interpret variants in the genome. Because the analysis of single nucleotide variants and copy number variants require interpretation of data at very different genomic scales, different data resources are required. We present here several Recommended Track Sets designed to facilitate the interpretation of variants in the clinic, offering quick access to datasets relevant to the appropriate scale.

Subject(s)

Databases, Genetic , Software , DNA Copy Number Variations , Genome, Human/genetics , Genomics , Humans , Internet

7.

UCSC Genome Browser enters 20th year.

Lee, Christopher M; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Gonzalez, Jairo Navarro; Hinrichs, Angie S; Lee, Brian T; Nassar, Luis R; Powell, Conner C; Raney, Brian J; Rosenbloom, Kate R; Schmelter, Daniel; Speir, Matthew L; Zweig, Ann S; Haussler, David; Haeussler, Maximilian; Kuhn, Robert M; Kent, W James.

Nucleic Acids Res ; 48(D1): D756-D761, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31691824

ABSTRACT

The University of California Santa Cruz Genome Browser website (https://genome.ucsc.edu) enters its 20th year of providing high-quality genomics data visualization and genome annotations to the research community. In the past year, we have added a new option to our web BLAT tool that allows search against all genomes, a single-cell expression viewer (https://cells.ucsc.edu), a 'lollipop' plot display mode for high-density variation data, a RESTful API for data extraction and a custom-track backup feature. New datasets include Tabula Muris single-cell expression data, GeneHancer regulatory annotations, The Cancer Genome Atlas Pan-Cancer variants, Genome Reference Consortium Patch sequences, new ENCODE transcription factor binding site peaks and clusters, the Database of Genomic Variants Gold Standard Variants, Genomenon Mastermind variants and three new multi-species alignment tracks.
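
As an illustration of the "RESTful API for data extraction" mentioned in this abstract, the sketch below builds a request URL for the public UCSC endpoint `https://api.genome.ucsc.edu/getData/sequence`. The helper name `sequence_url` is hypothetical, and the coordinate convention stated in the comments (0-based start, end not included) is an assumption that should be checked against the current API help page. The code only constructs the URL and does not contact the server.

```python
from urllib.parse import quote

# Public REST endpoint of the UCSC Genome Browser.
API_BASE = "https://api.genome.ucsc.edu"


def sequence_url(genome, chrom, start, end):
    """Build a /getData/sequence URL for one genomic interval.

    Hypothetical helper for illustration. Coordinates are assumed to be
    0-based for start with end not included; verify before relying on it.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    params = [
        ("genome", genome),
        ("chrom", chrom),
        ("start", str(start)),
        ("end", str(end)),
    ]
    # The UCSC documentation examples separate parameters with semicolons.
    query = ";".join(f"{key}={quote(value, safe='')}" for key, value in params)
    return f"{API_BASE}/getData/sequence?{query}"


if __name__ == "__main__":
    print(sequence_url("hg38", "chrM", 4321, 5678))
```

The resulting URL can be opened in a web browser or passed to any HTTP client; the response is JSON.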

Subject(s)

Databases, Genetic , Genome, Human , Software , Genomics , Humans , Internet

8.

The UCSC Genome Browser database: 2019 update.

Haeussler, Maximilian; Zweig, Ann S; Tyner, Cath; Speir, Matthew L; Rosenbloom, Kate R; Raney, Brian J; Lee, Christopher M; Lee, Brian T; Hinrichs, Angie S; Gonzalez, Jairo Navarro; Gibson, David; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P; Haussler, David; Kuhn, Robert M; Kent, W James.

Nucleic Acids Res ; 47(D1): D853-D858, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30407534

ABSTRACT

The UCSC Genome Browser (https://genome.ucsc.edu) is a graphical viewer for exploring genome annotations. For almost two decades, the Browser has provided visualization tools for genetics and molecular biology and continues to add new data and features. This year, we added a new tool that lets users interactively arrange existing graphing tracks into new groups. Other software additions include new formats for chromosome interactions, a ChIP-Seq peak display for track hubs and improved support for HGVS. On the annotation side, we have added gnomAD, TCGA expression, RefSeq Functional elements, GTEx eQTLs, CRISPR Guides, SNPpedia and created a 30-way primate alignment on the human genome. Nine assemblies now have RefSeq-mapped gene models.

Subject(s)

Databases, Genetic , Genome/genetics , Genomics , Software , Animals , Chromosome Mapping , Genome, Human/genetics , Humans , Molecular Sequence Annotation , Web Browser

9.

Speciation network in Laurasiatheria: retrophylogenomic signals.

Doronina, Liliya; Churakov, Gennady; Kuritzin, Andrej; Shi, Jingjing; Baertsch, Robert; Clawson, Hiram; Schmitz, Jürgen.

Genome Res ; 27(6): 997-1003, 2017 Jun.

Article in English | MEDLINE | ID: mdl-28298429

ABSTRACT

Rapid species radiation due to adaptive changes or occupation of new ecospaces challenges our understanding of ancestral speciation and the relationships of modern species. At the molecular level, rapid radiation with successive speciations over short time periods (too short to fix polymorphic alleles) is described as incomplete lineage sorting. Incomplete lineage sorting leads to random fixation of genetic markers and hence, random signals of relationships in phylogenetic reconstructions. The situation is further complicated when you consider that the genome is a mosaic of ancestral and modern incompletely sorted sequence blocks that leads to reconstructed affiliations to one or the other relative, depending on the fixation of their shared ancestral polymorphic alleles. The laurasiatherian relationships among Chiroptera, Perissodactyla, Cetartiodactyla, and Carnivora present a prime example for such enigmatic affiliations. We performed whole-genome screenings for phylogenetically diagnostic retrotransposon insertions involving the representatives bat (Chiroptera), horse (Perissodactyla), cow (Cetartiodactyla), and dog (Carnivora), and extracted among 162,000 preselected cases 102 virtually homoplasy-free, phylogenetically informative retroelements to draw a complete picture of the highly complex evolutionary relations within Laurasiatheria. All possible evolutionary scenarios received considerable retrotransposon support, leaving us with a network of affiliations. However, the Cetartiodactyla-Carnivora relationship as well as the basal position of Chiroptera and an ancestral laurasiatherian hybridization process did exhibit some very clear, distinct signals. The significant accordance of retrotransposon presence/absence patterns and flanking nucleotide changes suggest an important influence of mosaic genome structures in the reconstruction of species histories.

Subject(s)

Chiroptera/genetics , Genetic Speciation , Genome , Horses/genetics , Phylogeny , Retroelements , Animals , Cattle , Chiroptera/classification , Chromosome Mapping , Dogs , Genetic Markers , Horses/classification , Hybridization, Genetic , Mutagenesis, Insertional , Sequence Analysis, DNA , Software

10.

True Homoplasy of Retrotransposon Insertions in Primates.

Doronina, Liliya; Reising, Olga; Clawson, Hiram; Ray, David A; Schmitz, Jürgen.

Syst Biol ; 68(3): 482-493, 2019 05 01.

Article in English | MEDLINE | ID: mdl-30445649

ABSTRACT

How reliable are the presence/absence insertion patterns of the supposedly homoplasy-free retrotransposons, which were randomly inserted in the quasi infinite genomic space? To systematically examine this question in an up-to-date, multigenome comparison, we screened millions of primate transposed Alu SINE elements for incidences of homoplasious precise insertions and deletions. In genome-wide analyses, we identified and manually verified nine cases of precise parallel Alu insertions of apparently identical elements at orthologous positions in two ape lineages and twelve incidences of precise deletions of previously established SINEs. Correspondingly, eight precise parallel insertions and no exact deletions were detected in a comparison of lemuriform primate and human insertions spanning the range of primate diversity. With an overall frequency of homoplasious Alu insertions of only 0.01% (for human-chimpanzee-rhesus macaque) and 0.02-0.04% (for human-bushbaby-lemurs) and precise Alu deletions of 0.001-0.002% (for human-chimpanzee-rhesus macaque), real homoplasy is not considered to be a quantitatively relevant source of evolutionary noise. Thus, presence/absence patterns of Alu retrotransposons and, presumably, all LINE1-mobilized elements represent indeed the virtually homoplasy-free markers they are considered to be. Therefore, ancestral incomplete lineage sorting and hybridization remain the only serious sources of conflicting presence/absence patterns of retrotransposon insertions, and as such are detectable and quantifiable. [Homoplasy; precise deletions; precise parallel insertions; primates; retrotransposons.].

Subject(s)

Alu Elements/genetics , Mutagenesis, Insertional/genetics , Primates/genetics , Retroelements/genetics , Animals , Evolution, Molecular , Genetic Variation , Humans , Phylogeny , Primates/classification

11.

The UCSC Genome Browser database: 2018 update.

Casper, Jonathan; Zweig, Ann S; Villarreal, Chris; Tyner, Cath; Speir, Matthew L; Rosenbloom, Kate R; Raney, Brian J; Lee, Christopher M; Lee, Brian T; Karolchik, Donna; Hinrichs, Angie S; Haeussler, Maximilian; Guruvadoo, Luvina; Navarro Gonzalez, Jairo; Gibson, David; Fiddes, Ian T; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Barber, Galt P; Armstrong, Joel; Haussler, David; Kuhn, Robert M; Kent, W James.

Nucleic Acids Res ; 46(D1): D762-D769, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29106570

ABSTRACT

The UCSC Genome Browser (https://genome.ucsc.edu) provides a web interface for exploring annotated genome assemblies. The assemblies and annotation tracks are updated on an ongoing basis: 12 assemblies and more than 28 tracks were added in the past year. Two recent additions are a display of CRISPR/Cas9 guide sequences and an interactive navigator for gene interactions. Other upgrades from the past year include a command-line version of the Variant Annotation Integrator, support for Human Genome Variation Society variant nomenclature input and output, and a revised highlighting tool that now supports multiple simultaneous regions and colors.

Subject(s)

Databases, Genetic , Genome , Web Browser , CRISPR-Cas Systems , Data Display , Gene Regulatory Networks , Genome, Human , Humans , Molecular Sequence Annotation , Terminology as Topic , User-Computer Interface

12.

The UCSC Genome Browser database: 2017 update.

Tyner, Cath; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Eisenhart, Christopher; Fischer, Clayton M; Gibson, David; Gonzalez, Jairo Navarro; Guruvadoo, Luvina; Haeussler, Maximilian; Heitner, Steve; Hinrichs, Angie S; Karolchik, Donna; Lee, Brian T; Lee, Christopher M; Nejad, Parisa; Raney, Brian J; Rosenbloom, Kate R; Speir, Matthew L; Villarreal, Chris; Vivian, John; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James.

Nucleic Acids Res ; 45(D1): D626-D634, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899642

ABSTRACT

Since its 2001 debut, the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) team has provided continuous support to the international genomics and biomedical communities through a web-based, open source platform designed for the fast, scalable display of sequence alignments and annotations landscaped against a vast collection of quality reference genome assemblies. The browser's publicly accessible databases are the backbone of a rich, integrated bioinformatics tool suite that includes a graphical interface for data queries and downloads, alignment programs, command-line utilities and more. This year's highlights include newly designed home and gateway pages; a new 'multi-region' track display configuration for exon-only, gene-only and custom regions visualization; new genome browsers for three species (brown kiwi, crab-eating macaque and Malayan flying lemur); eight updated genome assemblies; extended support for new data types such as CRAM, RNA-seq expression data and long-range chromatin interaction pairs; and the unveiling of a new supported mirror site in Japan.

Subject(s)

Databases, Genetic , Search Engine , Web Browser , Animals , Computational Biology/methods , Genome , Genomics/methods , Humans , Molecular Sequence Annotation , Software

13.

The UCSC Genome Browser database: 2016 update.

Speir, Matthew L; Zweig, Ann S; Rosenbloom, Kate R; Raney, Brian J; Paten, Benedict; Nejad, Parisa; Lee, Brian T; Learned, Katrina; Karolchik, Donna; Hinrichs, Angie S; Heitner, Steve; Harte, Rachel A; Haeussler, Maximilian; Guruvadoo, Luvina; Fujita, Pauline A; Eisenhart, Christopher; Diekhans, Mark; Clawson, Hiram; Casper, Jonathan; Barber, Galt P; Haussler, David; Kuhn, Robert M; Kent, W James.

Nucleic Acids Res ; 44(D1): D717-25, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26590259

ABSTRACT

For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the "Data Integrator", for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment.

Subject(s)

Databases, Genetic , Genomics , Animals , Disease/genetics , Genes , Genome , Humans , Mice , Molecular Sequence Annotation , Software

14.

Alignathon: a competitive assessment of whole-genome alignment methods.

Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Harris, Robert S; Fitzgerald, Stephen; Beal, Kathryn; Seledtsov, Igor; Molodtsov, Vladimir; Raney, Brian J; Clawson, Hiram; Kim, Jaebum; Kemena, Carsten; Chang, Jia-Ming; Erb, Ionas; Poliakov, Alexander; Hou, Minmei; Herrero, Javier; Kent, William James; Solovyev, Victor; Darling, Aaron E; Ma, Jian; Notredame, Cedric; Brudno, Michael; Dubchak, Inna; Haussler, David; Paten, Benedict.

Genome Res ; 24(12): 2077-89, 2014 Dec.

Article in English | MEDLINE | ID: mdl-25273068

ABSTRACT

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

Subject(s)

Genome , Genomics/methods , Sequence Alignment/methods , Software , Animals , Computational Biology/methods , Computer Simulation , Datasets as Topic , Genome-Wide Association Study , Humans , Mammals/genetics , Phylogeny , Reproducibility of Results

15.

A high-resolution map of human evolutionary constraint using 29 mammals.

Lindblad-Toh, Kerstin; Garber, Manuel; Zuk, Or; Lin, Michael F; Parker, Brian J; Washietl, Stefan; Kheradpour, Pouya; Ernst, Jason; Jordan, Gregory; Mauceli, Evan; Ward, Lucas D; Lowe, Craig B; Holloway, Alisha K; Clamp, Michele; Gnerre, Sante; Alföldi, Jessica; Beal, Kathryn; Chang, Jean; Clawson, Hiram; Cuff, James; Di Palma, Federica; Fitzgerald, Stephen; Flicek, Paul; Guttman, Mitchell; Hubisz, Melissa J; Jaffe, David B; Jungreis, Irwin; Kent, W James; Kostka, Dennis; Lara, Marcia; Martins, Andre L; Massingham, Tim; Moltke, Ida; Raney, Brian J; Rasmussen, Matthew D; Robinson, Jim; Stark, Alexander; Vilella, Albert J; Wen, Jiayu; Xie, Xiaohui; Zody, Michael C; Baldwin, Jen; Bloom, Toby; Chin, Chee Whye; Heiman, Dave; Nicol, Robert; Nusbaum, Chad; Young, Sarah; Wilkinson, Jane; Worley, Kim C.

Nature ; 478(7370): 476-82, 2011 Oct 12.

Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.

Subject(s)

Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA

16.

The UCSC Genome Browser database: 2015 update.

Rosenbloom, Kate R; Armstrong, Joel; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R; Fujita, Pauline A; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T; Li, Chin H; Miga, Karen H; Nguyen, Ngan; Paten, Benedict; Raney, Brian J; Smit, Arian F A; Speir, Matthew L; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James.

Nucleic Acids Res ; 43(Database issue): D670-81, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25428374

ABSTRACT

Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.

Subject(s)

Databases, Nucleic Acid , Genomics , Animals , Cricetinae , Dogs , Ebolavirus/genetics , Gene Expression , Genome , Internet , Mice , Molecular Sequence Annotation , Phenotype , Rats , Software

17.

Exploring Massive Incomplete Lineage Sorting in Arctoids (Laurasiatheria, Carnivora).

Doronina, Liliya; Churakov, Gennady; Shi, Jingjing; Brosius, Jürgen; Baertsch, Robert; Clawson, Hiram; Schmitz, Jürgen.

Mol Biol Evol ; 32(12): 3194-204, 2015 Dec.

Article in English | MEDLINE | ID: mdl-26337548

ABSTRACT

Freed from the competition of large raptors, Paleocene carnivores could expand their newly acquired habitats in search of prey. Such changing conditions might have led to their successful distribution and rapid radiation. Today, molecular evolutionary biologists are faced, however, with the consequences of such accelerated adaptive radiations, because they led to sequential speciation more rapidly than phylogenetic markers could be fixed. The repercussions being that current genealogies based on such markers are incongruent with species trees. Our aim was to explore such conflicting phylogenetic zones of evolution during the early arctoid radiation, especially to distinguish diagnostic from misleading phylogenetic signals, and to examine other carnivore-related speciation events. We applied a combination of high-throughput computational strategies to screen carnivore and related genomes in silico for randomly inserted retroposed elements that we then used to identify inconsistent phylogenetic patterns in the Arctoidea group, which is well known for phylogenetic discordances. Our combined retrophylogenomic and in vitro wet lab approach detected hundreds of carnivore-specific insertions, many of them confirming well-established splits or identifying and solving conflicting species distributions. Our systematic genome-wide screens for Long INterspersed Elements detected homoplasy-free markers with insertion-specific truncation points that we used to distinguish phylogenetically informative markers from conflicting signals. The results were independently confirmed by phylogenetic diagnostic Short INterspersed Elements. As statistical analysis ruled out ancestral hybridization, these doubly verified but still conflicting patterns were statistically determined to be genomic remnants from a time of ancestral incomplete lineage sorting that especially accompanied large parts of Arctoidea evolution.

Subject(s)

Carnivora/genetics , Animals , Biological Evolution , Evolution, Molecular , Genetic Speciation , Genomics , Hybridization, Genetic , Long Interspersed Nucleotide Elements , Molecular Sequence Data , Phylogeny , Short Interspersed Nucleotide Elements

18.

Navigating protected genomics data with UCSC Genome Browser in a Box.

Haeussler, Maximilian; Raney, Brian J; Hinrichs, Angie S; Clawson, Hiram; Zweig, Ann S; Karolchik, Donna; Casper, Jonathan; Speir, Matthew L; Haussler, David; Kent, W James.

Bioinformatics ; 31(5): 764-6, 2015 Mar 01.

Article in English | MEDLINE | ID: mdl-25348212

ABSTRACT

Genome Browser in a Box (GBiB) is a small virtual machine version of the popular University of California Santa Cruz (UCSC) Genome Browser that can be run on a researcher's own computer. Once GBiB is installed, a standard web browser is used to access the virtual server and add personal data files from the local hard disk. Annotation data are loaded on demand through the Internet from UCSC or can be downloaded to the local computer for faster access. AVAILABILITY AND IMPLEMENTATION: Software downloads and installation instructions are freely available for non-commercial use at https://genome-store.ucsc.edu/. GBiB requires the installation of open-source software VirtualBox, available for all major operating systems, and the UCSC Genome Browser, which is open source and free for non-commercial use. Commercial use of GBiB and the Genome Browser requires a license (http://genome.ucsc.edu/license/).

Subject(s)

Databases, Genetic , Genome, Human , Genomics/methods , Information Storage and Retrieval , Sequence Analysis, DNA/methods , Computational Biology , Humans , Internet , Software , Universities , User-Computer Interface

19.

The UCSC Genome Browser database: 2014 update.

Karolchik, Donna; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Cline, Melissa S; Diekhans, Mark; Dreszer, Timothy R; Fujita, Pauline A; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A; Heitner, Steve; Hinrichs, Angie S; Learned, Katrina; Lee, Brian T; Li, Chin H; Raney, Brian J; Rhead, Brooke; Rosenbloom, Kate R; Sloan, Cricket A; Speir, Matthew L; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James.

Nucleic Acids Res ; 42(Database issue): D764-70, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24270787

ABSTRACT

The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser's web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation 'tracks' for ~90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany.

Subject(s)

Databases, Genetic , Genome , Genomics , Alleles , Animals , Genome, Human , Humans , Internet , Mice , Molecular Sequence Annotation , Polymorphism, Single Nucleotide , Sequence Alignment , Software

20.

Comparative assembly hubs: web-accessible browsers for comparative genomics.

Nguyen, Ngan; Hickey, Glenn; Raney, Brian J; Armstrong, Joel; Clawson, Hiram; Zweig, Ann; Karolchik, Donna; Kent, William James; Haussler, David; Paten, Benedict.

Bioinformatics ; 30(23): 3293-301, 2014 Dec 01.

Article in English | MEDLINE | ID: mdl-25138168

ABSTRACT

MOTIVATION: Researchers now have access to large volumes of genome sequences for comparative analysis, some generated by the plethora of public sequencing projects and, increasingly, from individual efforts. It is not possible, or necessarily desirable, that the public genome browsers attempt to curate all these data. Instead, a wealth of powerful tools is emerging to empower users to create their own visualizations and browsers. RESULTS: We introduce a pipeline to easily generate collections of Web-accessible UCSC Genome Browsers interrelated by an alignment. It is intended to democratize our comparative genomic browser resources, serving the broad and growing community of evolutionary genomicists and facilitating easy public sharing via the Internet. Using the alignment, all annotations and the alignment itself can be efficiently viewed with reference to any genome in the collection, symmetrically. A new, intelligently scaled alignment display makes it simple to view all changes between the genomes at all levels of resolution, from substitutions to complex structural rearrangements, including duplications. To demonstrate this work, we create a comparative assembly hub containing 57 Escherichia coli and 9 Shigella genomes and show examples that highlight their unique biology. AVAILABILITY AND IMPLEMENTATION: The source code is available as open source at: https://github.com/glennhickey/progressiveCactus. The E. coli and Shigella genome hub is now a public hub listed on the UCSC browser public hubs Web page.

Subject(s)

Genomics/methods , Web Browser , Algorithms , Escherichia coli/genetics , Genome, Bacterial , Internet , Molecular Sequence Annotation , Sequence Alignment , Shigella/genetics
