Search | Nursing VHL Search Portal

1.

BigWig and BigBed: enabling browsing of large distributed datasets.

Kent, W J; Zweig, A S; Barber, G; Hinrichs, A S; Karolchik, D.

Bioinformatics ; 26(17): 2204-7, 2010 Sep 01.

Article in English | MEDLINE | ID: mdl-20639541

ABSTRACT

SUMMARY: BigWig and BigBed files are compressed binary indexed files containing data at several resolutions that allow the high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols and Linux and UNIX operating systems files, R trees and various indexing and compression tricks. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed data sets. AVAILABILITY AND IMPLEMENTATION: Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/. Source code for the creation and visualization software is freely available for non-commercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, implemented in C and supported on Linux. The UCSC Genome Browser is available at http://genome.ucsc.edu.

Subject(s)

Data Mining , Genomics/methods , Software , Computational Biology/methods , Data Compression , Internet

2.

The UCSC Genome Browser Database: update 2009.

Kuhn, R M; Karolchik, D; Zweig, A S; Wang, T; Smith, K E; Rosenbloom, K R; Rhead, B; Raney, B J; Pohl, A; Pheasant, M; Meyer, L; Hsu, F; Hinrichs, A S; Harte, R A; Giardine, B; Fujita, P; Diekhans, M; Dreszer, T; Clawson, H; Barber, G P; Haussler, D; Kent, W J.

Nucleic Acids Res ; 37(Database issue): D755-61, 2009 Jan.

Article in English | MEDLINE | ID: mdl-18996895

ABSTRACT

The UCSC Genome Browser Database (GBD, http://genome.ucsc.edu) is a publicly available collection of genome assembly sequence data and integrated annotations for a large number of organisms, including extensive comparative-genomic resources. In the past year, 13 new genome assemblies have been added, including two important primate species, orangutan and marmoset, bringing the total to 46 assemblies for 24 different vertebrates and 39 assemblies for 22 different invertebrate animals. The GBD datasets may be viewed graphically with the UCSC Genome Browser, which uses a coordinate-based display system allowing users to juxtapose a wide variety of data. These data include all mRNAs from GenBank mapped to all organisms, RefSeq alignments, gene predictions, regulatory elements, gene expression data, repeats, SNPs and other variation data, as well as pairwise and multiple-genome alignments. A variety of other bioinformatics tools are also provided, including BLAT, the Table Browser, the Gene Sorter, the Proteome Browser, VisiGene and Genome Graphs.

Subject(s)

Databases, Nucleic Acid , Genomics , Animals , Chromosome Mapping , Computer Graphics , Gene Expression , Genetic Variation , Humans , RNA, Messenger/chemistry , Software , User-Computer Interface

3.

Comparative analyses of multi-species sequences from targeted genomic regions.

Thomas, J W; Touchman, J W; Blakesley, R W; Bouffard, G G; Beckstrom-Sternberg, S M; Margulies, E H; Blanchette, M; Siepel, A C; Thomas, P J; McDowell, J C; Maskeri, B; Hansen, N F; Schwartz, M S; Weber, R J; Kent, W J; Karolchik, D; Bruen, T C; Bevan, R; Cutler, D J; Schwartz, S; Elnitski, L; Idol, J R; Prasad, A B; Lee-Lin, S-Q; Maduro, V V B; Summers, T J; Portnoy, M E; Dietrich, N L; Akhter, N; Ayele, K; Benjamin, B; Cariaga, K; Brinkley, C P; Brooks, S Y; Granite, S; Guan, X; Gupta, J; Haghighi, P; Ho, S-L; Huang, M C; Karlins, E; Laric, P L; Legaspi, R; Lim, M J; Maduro, Q L; Masiello, C A; Mastrian, S D; McCloskey, J C; Pearson, R; Stantripop, S.

Nature ; 424(6950): 788-93, 2003 Aug 14.

Article in English | MEDLINE | ID: mdl-12917688

ABSTRACT

The systematic comparison of genomic sequences from different organisms represents a central focus of contemporary genome analysis. Comparative analyses of vertebrate sequences can identify coding and conserved non-coding regions, including regulatory elements, and provide insight into the forces that have rendered modern-day genomes. As a complement to whole-genome sequencing efforts, we are sequencing and comparing targeted genomic regions in multiple, evolutionarily diverse vertebrates. Here we report the generation and analysis of over 12 megabases (Mb) of sequence from 12 species, all derived from the genomic region orthologous to a segment of about 1.8 Mb on human chromosome 7 containing ten genes, including the gene mutated in cystic fibrosis. These sequences show conservation reflecting both functional constraints and the neutral mutational events that shaped this genomic region. In particular, we identify substantial numbers of conserved non-coding segments beyond those previously identified experimentally, most of which are not detectable by pair-wise sequence comparisons alone. Analysis of transposable element insertions highlights the variation in genome dynamics among these species and confirms the placement of rodents as a sister group to the primates.

Subject(s)

Conserved Sequence/genetics , Evolution, Molecular , Genomics , Vertebrates/genetics , Animals , Chromosomes, Human, Pair 7/genetics , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , DNA Transposable Elements/genetics , Genome , Humans , Mammals/genetics , Mutagenesis/genetics , Phylogeny , Sequence Alignment , Sequence Homology, Nucleic Acid , Species Specificity

4.

The UCSC Genome Browser Database: 2008 update.

Karolchik, D; Kuhn, R M; Baertsch, R; Barber, G P; Clawson, H; Diekhans, M; Giardine, B; Harte, R A; Hinrichs, A S; Hsu, F; Kober, K M; Miller, W; Pedersen, J S; Pohl, A; Raney, B J; Rhead, B; Rosenbloom, K R; Smith, K E; Stanke, M; Thakkapallayil, A; Trumbower, H; Wang, T; Zweig, A S; Haussler, D; Kent, W J.

Nucleic Acids Res ; 36(Database issue): D773-9, 2008 Jan.

Article in English | MEDLINE | ID: mdl-18086701

ABSTRACT

The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrate and 21 invertebrate species as of September 2007. For each assembly, the GBD contains a collection of annotation data aligned to the genomic sequence. Highlights of this year's additions include a 28-species human-based vertebrate conservation annotation, an enhanced UCSC Genes set, and more human variation, MGC, and ENCODE data. The database is optimized for fast interactive performance with a set of web-based tools that may be used to view, manipulate, filter and download the annotation data. New toolset features include the Genome Graphs tool for displaying genome-wide data sets, session saving and sharing, better custom track management, expanded Genome Browser configuration options and a Genome Browser wiki site. The downloadable GBD data, the companion Genome Browser toolset and links to documentation and related information can be found at: http://genome.ucsc.edu/.

Subject(s)

Databases, Nucleic Acid , Genomics , Animals , Computer Graphics , Genetic Variation , Humans , Internet , Invertebrates/genetics , Sequence Alignment , User-Computer Interface , Vertebrates/genetics

5.

The UCSC genome browser database: update 2007.

Kuhn, R M; Karolchik, D; Zweig, A S; Trumbower, H; Thomas, D J; Thakkapallayil, A; Sugnet, C W; Stanke, M; Smith, K E; Siepel, A; Rosenbloom, K R; Rhead, B; Raney, B J; Pohl, A; Pedersen, J S; Hsu, F; Hinrichs, A S; Harte, R A; Diekhans, M; Clawson, H; Bejerano, G; Barber, G P; Baertsch, R; Haussler, D; Kent, W J.

Nucleic Acids Res ; 35(Database issue): D668-73, 2007 Jan.

Article in English | MEDLINE | ID: mdl-17142222

ABSTRACT

The University of California, Santa Cruz Genome Browser Database contains, as of September 2006, sequence and annotation data for the genomes of 13 vertebrate and 19 invertebrate species. The Genome Browser displays a wide variety of annotations at all scales from the single nucleotide level up to a full chromosome and includes assembly data, genes and gene predictions, mRNA and EST alignments, and comparative genomics, regulation, expression and variation data. The database is optimized for fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. In the past year, 22 new assemblies and several new sets of human variation annotation have been released. New features include VisiGene, a fully integrated in situ hybridization image browser; phyloGif, for drawing evolutionary tree diagrams; a redesigned Custom Track feature; an expanded SNP annotation track; and many new display options. The Genome Browser, other tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/.

Subject(s)

Databases, Genetic , Genomics , Animals , Base Sequence , Cattle , Computer Graphics , Conserved Sequence , Genome, Human , Humans , Internet , Linkage Disequilibrium , Mice , Open Reading Frames , Polymorphism, Single Nucleotide , Rats , Regulatory Sequences, Nucleic Acid , User-Computer Interface

6.

The UCSC Genome Browser Database: update 2006.

Hinrichs, A S; Karolchik, D; Baertsch, R; Barber, G P; Bejerano, G; Clawson, H; Diekhans, M; Furey, T S; Harte, R A; Hsu, F; Hillman-Jackson, J; Kuhn, R M; Pedersen, J S; Pohl, A; Raney, B J; Rosenbloom, K R; Siepel, A; Smith, K E; Sugnet, C W; Sultan-Qurraie, A; Thomas, D J; Trumbower, H; Weber, R J; Weirauch, M; Zweig, A S; Haussler, D; Kent, W J.

Nucleic Acids Res ; 34(Database issue): D590-8, 2006 Jan 01.

Article in English | MEDLINE | ID: mdl-16381938

ABSTRACT

The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering and comparison of genes by several metrics including expression data and several gene properties. BLAT and In Silico PCR search for sequences in entire genomes in seconds. These tools are highly integrated and provide many hyperlinks to other databases and websites. The GBD, browsing tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/.

Subject(s)

Databases, Genetic , Genomics , Amino Acid Sequence , Animals , California , Computer Graphics , Dogs , Gene Expression , Genes , Humans , Internet , Mice , Polymorphism, Single Nucleotide , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Proteomics , Rats , Sequence Alignment , Software , User-Computer Interface

7.

The UCSC Genome Browser Database.

Karolchik, D; Baertsch, R; Diekhans, M; Furey, T S; Hinrichs, A; Lu, Y T; Roskin, K M; Schwartz, M; Sugnet, C W; Thomas, D J; Weber, R J; Haussler, D; Kent, W J.

Nucleic Acids Res ; 31(1): 51-4, 2003 Jan 01.

Article in English | MEDLINE | ID: mdl-12519945

ABSTRACT

The University of California Santa Cruz (UCSC) Genome Browser Database is an up to date source for genome sequence data integrated with a large collection of related annotations. The database is optimized to support fast interactive performance with the web-based UCSC Genome Browser, a tool built on top of the database for rapid visualization and querying of the data at many levels. The annotations for a given genome are displayed in the browser as a series of tracks aligned with the genomic sequence. Sequence data and annotations may also be viewed in a text-based tabular format or downloaded as tab-delimited flat files. The Genome Browser Database, browsing tools and downloadable data files can all be found on the UCSC Genome Bioinformatics website (http://genome.ucsc.edu), which also contains links to documentation and related technical information.

Subject(s)

Databases, Genetic , Genome, Human , Genomics , Animals , California , Database Management Systems , Humans , Information Storage and Retrieval , Mice

8.

Exploring relationships and mining data with the UCSC Gene Sorter.

Kent, W J; Hsu, Fan; Karolchik, Donna; Kuhn, Robert M; Clawson, Hiram; Trumbower, Heather; Haussler, David.

Genome Res ; 15(5): 737-41, 2005 May.

Article in English | MEDLINE | ID: mdl-15867434

ABSTRACT

In parallel with the human genome sequencing and assembly effort, many tools have been developed to examine the structure and function of the human gene set. The University of California Santa Cruz (UCSC) Gene Sorter has been created as a gene-based counterpart to the chromosome-oriented UCSC Genome Browser to facilitate the study of gene function and evolution. This simple, but powerful tool provides a graphical display of related genes that can be sorted and filtered based on a variety of criteria. Genes may be ordered based on such characteristics as expression profiles, proximity in genome, shared Gene Ontology (GO) terms, and protein similarity. The display can be restricted to a gene set meeting a specific set of constraints by filtering on expression levels, gene name or ID, chromosomal position, and so on. The default set of information for each gene entry (gene name, selected expression data, a BLASTP E-value, genomic position, and a description) can be configured to include many other types of data, including expanded expression data, related accession numbers and IDs, orthologs in other species, GO terms, and much more. The Gene Sorter, a CGI-based Web application written in C with a MySQL database, is tightly integrated with the other applications in the UCSC Genome Browser suite. Available on a selected subset of the genome assemblies found in the Genome Browser, it further enhances the usefulness of the UCSC tool set in interactive genomic exploration and analysis.

Subject(s)

Computational Biology/methods , Databases, Genetic , Genome, Human , Genomics/methods , Software , Database Management Systems , Humans

9.

Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment.

Kent, W J; Zahler, A M.

Genome Res ; 10(8): 1115-25, 2000 Aug.

Article in English | MEDLINE | ID: mdl-10958630

ABSTRACT

A new algorithm, WABA, was developed for doing large-scale alignments between genomic DNA of different species. WABA was used to align 8 million bases of Caenorhabditis briggsae genomic DNA against the entire 97-million-base Caenorhabditis elegans genome. The alignment, including C. briggsae homologs of 154 genetically characterized C. elegans genes and many times this number of largely uncharacterized ORFs, can be browsed and searched on the Web (http://www.cse.ucsc.edu/~kent/intronerator). The alignment confirms that patterns of conservation can be useful in identifying regulatory regions and rarely expressed coding regions. Conserved regulatory elements can be identified inside coding exons by examining the level of divergence at the wobble position of codons. The alignment reveals a bimodal size distribution of syntenic regions. Over 250 introns are present in one species but not the other. The 3' and 5' intron splice sites have more similarity to each other in introns unique to one species than in C. elegans introns as a whole, suggesting a possible mechanism for intron removal.

Subject(s)

Caenorhabditis elegans/genetics , Conserved Sequence/genetics , Gene Expression Regulation , Genome , Introns , Sequence Alignment/methods , Algorithms , Alternative Splicing/genetics , Animals , Chromosome Mapping/methods , Exons , Internet , Molecular Sequence Data , Promoter Regions, Genetic , RNA Splicing , Species Specificity

10.

The intronerator: exploring introns and alternative splicing in Caenorhabditis elegans.

Kent, W J; Zahler, A M.

Nucleic Acids Res ; 28(1): 91-3, 2000 Jan 01.

Article in English | MEDLINE | ID: mdl-10592190

ABSTRACT

The Intronerator (http://www.cse.ucsc.edu/~kent/intronerator/) is a set of web-based tools for exploring RNA splicing and gene structure in Caenorhabditis elegans. It includes a display of cDNA alignments with the genomic sequence, a catalog of alternatively spliced genes and a database of introns. The cDNA alignments include >100 000 ESTs and almost 1000 full-length cDNAs. ESTs from embryos and mixed stage animals as well as full-length cDNAs can be compared in the alignment display with each other and with predicted genes. The alt-splicing catalog includes 844 open reading frames for which there is evidence of alternative splicing of pre-mRNA. The intron database includes 28 478 introns, and can be searched for patterns near the splice junctions.

Subject(s)

Alternative Splicing , Caenorhabditis elegans/genetics , Internet , Introns , Animals , Base Sequence , DNA Primers , Database Management Systems , Databases, Factual , Sequence Alignment

11.

Assembly of the working draft of the human genome with GigAssembler.

Kent, W J; Haussler, D.

Genome Res ; 11(9): 1541-8, 2001 Sep.

Article in English | MEDLINE | ID: mdl-11544197

ABSTRACT

The data for the public working draft of the human genome contains roughly 400,000 initial sequence contigs in approximately 30,000 large insert clones. Many of these initial sequence contigs overlap. A program, GigAssembler, was built to merge them and to order and orient the resulting larger sequence contigs based on mRNA, paired plasmid ends, EST, BAC end pairs, and other information. This program produced the first publicly available assembly of the human genome, a working draft containing roughly 2.7 billion base pairs and covering an estimated 88% of the genome that has been used for several recent studies of the genome. Here we describe the algorithm used by GigAssembler.

Subject(s)

Algorithms , Genome, Human , Human Genome Project , Software , Chromosomes, Artificial, Bacterial/genetics , Computational Biology/methods , Contig Mapping/methods , Expressed Sequence Tags , Humans , RNA, Messenger/genetics , Repetitive Sequences, Nucleic Acid , Sequence Alignment/methods

12.

Transcriptome and genome conservation of alternative splicing events in humans and mice.

Sugnet, C W; Kent, W J; Ares, M; Haussler, D.

Pac Symp Biocomput ; : 66-77, 2004.

Article in English | MEDLINE | ID: mdl-14992493

ABSTRACT

Combining mRNA and EST data in splicing graphs with whole genome alignments, we discover alternative splicing events that are conserved in both human and mouse transcriptomes. 1,964 of 19,156 (10%) loci examined contain one or more such alternative splicing events, with 2,698 total events. These events represent a lower bound on the amount of alternative splicing in the human genome. Also, as these alternative splicing events are conserved between the human and mouse transcriptomes they should be enriched for functionally significant alternative splicing events, free from much of the noise found in the EST libraries. Further classification of these alternative splicing events reveals that 1,037 (38.4%) are due to exon skipping, 497 (18.4%) are due to alternative 3' splice sites, 214 (7.9%) are due to alternative 5' splice sites, 75 (2.8%) are due to intron retention and the other 875 (32.4%) are due to other, more complicated, alternative splicing events. In addition, genomic sequences nearby these alternative splicing events display increased sequence conservation. Both the alternatively spliced exons and the proximal intron show increased levels of genomic conservation relative to constitutively spliced exons. For exon skipping events both intron regions flanking the exon are conserved while for alternative 5' and 3' splicing events the conservation is greater near the alternative splice site.

Subject(s)

Alternative Splicing , Computational Biology , Algorithms , Animals , Conserved Sequence , Databases, Nucleic Acid , Expressed Sequence Tags , Genome , Genome, Human , Humans , Mice , RNA, Messenger/genetics , Sequence Alignment/statistics & numerical data , Species Specificity

13.

The share of human genomic DNA under selection estimated from human-mouse genomic alignments.

Chiaromonte, F; Weber, R J; Roskin, K M; Diekhans, M; Kent, W J; Haussler, D.

Cold Spring Harb Symp Quant Biol ; 68: 245-54, 2003.

Article in English | MEDLINE | ID: mdl-15338624

Subject(s)

DNA/genetics , Genome, Human , Selection, Genetic , Animals , Conserved Sequence , Evolution, Molecular , Humans , Mice , Repetitive Sequences, Nucleic Acid , Sequence Alignment/statistics & numerical data , Species Specificity

14.

Global predictions and tests of erythroid regulatory regions.

Hardison, R C; Chiaromonte, F; Kolbe, D; Wang, H; Petrykowska, H; Elnitski, L; Yang, S; Giardine, B; Zhang, Y; Riemer, C; Schwartz, S; Haussler, D; Roskin, K M; Weber, R J; Diekhans, M; Kent, W J; Weiss, M J; Welch, J; Miller, W.

Cold Spring Harb Symp Quant Biol ; 68: 335-44, 2003.

Article in English | MEDLINE | ID: mdl-15338635

Subject(s)

Erythropoiesis/genetics , Rats , Animals , Conserved Sequence , DNA/genetics , Evolution, Molecular , Genes, Regulator , Genome , Genomics/methods , Humans , Mice , Selection, Genetic , Sequence Alignment
