Results 1 - 12 of 12
1.
PLoS Biol ; 22(6): e3002661, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38829909

ABSTRACT

Deuterostomes are a monophyletic group of animals that includes Hemichordata, Echinodermata (together called Ambulacraria), and Chordata. The diversity of deuterostome body plans has made it challenging to reconstruct their ancestral condition and to decipher the genetic changes that drove the diversification of deuterostome lineages. Here, we generate chromosome-level genome assemblies of 2 hemichordate species, Ptychodera flava and Schizocardium californicum, and use comparative genomic approaches to infer the chromosomal architecture of the deuterostome common ancestor and delineate lineage-specific chromosomal modifications. We show that hemichordate chromosomes (1N = 23) exhibit remarkable chromosome-scale macrosynteny when compared to other deuterostomes and can be derived from 24 deuterostome ancestral linkage groups (ALGs). These deuterostome ALGs in turn match previously inferred bilaterian ALGs, consistent with a relatively short transition from the last common bilaterian ancestor to the origin of deuterostomes. Based on this deuterostome ALG complement, we deduced chromosomal rearrangement events that occurred in different lineages. For example, a fusion-with-mixing event produced an Ambulacraria-specific ALG that subsequently split into 2 chromosomes in extant hemichordates, while this homologous ALG further fused with another chromosome in sea urchins. Orthologous genes distributed in these rearranged chromosomes are enriched for functions in various developmental processes. We found that the deeply conserved Hox clusters are located in highly rearranged chromosomes and that maintenance of the clusters is likely due to lower densities of transposable elements within the clusters. We also provide evidence that the deuterostome-specific pharyngeal gene cluster was established via the combination of 3 pre-assembled microsyntenic blocks.
We suggest that since chromosomal rearrangement events and formation of new gene clusters may change the regulatory controls of developmental genes, these events may have contributed to the evolution of diverse body plans among deuterostomes.


Subjects
Chromosomes , Molecular Evolution , Genome , Phylogeny , Animals , Chromosomes/genetics , Genome/genetics , Synteny , Genetic Linkage , Chordata/genetics
2.
Sci Data ; 7(1): 399, 2020 11 17.
Article in English | MEDLINE | ID: mdl-33203859

ABSTRACT

The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10-25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets to both evaluate the benefits of these long accurate reads as well as for development of bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.
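Read-accuracy figures such as the >99.5% quoted here are often restated as Phred-scaled quality values, Q = -10 log10(1 - accuracy), so 99.5% corresponds to roughly Q23. The helper below is an illustrative sketch of that standard conversion; the function names are hypothetical and are not part of any PacBio tool.

```python
import math


def phred_qv(accuracy: float) -> float:
    """Phred-scaled quality value for a per-base accuracy in (0, 1)."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)


def accuracy_from_qv(qv: float) -> float:
    """Inverse mapping: Phred QV back to per-base accuracy."""
    return 1.0 - 10.0 ** (-qv / 10.0)


for acc in (0.995, 0.998, 0.999):
    print(f"{acc:.1%} -> Q{phred_qv(acc):.1f}")
```

Each tenfold drop in error rate adds 10 to the QV, which is why small-looking gains in percent accuracy (99.5% vs 99.9%) are significant for assembly and variant calling.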


Subjects
High-Throughput Nucleotide Sequencing , Mice/genetics , Zea mays/genetics , Animals , Fragaria/genetics , Plant Genome , Metagenome , Ranidae/genetics , DNA Sequence Analysis
3.
Nat Biotechnol ; 37(10): 1155-1162, 2019 10.
Article in English | MEDLINE | ID: mdl-31406327

ABSTRACT

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
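The contig N50 cited in this abstract is a standard contiguity metric: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length. A minimal sketch, using hypothetical contig lengths that are not data from the study:

```python
def n50(contig_lengths):
    """Return the contig N50: sort contigs from longest to shortest and
    report the length of the contig at which the running total first
    reaches half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must be non-empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in Mb (illustration only).
toy_contigs_mb = [22.0, 18.5, 16.0, 9.0, 4.5, 1.0]
print(n50(toy_contigs_mb))
```

Because N50 depends only on the length distribution, it says nothing about correctness; the abstract pairs it with a separate concordance figure for that reason.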


Subjects
Circular DNA/genetics , Human Genome , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Base Sequence , Genetic Variation , Haplotypes , Humans
4.
Nature ; 546(7659): 524-527, 2017 06 22.
Article in English | MEDLINE | ID: mdl-28605751

ABSTRACT

Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.


Subjects
Plant Genome/genetics , High-Throughput Nucleotide Sequencing/methods , Single Molecule Imaging/methods , Zea mays/genetics , Centromere/genetics , Plant Chromosomes/genetics , Contig Mapping , Agricultural Crops/genetics , DNA Transposable Elements/genetics , Intergenic DNA/genetics , Plant Genes/genetics , Molecular Sequence Annotation , Optics and Photonics , Phylogeny , Messenger RNA/analysis , Messenger RNA/genetics , Reference Standards , Sorghum/genetics
5.
Nat Methods ; 13(12): 1050-1054, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27749838

ABSTRACT

While genome assembly projects have been successful in many haploid and inbred species, the assembly of noninbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structure and heterozygosities between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences.


Subjects
Diploidy , Fungal Genome/genetics , Plant Genome/genetics , Genomics/methods , Single Nucleotide Polymorphism/genetics , Algorithms , Arabidopsis/genetics , Basidiomycota/genetics , Fungal DNA/genetics , Plant DNA/genetics , Haplotypes , Heterozygote , Humans , DNA Sequence Analysis , Vitis/genetics
6.
Sci Data ; 1: 140045, 2014.
Article in English | MEDLINE | ID: mdl-25977796

ABSTRACT

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.
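The abstract links a random error profile to high consensus accuracy. A deliberately simplified binomial model shows why: if each pass over a position errs independently, a per-position majority vote is correct with rapidly increasing probability as passes accumulate. The sketch assumes independent errors, strict-majority voting, and a hypothetical 88% single-pass accuracy; it is not the consensus algorithm used by PacBio software.

```python
from math import comb


def majority_consensus_accuracy(per_pass_accuracy: float, passes: int) -> float:
    """Probability that a per-position majority vote over `passes` reads of the
    same molecule recovers the true base.

    Simplifying assumptions (NOT PacBio's actual consensus algorithm):
    * errors are independent between passes (a purely random error profile);
    * the true base must be called by a strict majority of passes, so ties
      count as failures. This is pessimistic, because real errors scatter over
      different bases and indels and a plurality would often suffice.
    """
    if passes < 1:
        raise ValueError("passes must be >= 1")
    p = per_pass_accuracy
    needed = passes // 2 + 1  # smallest strict majority
    return sum(
        comb(passes, k) * p**k * (1 - p) ** (passes - k)
        for k in range(needed, passes + 1)
    )


# Hypothetical single-pass accuracy of 88%, chosen only for illustration.
for n in (1, 3, 5, 9, 15):
    print(n, round(majority_consensus_accuracy(0.88, n), 6))
```

Systematic (non-random) errors would recur at the same position on every pass and would not be voted away, which is why the random character of the errors matters.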


Subjects
Arabidopsis/genetics , Drosophila melanogaster/genetics , Escherichia coli/genetics , Bacterial Genome , Fungal Genome , Insect Genome , Plant Genome , Neurospora crassa/genetics , Saccharomyces cerevisiae/genetics , DNA Sequence Analysis , Animals , Animal Models
7.
Nucleic Acids Res ; 38(15): e159, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20571086

ABSTRACT

A novel template design for single-molecule sequencing is introduced, a structure we refer to as a SMRTbell template. This structure consists of a double-stranded portion, containing the insert of interest, and a single-stranded hairpin loop on either end, which provides a site for primer binding. Structurally, this format resembles a linear double-stranded molecule, and yet it is topologically circular. When placed into a single-molecule sequencing reaction, the SMRTbell template format enables a consensus sequence to be obtained from multiple passes on a single molecule. Furthermore, this consensus sequence is obtained from both the sense and antisense strands of the insert region. In this article, we present a universal method for constructing these templates, as well as an application of their use. We demonstrate the generation of high-quality consensus accuracy from single molecules, as well as the use of SMRTbell templates in the identification of rare sequence variants.
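Because the SMRTbell template is topologically circular, a polymerase can traverse the insert repeatedly, with successive passes alternating between the two strand orientations and an adapter crossing in between. The toy sketch below assumes error-free reads that begin at the start of the insert, and uses a made-up adapter string as a stand-in for both hairpins (it is not the real SMRTbell adapter sequence). It shows how such a read could be cut into subreads and reoriented so that all passes line up for consensus calling.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_polymerase_read(insert: str, adapter: str, passes: int) -> str:
    """Error-free toy model of a polymerase circling a SMRTbell-like template:
    passes alternate between the insert and its reverse complement, with a
    hairpin adapter crossed between consecutive passes."""
    strands = [insert if i % 2 == 0 else revcomp(insert) for i in range(passes)]
    return adapter.join(strands)


def oriented_subreads(read: str, adapter: str) -> list[str]:
    """Split a polymerase read at adapter sequences and reverse-complement
    every second subread so all passes share the sense orientation."""
    subreads = read.split(adapter)
    return [s if i % 2 == 0 else revcomp(s) for i, s in enumerate(subreads)]


# Hypothetical sequences. The placeholder adapter must not occur inside the
# insert or its reverse complement, or the naive split would cut too often.
insert = "GATTACAGCGTACC"
adapter = "TTTTTTTTTTTT"
read = simulate_polymerase_read(insert, adapter, passes=5)
print(oriented_subreads(read, adapter) == [insert] * 5)
```

Real subread extraction must also tolerate sequencing errors inside the adapter, so exact string splitting as done here is only a teaching simplification.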


Subjects
DNA/chemistry , Oligonucleotides/chemistry , Single Nucleotide Polymorphism , DNA Sequence Analysis/methods , Base Sequence , Consensus Sequence , Staphylococcus aureus/genetics , Genetic Templates
8.
Mol Pharmacol ; 67(4): 1360-8, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15662043

ABSTRACT

Transcriptional profiling via microarrays holds great promise for toxicant classification and hazard prediction. Unfortunately, the use of different microarray platforms, protocols, and informatics often hinders the meaningful comparison of transcriptional profiling data across laboratories. One solution to this problem is to provide a low-cost and centralized resource that enables researchers to share toxicogenomic data that has been generated on a common platform. In an effort to create such a resource, we developed a standardized set of microarray reagents and reproducible protocols to simplify the analysis of liver gene expression in the mouse model. This resource, referred to as EDGE, was then used to generate a training set of 117 publicly accessible transcriptional profiles that can be accessed at http://edge.oncology.wisc.edu/. The Web-accessible database was also linked to an informatics suite that allows on-line clustering and K-means analyses as well as Boolean and sequence-based searches of the data. We propose that EDGE can serve as a prototype resource for the sharing of toxicogenomics information and be used to develop algorithms for efficient chemical classification and hazard prediction.
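The abstract mentions on-line K-means analysis of the stored expression profiles. For readers unfamiliar with the method, the following is a generic sketch of Lloyd's k-means on toy log-ratio profiles. It is not EDGE's implementation; the data, the initialization (first k points), and the Euclidean distance are assumptions made for illustration.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on equal-length numeric vectors (e.g. expression
    profiles). Centroids start at the first k points so runs are repeatable."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical log2 expression ratios for five genes under two treatments.
profiles = [[2.0, 1.8], [2.2, 2.1], [-1.9, -2.0], [-2.1, -1.7], [0.1, 0.0]]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

K-means only finds a local optimum that depends on the starting centroids, so in practice it is usually run from several initializations and the best solution kept.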


Subjects
Genetic Databases , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis/methods , Toxicogenetics , Animals , Lipopolysaccharides/pharmacology , Liver/drug effects , Liver/metabolism , Mice , PPAR alpha/agonists , Aryl Hydrocarbon Receptors/agonists
10.
Bioinformatics ; 18(8): 1064-72, 2002 Aug.
Article in English | MEDLINE | ID: mdl-12176829

ABSTRACT

MOTIVATION: In many microarray experiments, relatively few intra- and inter-array replicate measurements are made because of significant cost limitations and limited sample availability. Compounding this problem is a lack of robust statistical methods for analyzing gene expression data with limited experimental replicates. As a result, the results of these experiments are difficult to interpret, and the probabilities of type I and type II errors are poorly understood. RESULTS: The variability in a series of replicate microarray measurements was modelled using a combination of parametric and non-parametric methods. A 3-dimensional surface was created for the conditional distribution of the variability given the mean signal intensity in both the Cy3 and Cy5 channels. The results were used as the basis for developing statistical methods for analyzing gene expression data with limited experimental replicates. AVAILABILITY: The statistical analysis scripts are available upon request.
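The RESULTS describe modelling replicate variability conditional on mean signal intensity in both channels. The sketch below illustrates only the non-parametric idea in its simplest form: bin features on log2 mean Cy3 and Cy5 intensity and take the median replicate standard deviation of log2(Cy5/Cy3) in each bin. The data layout, bin width, and choice of summary statistic are assumptions, and the authors' combined parametric and non-parametric model is not reproduced here.

```python
import math
import statistics
from collections import defaultdict


def variability_surface(spots, bin_width=1.0):
    """Empirical replicate variability as a function of mean Cy3 and Cy5 signal.

    `spots` holds one (cy3_replicates, cy5_replicates) pair per array feature,
    each a sequence of positive intensities from replicate hybridisations.
    Returns {(cy3_bin, cy5_bin): median SD of log2(Cy5/Cy3)}, where bins are
    `bin_width` wide on the log2 intensity scale.
    """
    spreads = defaultdict(list)
    for cy3, cy5 in spots:
        if len(cy3) != len(cy5) or len(cy3) < 2:
            raise ValueError("each spot needs >= 2 paired replicate measurements")
        log_ratios = [math.log2(red / green) for green, red in zip(cy3, cy5)]
        key = (
            math.floor(math.log2(statistics.mean(cy3)) / bin_width),
            math.floor(math.log2(statistics.mean(cy5)) / bin_width),
        )
        spreads[key].append(statistics.stdev(log_ratios))
    # Non-parametric summary per cell of the 2-D intensity grid; plotting these
    # values over both intensity axes gives a variability surface.
    return {key: statistics.median(values) for key, values in spreads.items()}


# Hypothetical replicate intensities for two features.
spots = [([100, 100], [100, 100]), ([1000, 1100], [1000, 900])]
print(variability_surface(spots))
```

With real arrays, each bin would need many features for the median to be stable, and sparsely populated bins at the intensity extremes would typically be pooled or smoothed.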


Subjects
DNA/genetics , Gene Expression , Genetic Models , Statistical Models , Oligonucleotide Array Sequence Analysis/methods , Computer Simulation , DNA Replication/genetics , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation , Humans , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Reproducibility of Results , Sensitivity and Specificity
11.
Pharmacogenetics ; 12(2): 151-63, 2002 Mar.
Article in English | MEDLINE | ID: mdl-11875369

ABSTRACT

The Ahr locus encodes the aryl hydrocarbon receptor (AHR), which plays important toxicological and developmental roles. Sequence variation in this gene was studied in 13 different mouse lines that included eight laboratory strains, two Mus musculus subspecies and three additional Mus species. The data presented represent the largest study of sequence variation across multiple mouse lines in a single gene (approximately 15.9 kb per mouse line). Among all mice, the average frequency of all polymorphisms in the intronic regions was 20.3 variants/kb and the average exonic frequency was 14.1 variants/kb. For substitutions alone, the average frequencies in the intronic and exonic regions for all mice were 13.3 and 8.9 substitutions/kb, respectively. Between laboratory strains, the average intronic and exonic frequencies for all polymorphisms dropped to 5.4 and 2.9 variants/kb, respectively. There were 111 non-synonymous polymorphisms that resulted in 42 different amino acid changes, of which only 10 amino acid changes had been previously identified. Based on the nucleotide sequence, the phylogenetic history of the gene placed mice carrying the Ahr(b2) and Ahr(d) alleles on separate branches, while mice carrying the Ahr(b1) and Ahr(b3) alleles exhibited a more complex history. Evolutionarily, the AHR protein as a whole appears to be under purifying selective pressure (K(a) : K(s) ratio = 0.237). Despite significant functional constraint in the basic helix-loop-helix and PAS domains, ligand binding is not constrained to the high-affinity allele, which further supports the role of the AHR in development and its importance beyond the adaptive response to environmental toxicants.


Subjects
Genetic Variation , Inbred Mouse Strains/genetics , Genetic Polymorphism , Aryl Hydrocarbon Receptors/genetics , Amino Acid Sequence , Animals , Molecular Evolution , Genetic Linkage , Mice , Molecular Sequence Data , Phylogeny , Genetic Selection , Amino Acid Sequence Homology , Species Specificity
12.
Environ Health Perspect ; 110 Suppl 6: 919-23, 2002 Dec.
Article in English | MEDLINE | ID: mdl-12634120

ABSTRACT

Traditional models of toxicity have relied on dissecting chemical action into pharmacokinetic and pharmacodynamic processes. However, the integration of genomic information with toxicology will enhance our basic understanding of these processes and significantly change the way we apply toxicological information to risk assessment and regulatory problems. In this article, we summarize the application of gene expression information and polymorphism discovery to four areas in toxicology: toxicity testing, cross-species extrapolation, understanding mechanism of action, and susceptibility.


Subjects
Gene Expression Regulation , Genomics , Genetic Polymorphism , Toxicology/trends , Animals , Animal Disease Models , Environmental Pollutants/adverse effects , Forecasting , Humans , Oligonucleotide Array Sequence Analysis , Toxicity Tests