Search | Portail Régional BVS

1.

Chromosome-level genome assemblies of 2 hemichordates provide new insights into deuterostome origin and chromosome evolution.

Lin, Che-Yi; Marlétaz, Ferdinand; Pérez-Posada, Alberto; Martínez-García, Pedro Manuel; Schloissnig, Siegfried; Peluso, Paul; Conception, Greg T; Bump, Paul; Chen, Yi-Chih; Chou, Cindy; Lin, Ching-Yi; Fan, Tzu-Pei; Tsai, Chang-Tai; Gómez Skarmeta, José Luis; Tena, Juan J; Lowe, Christopher J; Rank, David R; Rokhsar, Daniel S; Yu, Jr-Kai; Su, Yi-Hsien.

PLoS Biol ; 22(6): e3002661, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38829909

ABSTRACT

Deuterostomes are a monophyletic group of animals that includes Hemichordata, Echinodermata (together called Ambulacraria), and Chordata. The diversity of deuterostome body plans has made it challenging to reconstruct their ancestral condition and to decipher the genetic changes that drove the diversification of deuterostome lineages. Here, we generate chromosome-level genome assemblies of 2 hemichordate species, Ptychodera flava and Schizocardium californicum, and use comparative genomic approaches to infer the chromosomal architecture of the deuterostome common ancestor and delineate lineage-specific chromosomal modifications. We show that hemichordate chromosomes (1N = 23) exhibit remarkable chromosome-scale macrosynteny when compared to other deuterostomes and can be derived from 24 deuterostome ancestral linkage groups (ALGs). These deuterostome ALGs in turn match previously inferred bilaterian ALGs, consistent with a relatively short transition from the last common bilaterian ancestor to the origin of deuterostomes. Based on this deuterostome ALG complement, we deduced chromosomal rearrangement events that occurred in different lineages. For example, a fusion-with-mixing event produced an Ambulacraria-specific ALG that subsequently split into 2 chromosomes in extant hemichordates, while this homologous ALG further fused with another chromosome in sea urchins. Orthologous genes distributed in these rearranged chromosomes are enriched for functions in various developmental processes. We found that the deeply conserved Hox clusters are located in highly rearranged chromosomes and that maintenance of the clusters is likely due to lower densities of transposable elements within the clusters. We also provide evidence that the deuterostome-specific pharyngeal gene cluster was established via the combination of 3 pre-assembled microsyntenic blocks. We suggest that since chromosomal rearrangement events and formation of new gene clusters may change the regulatory controls of developmental genes, these events may have contributed to the evolution of diverse body plans among deuterostomes.

Subject(s)

Chromosomes , Molecular evolution , Genome , Phylogeny , Animals , Chromosomes/genetics , Genome/genetics , Synteny , Genetic linkage , Chordata/genetics

2.

Highly accurate long-read HiFi sequencing data for five complex genomes.

Hon, Ting; Mars, Kristin; Young, Greg; Tsai, Yu-Chih; Karalius, Joseph W; Landolin, Jane M; Maurer, Nicholas; Kudrna, David; Hardigan, Michael A; Steiner, Cynthia C; Knapp, Steven J; Ware, Doreen; Shapiro, Beth; Peluso, Paul; Rank, David R.

Sci Data ; 7(1): 399, 2020 Nov 17.

Article in English | MEDLINE | ID: mdl-33203859

ABSTRACT

The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10-25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets both to evaluate the benefits of these long accurate reads and to develop bioinformatic tools, including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.

Subject(s)

High-throughput nucleotide sequencing , Mice/genetics , Zea mays/genetics , Animals , Fragaria/genetics , Plant genome , Metagenome , Ranidae/genetics , DNA sequence analysis

3.

Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome.

Wenger, Aaron M; Peluso, Paul; Rowell, William J; Chang, Pi-Chuan; Hall, Richard J; Concepcion, Gregory T; Ebler, Jana; Fungtammasan, Arkarachai; Kolesnikov, Alexey; Olson, Nathan D; Töpfer, Armin; Alonge, Michael; Mahmoud, Medhat; Qian, Yufeng; Chin, Chen-Shan; Phillippy, Adam M; Schatz, Michael C; Myers, Gene; DePristo, Mark A; Ruan, Jue; Marschall, Tobias; Sedlazeck, Fritz J; Zook, Justin M; Li, Heng; Koren, Sergey; Carroll, Andrew; Rank, David R; Hunkapiller, Michael W.

Nat Biotechnol ; 37(10): 1155-1162, 2019 Oct.

Article in English | MEDLINE | ID: mdl-31406327

ABSTRACT

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
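The contig N50 quoted above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. As a minimal sketch (the function name `contig_n50` and the toy lengths are illustrative assumptions, not taken from the paper), it could be computed like this:

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in base pairs)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk the contigs from longest to shortest, accumulating length
    # until at least half of the total assembly size is covered.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up contig lengths (not data from the paper).
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = contig_n50(toy_lengths)
```

Here the total is 20, so the two longest contigs (6 and 5) already cover half of it and the N50 is 5.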

Subject(s)

Circular DNA/genetics , Human genome , High-throughput nucleotide sequencing/methods , DNA sequence analysis/methods , Base sequence , Genetic variation , Haplotypes , Humans

4.

Improved maize reference genome with single-molecule technologies.

Jiao, Yinping; Peluso, Paul; Shi, Jinghua; Liang, Tiffany; Stitzer, Michelle C; Wang, Bo; Campbell, Michael S; Stein, Joshua C; Wei, Xuehong; Chin, Chen-Shan; Guill, Katherine; Regulski, Michael; Kumari, Sunita; Olson, Andrew; Gent, Jonathan; Schneider, Kevin L; Wolfgruber, Thomas K; May, Michael R; Springer, Nathan M; Antoniou, Eric; McCombie, W Richard; Presting, Gernot G; McMullen, Michael; Ross-Ibarra, Jeffrey; Dawe, R Kelly; Hastie, Alex; Rank, David R; Ware, Doreen.

Nature ; 546(7659): 524-527, 2017 Jun 22.

Article in English | MEDLINE | ID: mdl-28605751

ABSTRACT

Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.

Subject(s)

Plant genome/genetics , High-throughput nucleotide sequencing/methods , Single-molecule imaging/methods , Zea mays/genetics , Centromere/genetics , Plant chromosomes/genetics , Contig mapping , Agricultural crops/genetics , DNA transposable elements/genetics , Intergenic DNA/genetics , Plant genes/genetics , Molecular sequence annotation , Optics and photonics , Phylogeny , Messenger RNA/analysis , Messenger RNA/genetics , Reference standards , Sorghum/genetics

5.

Phased diploid genome assembly with single-molecule real-time sequencing.

Chin, Chen-Shan; Peluso, Paul; Sedlazeck, Fritz J; Nattestad, Maria; Concepcion, Gregory T; Clum, Alicia; Dunn, Christopher; O'Malley, Ronan; Figueroa-Balderas, Rosa; Morales-Cruz, Abraham; Cramer, Grant R; Delledonne, Massimo; Luo, Chongyuan; Ecker, Joseph R; Cantu, Dario; Rank, David R; Schatz, Michael C.

Nat Methods ; 13(12): 1050-1054, 2016 Dec.

Article in English | MEDLINE | ID: mdl-27749838

ABSTRACT

While genome assembly projects have been successful in many haploid and inbred species, the assembly of noninbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structure and heterozygosities between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences.

Subject(s)

Diploidy , Fungal genome/genetics , Plant genome/genetics , Genomics/methods , Single nucleotide polymorphism/genetics , Algorithms , Arabidopsis/genetics , Basidiomycota/genetics , Fungal DNA/genetics , Plant DNA/genetics , Haplotypes , Heterozygote , Humans , DNA sequence analysis , Vitis/genetics

6.

Long-read, whole-genome shotgun sequence data for five model organisms.

Kim, Kristi E; Peluso, Paul; Babayan, Primo; Yeadon, P Jane; Yu, Charles; Fisher, William W; Chin, Chen-Shan; Rapicavoli, Nicole A; Rank, David R; Li, Joachim; Catcheside, David E A; Celniker, Susan E; Phillippy, Adam M; Bergman, Casey M; Landolin, Jane M.

Sci Data ; 1: 140045, 2014.

Article in English | MEDLINE | ID: mdl-25977796

ABSTRACT

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely studied model systems in biological research.

Subject(s)

Arabidopsis/genetics , Drosophila melanogaster/genetics , Escherichia coli/genetics , Bacterial genome , Fungal genome , Insect genome , Plant genome , Neurospora crassa/genetics , Saccharomyces cerevisiae/genetics , DNA sequence analysis , Animals , Animal models

7.

A flexible and efficient template format for circular consensus sequencing and SNP detection.

Travers, Kevin J; Chin, Chen-Shan; Rank, David R; Eid, John S; Turner, Stephen W.

Nucleic Acids Res ; 38(15): e159, 2010 Aug.

Article in English | MEDLINE | ID: mdl-20571086

ABSTRACT

A novel template design for single-molecule sequencing is introduced, a structure we refer to as a SMRTbell template. This structure consists of a double-stranded portion, containing the insert of interest, and a single-stranded hairpin loop on either end, which provides a site for primer binding. Structurally, this format resembles a linear double-stranded molecule, and yet it is topologically circular. When placed into a single-molecule sequencing reaction, the SMRTbell template format enables a consensus sequence to be obtained from multiple passes on a single molecule. Furthermore, this consensus sequence is obtained from both the sense and antisense strands of the insert region. In this article, we present a universal method for constructing these templates, as well as an application of their use. We demonstrate the generation of high-quality consensus accuracy from single molecules, as well as the use of SMRTbell templates in the identification of rare sequence variants.
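The idea of deriving a consensus from multiple passes over one molecule can be illustrated with a toy per-position majority vote. This is only a sketch under strong assumptions: the passes are taken to be already aligned and of equal length, and `majority_consensus` is a hypothetical helper, not the consensus algorithm used in the article, which must also handle insertions, deletions and strand orientation.

```python
from collections import Counter


def majority_consensus(passes):
    """Return the per-position majority base across pre-aligned passes.

    Assumes every pass is a string of the same length, already aligned
    column by column (a simplification; real reads contain indels).
    """
    if not passes:
        raise ValueError("need at least one pass")
    width = len(passes[0])
    if any(len(p) != width for p in passes):
        raise ValueError("all passes must have the same length")
    consensus = []
    for column in zip(*passes):
        # Pick the most frequent base observed at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five made-up passes over the same insert, each with at most one error.
toy_passes = ["ACGTAC", "ACGAAC", "ACGTAC", "TCGTAC", "ACGTCC"]
toy_consensus = majority_consensus(toy_passes)
```

Because the simulated errors fall at different positions in different passes, each is outvoted and the consensus recovers `ACGTAC`. This is the intuition behind randomly distributed errors yielding high consensus accuracy.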

Subject(s)

DNA/chemistry , Oligonucleotides/chemistry , Single nucleotide polymorphism , DNA sequence analysis/methods , Base sequence , Consensus sequence , Staphylococcus aureus/genetics , Genetic templates

8.

EDGE: a centralized resource for the comparison, analysis, and distribution of toxicogenomic information.

Hayes, Kevin R; Vollrath, Aaron L; Zastrow, Gina M; McMillan, Brian J; Craven, Mark; Jovanovich, Stevan; Rank, David R; Penn, Sharon; Walisser, Jacqueline A; Reddy, Janardan K; Thomas, Russell S; Bradfield, Christopher A.

Mol Pharmacol ; 67(4): 1360-8, 2005 Apr.

Article in English | MEDLINE | ID: mdl-15662043

ABSTRACT

Transcriptional profiling via microarrays holds great promise for toxicant classification and hazard prediction. Unfortunately, the use of different microarray platforms, protocols, and informatics often hinders the meaningful comparison of transcriptional profiling data across laboratories. One solution to this problem is to provide a low-cost and centralized resource that enables researchers to share toxicogenomic data that has been generated on a common platform. In an effort to create such a resource, we developed a standardized set of microarray reagents and reproducible protocols to simplify the analysis of liver gene expression in the mouse model. This resource, referred to as EDGE, was then used to generate a training set of 117 publicly accessible transcriptional profiles that can be accessed at http://edge.oncology.wisc.edu/. The Web-accessible database was also linked to an informatics suite that allows on-line clustering and K-means analyses as well as Boolean and sequence-based searches of the data. We propose that EDGE can serve as a prototype resource for the sharing of toxicogenomics information and be used to develop algorithms for efficient chemical classification and hazard prediction.

Subject(s)

Genetic databases , Gene expression profiling , Oligonucleotide array sequence analysis/methods , Toxicogenetics , Animals , Lipopolysaccharides/pharmacology , Liver/drug effects , Liver/metabolism , Mice , PPAR alpha/agonists , Aryl hydrocarbon receptors/agonists

9.

Developing toxicologically predictive gene sets using cDNA microarrays and Bayesian classification.

Thomas, Russell S; Rank, David R; Penn, Sharron G; Craven, Mark W; Drinkwater, Norman R; Bradfield, Christopher A.

Methods Enzymol ; 357: 198-205, 2002.

Article in English | MEDLINE | ID: mdl-12424911

Subject(s)

Bayes theorem , Oligonucleotide array sequence analysis , Toxicology/methods , Animals , Expressed sequence tags , Gene expression regulation/drug effects , Computer-assisted image processing , Liver/drug effects , Mice , Oligonucleotide array sequence analysis/methods , Biological toxins/metabolism , Biological toxins/pharmacology

10.

A conditional density error model for the statistical analysis of microarray data.

Love, Brad; Rank, David R; Penn, Sharron G; Jenkins, David A; Thomas, Russell S.

Bioinformatics ; 18(8): 1064-72, 2002 Aug.

Article in English | MEDLINE | ID: mdl-12176829

ABSTRACT

MOTIVATION: In many microarray experiments, relatively few intra- and inter-array replicate measurements are made due to significant cost limitations and sample availability. Compounding this problem is a lack of robust statistical methods for analyzing gene expression data with limited experimental replicates. As a result, the interpretation of the results of these experiments is difficult, with little understanding of the probability of type I and type II errors. RESULTS: The variability in a series of replicate microarray measurements was modelled using a combination of parametric and non-parametric methods. A 3-dimensional surface was created for the conditional distribution of the variability given the mean signal intensity in both the Cy3 and Cy5 channels. The results were used as the basis for developing statistical methods for analyzing gene expression data with limited experimental replicates. AVAILABILITY: The statistical analysis scripts are available upon request.

Subject(s)

DNA/genetics , Gene expression , Genetic models , Statistical models , Oligonucleotide array sequence analysis/methods , Computer simulation , DNA replication/genetics , Gene expression profiling/methods , Gene expression profiling/statistics and numerical data , Gene expression regulation , Humans , Oligonucleotide array sequence analysis/statistics and numerical data , Reproducibility of results , Sensitivity and specificity

11.

Sequence variation and phylogenetic history of the mouse Ahr gene.

Thomas, Russell S; Penn, Sharron G; Holden, Kevin; Bradfield, Christopher A; Rank, David R.

Pharmacogenetics ; 12(2): 151-63, 2002 Mar.

Article in English | MEDLINE | ID: mdl-11875369

ABSTRACT

The Ahr locus encodes the aryl hydrocarbon receptor (AHR), which plays an important toxicological and developmental role. Sequence variation in this gene was studied in 13 different mouse lines that included eight laboratory strains, two Mus musculus subspecies and three additional Mus species. The data presented represent the largest study of sequence variation across multiple mouse lines in a single gene (approximately 15.9 kb per mouse line). Among all mice, the average frequency of all polymorphisms in the intronic regions was 20.3 variants/kb and the average exonic frequency was 14.1 variants/kb. For substitutions alone, the average frequencies in the intronic and exonic regions for all mice were 13.3 and 8.9 substitutions/kb, respectively. Between laboratory strains, the average intronic and exonic frequencies for all polymorphisms dropped to 5.4 and 2.9 variants/kb, respectively. There were 111 non-synonymous polymorphisms that resulted in 42 different amino acid changes, of which only 10 amino acid changes had been previously identified. Based on the nucleotide sequence, the phylogenetic history of the gene showed mice from the Ahr(b2) and Ahr(d) alleles in separate branches while mice from the Ahr(b1) and Ahr(b3) alleles exhibited a more complex history. Evolutionarily, the AHR protein as a whole appears to be under purifying selective pressure (K(a) : K(s) ratio = 0.237). Despite significant functional constraint in the basic helix-loop-helix and PAS domains, ligand binding is not constrained to the high-affinity allele, which further supports the role of the AHR in development and its importance beyond the adaptive response to environmental toxicants.

Subject(s)

Genetic variation , Inbred mouse strains/genetics , Genetic polymorphism , Aryl hydrocarbon receptors/genetics , Amino acid sequence , Animals , Molecular evolution , Genetic linkage , Mice , Molecular sequence data , Phylogeny , Genetic selection , Amino acid sequence homology , Species specificity

12.

Application of genomics to toxicology research.

Thomas, Russell S; Rank, David R; Penn, Sharron G; Zastrow, Gina M; Hayes, Kevin R; Hu, Tianhua; Pande, Kalyan; Lewis, Mark; Jovanovich, Stevan B; Bradfield, Christopher A.

Environ Health Perspect ; 110 Suppl 6: 919-23, 2002 Dec.

Article in English | MEDLINE | ID: mdl-12634120

ABSTRACT

Traditional models of toxicity have relied on dissecting chemical action into pharmacokinetic and pharmacodynamic processes. However, the integration of genomic information with toxicology will enhance our basic understanding of these processes and significantly change the way we apply toxicological information to risk assessment and regulatory problems. In this article, we summarize the application of gene expression information and polymorphism discovery to four areas in toxicology: toxicity testing, cross-species extrapolation, understanding mechanism of action, and susceptibility.

Subject(s)

Gene expression regulation , Genomics , Genetic polymorphism , Toxicology/trends , Animals , Animal disease models , Environmental pollutants/adverse effects , Forecasting , Humans , Oligonucleotide array sequence analysis , Toxicity tests
