Search | BVS Regional Portal

1.

Finding Hopf bifurcation islands and identifying thresholds for success or failure in oncolytic viral therapy.

Jahedi, Sana; Wang, Lin; Yorke, James A; Watmough, James.

Math Biosci ; 376: 109275, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39127095

ABSTRACT

We model interactions between cancer cells and viruses during oncolytic viral therapy. One of our primary goals is to identify parameter regions that yield treatment failure or success. We show that the tumor size under therapy at a particular time is less than the size without therapy. Our analysis demonstrates two thresholds for the horizontal transmission rate: a "failure threshold" below which treatment fails, and a "success threshold" above which infection prevalence reaches 100% and the tumor shrinks to its smallest size. Moreover, we explain how changes in the virulence of the virus alter the success threshold and the minimum tumor size. Our study suggests that the optimal virulence of an oncolytic virus depends on the timescale of virus dynamics. We identify a threshold for the virulence of the virus and show how this threshold depends on the timescale of virus dynamics. Our results suggest that when the timescale of virus dynamics is fast, administering a more virulent virus leads to a greater reduction in the tumor size. Conversely, when the viral timescale is slow, higher virulence can induce oscillations with high amplitude in the tumor size. Furthermore, we introduce the concept of a "Hopf bifurcation Island" in the parameter space, an idea that has applications far beyond the results of this paper and is applicable to many mathematical models. We elucidate what a Hopf bifurcation Island is, and we prove that small Islands can imply very slowly growing oscillatory solutions.

Subject(s)

Neoplasms , Oncolytic Virotherapy , Oncolytic Viruses , Oncolytic Virotherapy/methods , Humans , Neoplasms/therapy , Neoplasms/virology , Oncolytic Viruses/physiology , Models, Biological , Virulence , Mathematical Concepts

2.

Robust steady states in ecosystems with symmetries.

Jahedi, Sana; Sauer, Timothy; Yorke, James A.

J Biol Dyn ; 17(1): 2259223, 2023 12.

Article in English | MEDLINE | ID: mdl-37728890

ABSTRACT

Steady states of dynamical systems, whether stable or unstable, are critical for understanding future evolution. Robust steady states, ones that persist under small changes in the model parameters, are desired when modelling ecological systems, where it is common for accurate and detailed information on functional form and parameters to be unavailable. Previous work by Jahedi et al. [Robustness of solutions of almost every system of equations, SIAM J. Appl. Math. 82(5) (2022), pp. 1791-1807; Structured systems of nonlinear equations, SIAM J. Appl. Math. 83(4) (2023), pp. 1696-1716.] has established criteria to imply the prevalence of robust steady states for systems with minimal predetermined structure, including conventional structured systems. We review that work and extend it by allowing symmetries in the system structure, which present added obstructions to robustness.

Subject(s)

Ecosystem , Models, Biological
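A standard sufficient condition for a steady state to persist under small parameter changes is that the Jacobian at that steady state is nonsingular (implicit function theorem). The minimal sketch below illustrates only that general criterion, not the specific criteria developed in the cited papers; the two-species Lotka-Volterra competition model and its parameters `a` and `b` are hypothetical choices made for illustration.

```python
import numpy as np

def rhs(z, a, b):
    """Hypothetical Lotka-Volterra competition model: (dx/dt, dy/dt)."""
    x, y = z
    return np.array([x * (1 - x - a * y), y * (1 - y - b * x)])

def jacobian(z, a, b):
    """Partial derivatives of rhs with respect to (x, y)."""
    x, y = z
    return np.array([[1 - 2 * x - a * y, -a * x],
                     [-b * y, 1 - 2 * y - b * x]])

def newton(z0, a, b, tol=1e-12, max_iter=50):
    """Solve rhs(z) = 0 with Newton's method starting from z0."""
    z = np.array(z0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(jacobian(z, a, b), rhs(z, a, b))
        z = z - step
        if np.linalg.norm(step) < tol:
            return z
    raise RuntimeError("Newton's method did not converge")

# Coexistence steady state for a = b = 0.5 is (2/3, 2/3).
z_star = newton([0.6, 0.6], a=0.5, b=0.5)
det = np.linalg.det(jacobian(z_star, 0.5, 0.5))
# A nonzero determinant means the steady state survives, moving only slightly,
# when the parameter a is nudged from 0.5 to 0.51.
z_perturbed = newton(z_star, a=0.51, b=0.5)
print(z_star, det, z_perturbed)
```

Here the determinant is 1/3, so the steady state is robust in this sense; a singular Jacobian is exactly the situation in which a small parameter change could make it disappear or split.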

3.

When the Best Pandemic Models are the Simplest.

Jahedi, Sana; Yorke, James A.

Biology (Basel) ; 9(11)2020 Oct 23.

Article in English | MEDLINE | ID: mdl-33114047

ABSTRACT

As the coronavirus pandemic spreads across the globe, people are debating policies to mitigate its severity. Many complex, highly detailed models have been developed to help policy setters make better decisions. However, the basis of these models is unlikely to be understood by non-experts. We describe the advantages of simple models for COVID-19. We say a model is "simple" if its only parameter is the rate of contact between people in the population. This contact rate can vary over time, depending on choices by policy setters. Such models can be understood by a broad audience and thus can be helpful in explaining policy decisions to the public. They can be used to evaluate the outcomes of different policies. However, simple models have a disadvantage when dealing with inhomogeneous populations. To augment the power of a simple model to evaluate complicated situations, we add what we call "satellite" equations that do not change the original model. For example, with the help of a satellite equation, individuals could estimate their own chance of remaining uninfected through the end of an epidemic. Satellite equations can model the effects of the epidemic on high-risk individuals, death rates, and nursing homes and other isolated populations. To compare simple models with complex models, we introduce our "slightly complex" Model J. We find that the conclusions of simple and complex models can be quite similar. However, for each added complexity, a modeler may have to choose additional parameter values describing who will infect whom under what conditions, choices for which there is often little rationale but that can have big impacts on predictions. Our simulations suggest that the added complexity offers little predictive advantage.
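The idea of a satellite equation can be sketched as follows, assuming a normalized SIR model with a fixed recovery rate γ, so that the time-varying contact rate β(t) is the only policy-dependent parameter. The satellite variable P(t) tracks the probability that one individual with relative contact rate c (a hypothetical parameter) remains uninfected; it reads I(t) but never feeds back into S, I, or R, so it leaves the original model unchanged. The step schedule for β(t), the value of γ, and forward-Euler integration are illustrative assumptions, not Model J or the authors' own equations.

```python
import numpy as np

def simulate(beta, gamma=0.1, i0=1e-4, days=300.0, dt=0.1, rel_contact=1.0):
    """Forward-Euler SIR model plus one satellite equation.

    beta: callable t -> contact rate (hypothetical policy schedule).
    gamma: recovery rate, assumed fixed and known.
    rel_contact: contact rate of the tracked individual relative to average.
    Returns arrays t, S, I, R, P.
    """
    n = int(round(days / dt))
    t = np.arange(n + 1) * dt
    S, I, R, P = (np.zeros(n + 1) for _ in range(4))
    S[0], I[0], R[0], P[0] = 1.0 - i0, i0, 0.0, 1.0
    for k in range(n):
        b = beta(t[k])
        new_infections = b * S[k] * I[k]
        S[k + 1] = S[k] - dt * new_infections
        I[k + 1] = I[k] + dt * (new_infections - gamma * I[k])
        R[k + 1] = R[k] + dt * gamma * I[k]
        # Satellite equation dP/dt = -c * beta(t) * I * P: it uses I
        # but does not alter S, I, or R.
        P[k + 1] = P[k] - dt * rel_contact * b * I[k] * P[k]
    return t, S, I, R, P

def lockdown(t):
    # Hypothetical policy: the contact rate is halved after day 30.
    return 0.3 if t < 30 else 0.15

t, S, I, R, P_typical = simulate(lockdown)
_, _, _, _, P_high_risk = simulate(lockdown, rel_contact=2.0)
print(f"P(uninfected at end): typical {P_typical[-1]:.3f}, "
      f"high-contact {P_high_risk[-1]:.3f}")
```

With `rel_contact=1` the satellite variable reproduces S(t)/S(0), which serves as a consistency check; larger values stand in for higher-risk individuals.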

4.

Low-dimensional paradigms for high-dimensional hetero-chaos.

Saiki, Yoshitaka; Sanjuán, Miguel A F; Yorke, James A.

Chaos ; 28(10): 103110, 2018 Oct.

Article in English | MEDLINE | ID: mdl-30384627

ABSTRACT

The dynamics on a chaotic attractor can be quite heterogeneous, being much more unstable in some regions than others. Some regions of a chaotic attractor can be expanding in more dimensions than other regions. Imagine a situation where there are two such regions and each contains trajectories that stay in the region for all time, while typical trajectories wander throughout the attractor. Furthermore, if arbitrarily close to each point of the attractor there are points on periodic orbits that have different unstable dimensions, then we say such an attractor is "hetero-chaotic" (i.e., it has heterogeneous chaos). This is hard to picture, but we believe that most physical systems possessing a high-dimensional attractor are of this type. We have created simplified models with this behavior to give insight into real high-dimensional phenomena.

5.

Erratum to: An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing.

Zimin, Aleksey V; Stevens, Kristian A; Crepeau, Marc W; Puiu, Daniela; Wegrzyn, Jill L; Yorke, James A; Langley, Charles H; Neale, David B; Salzberg, Steven L.

Gigascience ; 6(10): 1, 2017 10 01.

Article in English | MEDLINE | ID: mdl-29020755

ABSTRACT

The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361 bp, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107 821 bp, 61% larger than the previous assembly.
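N50 is commonly defined as the largest length L such that contigs (or scaffolds) of length at least L together contain at least half of all assembled bases, which is why it is described here as a length-weighted average. A minimal sketch of that definition, using made-up lengths for illustration:

```python
from collections.abc import Iterable

def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L hold at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-negative lengths

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the toy example the total is 54 bases; the three longest contigs (10 + 9 + 8 = 27) reach half of that, so the N50 is 8.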

6.

The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae.

Neale, David B; McGuire, Patrick E; Wheeler, Nicholas C; Stevens, Kristian A; Crepeau, Marc W; Cardeno, Charis; Zimin, Aleksey V; Puiu, Daniela; Pertea, Geo M; Sezen, U Uzay; Casola, Claudio; Koralewski, Tomasz E; Paul, Robin; Gonzalez-Ibeas, Daniel; Zaman, Sumaira; Cronn, Richard; Yandell, Mark; Holt, Carson; Langley, Charles H; Yorke, James A; Salzberg, Steven L; Wegrzyn, Jill L.

G3 (Bethesda) ; 7(9): 3157-3167, 2017 09 07.

Article in English | MEDLINE | ID: mdl-28751502

ABSTRACT

A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceed those of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers, which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms.

Subject(s)

Genome, Plant , Photosynthesis/genetics , Pinaceae/genetics , Pinaceae/metabolism , Pseudotsuga/genetics , Pseudotsuga/metabolism , Whole Genome Sequencing , Adaptation, Biological/genetics , Computational Biology , Evolution, Molecular , Gene Duplication , Gene Regulatory Networks , Genomics , Molecular Sequence Annotation , Multigene Family , Phylogeny , Pinaceae/classification , Proteomics/methods , Pseudotsuga/classification , Repetitive Sequences, Nucleic Acid

7.

An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing.

Zimin, Aleksey V; Stevens, Kristian A; Crepeau, Marc W; Puiu, Daniela; Wegrzyn, Jill L; Yorke, James A; Langley, Charles H; Neale, David B; Salzberg, Steven L.

Gigascience ; 6(1): 1-4, 2017 01 01.

Article in English | MEDLINE | ID: mdl-28369353

ABSTRACT

The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361 bp, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107 821 bp, 61% larger than the previous assembly.

Subject(s)

Contig Mapping , Genome, Plant , High-Throughput Nucleotide Sequencing , Pinus taeda/genetics , Sequence Analysis, DNA , Algorithms , Genomics

8.

Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.

Zimin, Aleksey V; Puiu, Daniela; Luo, Ming-Cheng; Zhu, Tingting; Koren, Sergey; Marçais, Guillaume; Yorke, James A; Dvorák, Jan; Salzberg, Steven L.

Genome Res ; 27(5): 787-792, 2017 05.

Article in English | MEDLINE | ID: mdl-28130360

ABSTRACT

Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here, we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and extremely repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 nucleotides. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy.

Subject(s)

Contig Mapping/methods , Genome, Plant , Genomics/methods , Poaceae/genetics , Repetitive Sequences, Nucleic Acid , Sequence Analysis, DNA/methods , Software , Contig Mapping/standards , Genome Size , Genomics/standards , Sequence Analysis, DNA/standards

9.

Partially controlling transient chaos in the Lorenz equations.

Capeáns, Rubén; Sabuco, Juan; Sanjuán, Miguel A F; Yorke, James A.

Philos Trans A Math Phys Eng Sci ; 375(2088)2017 Mar 06.

Article in English | MEDLINE | ID: mdl-28115608

ABSTRACT

Transient chaos is a characteristic behaviour in nonlinear dynamics where trajectories in a certain region of phase space behave chaotically for a while before escaping to an external attractor. In some situations these escapes are highly undesirable and should be avoided. In this paper, we apply a control method known as partial control that prevents the trajectories from escaping to the external attractors, keeping them in the chaotic region forever. We also show, for the first time, the application of this method in three dimensions, which is the major step forward in this work. To illustrate how the method works, we have chosen the Lorenz system, a paradigmatic example in nonlinear dynamics, with parameters for which transient chaos appears. We analyse three quite different ways to implement the method. First, we apply the method by building a one-dimensional map from the successive maxima of one of the variables. Next, we implement it by building a two-dimensional map through a Poincaré section. Finally, we build a three-dimensional map, which has the advantage of using a fixed time interval between applications of the control, which can be useful in practice. This article is part of the themed issue 'Horizons of cybernetical physics'.
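The first of the three implementations starts from a one-dimensional map built from successive maxima of one variable. The sketch below builds such a map (pairs of successive maxima z_n, z_{n+1} of the variable z) with a hand-written fourth-order Runge-Kutta integrator. It assumes the classic parameters σ = 10, ρ = 28, β = 8/3, not the transient-chaos parameters used in the paper, and it does not implement the partial-control step itself.

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (classic parameters assumed)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def successive_z_maxima(state0, dt=0.01, n_steps=10000, n_transient=1000):
    """Integrate, drop a transient, and return local maxima of z in order."""
    state = np.array(state0, dtype=float)
    zs = []
    for step in range(n_steps):
        state = rk4_step(lorenz, state, dt)
        if step >= n_transient:
            zs.append(state[2])
    zs = np.array(zs)
    # A sample is a local maximum if it exceeds both neighbours.
    is_max = (zs[1:-1] > zs[:-2]) & (zs[1:-1] > zs[2:])
    return zs[1:-1][is_max]

maxima = successive_z_maxima([1.0, 1.0, 1.0])
# Points (z_n, z_{n+1}) of the one-dimensional map of successive maxima.
pairs = np.column_stack([maxima[:-1], maxima[1:]])
print(len(maxima), pairs[:3])
```

Plotting the second column of `pairs` against the first gives the familiar cusp-shaped map on which a one-dimensional control scheme could then operate.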

10.

Sequence of the Sugar Pine Megagenome.

Stevens, Kristian A; Wegrzyn, Jill L; Zimin, Aleksey; Puiu, Daniela; Crepeau, Marc; Cardeno, Charis; Paul, Robin; Gonzalez-Ibeas, Daniel; Koriabine, Maxim; Holtz-Morris, Ann E; Martínez-García, Pedro J; Sezen, Uzay U; Marçais, Guillaume; Jermstad, Kathy; McGuire, Patrick E; Loopstra, Carol A; Davis, John M; Eckert, Andrew; de Jong, Pieter; Yorke, James A; Salzberg, Steven L; Neale, David B; Langley, Charles H.

Genetics ; 204(4): 1613-1626, 2016 Dec.

Article in English | MEDLINE | ID: mdl-27794028

ABSTRACT

Until very recently, complete characterization of the megagenomes of conifers has remained elusive. The diploid genome of sugar pine (Pinus lambertiana Dougl.) is highly repetitive and 31 billion bp in size. It is the largest genome sequenced and assembled to date, and the first from the subgenus Strobus, or white pines, a group that is notable for having the largest genomes among the pines. The genome represents a unique opportunity to investigate genome "obesity" in conifers and white pines. Comparative analysis of P. lambertiana and P. taeda L. reveals new insights into the conservation, age, and diversity of the highly abundant transposable elements, the primary factor determining genome size. As for most North American white pines, the principal pathogen of P. lambertiana is white pine blister rust (Cronartium ribicola J.C. Fischer ex Raben.). Identification of candidate genes for resistance to this pathogen is of great ecological importance. The genome sequence afforded us the opportunity to make substantial progress on locating the major dominant gene for simple resistance hypersensitive response, Cr1. We describe new markers and gene annotation that are both tightly linked to Cr1 in a mapping population and associated with Cr1 in unrelated sugar pine individuals sampled throughout the species' range, creating a solid foundation for future mapping. The genomic variation and annotated candidate genes characterized in our study of the Cr1 region are resources for future marker-assisted breeding efforts as well as for investigations of fundamental mechanisms of invasive disease and evolutionary response.

Subject(s)

Genome, Plant , Pinus/genetics , Basidiomycota/pathogenicity , DNA Transposable Elements , Genetic Variation , Genome Size , Pinus/immunology , Pinus/microbiology , Plant Immunity/genetics

11.

The Atlantic salmon genome provides insights into rediploidization.

Lien, Sigbjørn; Koop, Ben F; Sandve, Simen R; Miller, Jason R; Kent, Matthew P; Nome, Torfinn; Hvidsten, Torgeir R; Leong, Jong S; Minkley, David R; Zimin, Aleksey; Grammes, Fabian; Grove, Harald; Gjuvsland, Arne; Walenz, Brian; Hermansen, Russell A; von Schalburg, Kris; Rondeau, Eric B; Di Genova, Alex; Samy, Jeevan K A; Olav Vik, Jon; Vigeland, Magnus D; Caler, Lis; Grimholt, Unni; Jentoft, Sissel; Våge, Dag Inge; de Jong, Pieter; Moen, Thomas; Baranski, Matthew; Palti, Yniv; Smith, Douglas R; Yorke, James A; Nederbragt, Alexander J; Tooming-Klunderud, Ave; Jakobsen, Kjetill S; Jiang, Xuanting; Fan, Dingding; Hu, Yan; Liberles, David A; Vidal, Rodrigo; Iturra, Patricia; Jones, Steven J M; Jonassen, Inge; Maass, Alejandro; Omholt, Stig W; Davidson, William S.

Nature ; 533(7602): 200-5, 2016 05 12.

Article in English | MEDLINE | ID: mdl-27088604

ABSTRACT

The whole-genome duplication 80 million years ago of the common ancestor of salmonids (salmonid-specific fourth vertebrate whole-genome duplication, Ss4R) provides unique opportunities to learn about the evolutionary fate of a duplicated vertebrate genome in 70 extant lineages. Here we present a high-quality genome assembly for Atlantic salmon (Salmo salar), and show that large genomic reorganizations, coinciding with bursts of transposon-mediated repeat expansions, were crucial for the post-Ss4R rediploidization process. Comparisons of duplicate gene expression patterns across a wide range of tissues with orthologous genes from a pre-Ss4R outgroup unexpectedly demonstrate far more instances of neofunctionalization than subfunctionalization. Surprisingly, we find that genes that were retained as duplicates after the teleost-specific whole-genome duplication 320 million years ago were not more likely to be retained after the Ss4R, and that the duplicate retention was not influenced to a great extent by the nature of the predicted protein interactions of the gene products. Finally, we demonstrate that the Atlantic salmon assembly can serve as a reference sequence for the study of other salmonids for a range of purposes.

Subject(s)

Diploidy , Evolution, Molecular , Gene Duplication/genetics , Genes, Duplicate/genetics , Genome/genetics , Salmo salar/genetics , Animals , DNA Transposable Elements/genetics , Female , Genomics , Male , Models, Genetic , Mutagenesis/genetics , Phylogeny , Reference Standards , Salmo salar/classification , Sequence Homology

12.

Evolution of transcriptional networks in yeast: alternative teams of transcriptional factors for different species.

Muñoz, Adriana; Santos Muñoz, Daniella; Zimin, Aleksey; Yorke, James A.

BMC Genomics ; 17(Suppl 10): 826, 2016 11 11.

Article in English | MEDLINE | ID: mdl-28185554

ABSTRACT

BACKGROUND: The diversity in eukaryotic life reflects a diversity in regulatory pathways. Nocedal and Johnson argue that the rewiring of gene regulatory networks is a major force for the diversity of life, and that changes in regulation can create new species. RESULTS: We have created a method (based on our new "ping-pong" algorithm) for detecting more complicated rewirings, where several transcription factors can substitute for one or more transcription factors in the regulation of a family of co-regulated genes. An example is illustrative. Hogues et al. reported a rewiring in which RAP1 in Saccharomyces cerevisiae substitutes for TBF1/CBF1 in Candida albicans for ribosomal RP genes. There, one transcription factor substitutes for another on some collection of genes. Such a substitution is referred to as a "rewiring". We agree with this finding of rewiring as far as it goes, but the situation is more complicated. Many transcription factors can regulate a gene, and our algorithm finds that in this example a "team" (or collection) of three transcription factors including RAP1 substitutes for TBF1 for 19 genes. The switch occurs for a branch of the phylogenetic tree containing 10 species (including Saccharomyces cerevisiae), while the remaining 13 species (including Candida albicans) are regulated by TBF1. CONCLUSIONS: To gain insight into more general evolutionary mechanisms, we have created a mathematical algorithm that finds such general switching events, and we prove that it converges. Of course, any such computational discovery should be validated by biological tests. For each branch of the phylogenetic tree and each gene module, our algorithm finds a sub-group of co-regulated genes and a team of transcription factors that substitutes for another team of transcription factors. In most cases the signal will be small, but in some cases we find a strong signal of switching. We report our findings for 23 Ascomycota fungi species.

Subject(s)

Algorithms , Evolution, Molecular , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Transcription Factors/genetics , Candida albicans/classification , Candida albicans/genetics , Candida albicans/metabolism , Gene Regulatory Networks , Phylogeny , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Shelterin Complex , Telomere-Binding Proteins/genetics , Transcription Factors/metabolism , Transcription, Genetic

13.

Testing for Basins of Wada.

Daza, Alvar; Wagemakers, Alexandre; Sanjuán, Miguel A F; Yorke, James A.

Sci Rep ; 5: 16579, 2015 Nov 10.

Article in English | MEDLINE | ID: mdl-26553444

ABSTRACT

Nonlinear systems often give rise to fractal boundaries in phase space, hindering predictability. When a single boundary separates three or more different basins of attraction, we say that the set of basins has the Wada property and initial conditions near that boundary are even more unpredictable. Many physical systems of interest with this topological property appear in the literature. However, so far the only approach to study Wada basins has been restricted to two-dimensional phase spaces. Here we report a simple algorithm whose purpose is to look for the Wada property in a given dynamical system. Another benefit of this procedure is the possibility to classify and study intermediate situations known as partially Wada boundaries.

14.

Geometry of the edge of chaos in a low-dimensional turbulent shear flow model.

Joglekar, Madhura; Feudel, Ulrike; Yorke, James A.

Phys Rev E Stat Nonlin Soft Matter Phys ; 91(5): 052903, 2015 May.

Article in English | MEDLINE | ID: mdl-26066225

ABSTRACT

We investigate the geometry of the edge of chaos for a nine-dimensional sinusoidal shear flow model and show how the shape of the edge of chaos changes with increasing Reynolds number. Furthermore, we numerically compute the scaling of the minimum perturbation required to drive the laminar attracting state into the turbulent region. We find that this minimum perturbation scales with the Reynolds number as Re^-2.
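Writing δ_min for the minimum perturbation amplitude and C for an unspecified constant (both notational assumptions, not taken from the paper), the reported scaling implies that doubling the Reynolds number reduces the required perturbation by roughly a factor of four:

```latex
\delta_{\min}(\mathrm{Re}) \approx C\,\mathrm{Re}^{-2}
\quad\Longrightarrow\quad
\frac{\delta_{\min}(2\,\mathrm{Re})}{\delta_{\min}(\mathrm{Re})}
\approx \frac{C\,(2\,\mathrm{Re})^{-2}}{C\,\mathrm{Re}^{-2}} = \frac{1}{4}
```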

15.

QuorUM: An Error Corrector for Illumina Reads.

Marçais, Guillaume; Yorke, James A; Zimin, Aleksey.

PLoS One ; 10(6): e0130821, 2015.

Article in English | MEDLINE | ID: mdl-26083032

ABSTRACT

MOTIVATION: Illumina sequencing data can provide high coverage of a genome by relatively short (most often 100 bp to 150 bp) reads at a low cost. Even with a low (advertised 1%) error rate, 100× coverage Illumina data on average has an error in some read at every base in the genome. These errors make handling the data more complicated because they result in a large number of low-count erroneous k-mers in the reads. However, there is enough information in the reads to correct most of the sequencing errors, thus making subsequent use of the data (e.g. for mapping or assembly) easier. Here we use the term "error correction" to denote the reduction in errors due to both changes in individual bases and trimming of unusable sequence. We developed error-correction software called QuorUM. QuorUM is mainly aimed at error correcting Illumina reads for subsequent assembly. It is designed around the novel idea of minimizing the number of distinct erroneous k-mers in the output reads while preserving the most true k-mers, and we introduce a composite statistic π that measures how successful we are at achieving this dual goal. We evaluate the performance of QuorUM by correcting actual Illumina reads from genomes for which a reference assembly is available. RESULTS: We produce trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. We compared QuorUM against several published error correctors and found that it is the best performer in most metrics we use. QuorUM is implemented efficiently, making use of current multi-core computing architectures, and it is suitable for large data sets (1 billion bases checked and corrected per day per core). We also demonstrate that a third-party assembler (SOAPdenovo) benefits significantly from using QuorUM error-corrected reads. QuorUM error-corrected reads yield a 1.1- to 4-fold improvement in N50 contig size compared to using the original reads with SOAPdenovo for the data sets investigated.
AVAILABILITY: QuorUM is distributed as an independent software package and as a module of the MaSuRCA assembly software. Both are available under the GPL open source license at http://www.genome.umd.edu. CONTACT: gmarcais@umd.edu.

Subject(s)

Computational Biology/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Animals , Genome , Humans
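The motivation in this abstract rests on simple arithmetic and on k-mer counting. At a 1% per-base error rate and 100× coverage, each genome position is covered by about 100 × 0.01 = 1 erroneous base on average, and a single substitution alters up to k overlapping k-mers, which then appear with low counts. The sketch below only counts k-mers and flags low-count ones; the toy reads, k = 4, and the count threshold are hypothetical, and this is not QuorUM's correction algorithm.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def low_count_kmers(counts, threshold):
    """K-mers seen fewer than `threshold` times: likely sequencing errors."""
    return {kmer for kmer, c in counts.items() if c < threshold}

# Toy data: five error-free copies and one read with an A->T substitution.
reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
counts = count_kmers(reads, k=4)
suspicious = low_count_kmers(counts, threshold=2)
print(sorted(suspicious))  # ['CGTT', 'GTTC', 'TCGT', 'TTCG']
```

The single substituted base produces exactly k = 4 singleton k-mers, while the true k-mers are each seen at least five times.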

16.

A new rhesus macaque assembly and annotation for next-generation sequencing analyses.

Zimin, Aleksey V; Cornish, Adam S; Maudhoo, Mnirnal D; Gibbs, Robert M; Zhang, Xiongfei; Pandey, Sanjit; Meehan, Daniel T; Wipfler, Kristin; Bosinger, Steven E; Johnson, Zachary P; Tharp, Gregory K; Marçais, Guillaume; Roberts, Michael; Ferguson, Betsy; Fox, Howard S; Treangen, Todd; Salzberg, Steven L; Yorke, James A; Norgren, Robert B.

Biol Direct ; 9(1): 20, 2014 Oct 14.

Article in English | MEDLINE | ID: mdl-25319552

ABSTRACT

BACKGROUND: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses. RESULTS: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies. 
CONCLUSIONS: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates. REVIEWERS: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.

Subject(s)

Genome , Macaca mulatta/genetics , Amino Acid Sequence , Animals , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Molecular Sequence Data , RNA, Messenger/metabolism , Sequence Alignment

17.

Scaling of chaos versus periodicity: how certain is it that an attractor is chaotic?

Joglekar, Madhura; Ott, Edward; Yorke, James A.

Phys Rev Lett ; 113(8): 084101, 2014 Aug 22.

Article in English | MEDLINE | ID: mdl-25192099

ABSTRACT

The character of the time-asymptotic evolution of physical systems can have complex, singular behavior with variation of a system parameter, particularly when chaos is involved. A perturbation of the parameter by a small amount ε can convert an attractor from chaotic to nonchaotic or vice versa. We call a parameter value where this can happen ε uncertain. The probability that a random choice of the parameter is ε uncertain commonly scales like a power law in ε. Surprisingly, two seemingly similar ways of defining this scaling, both of physical interest, yield different numerical values for the scaling exponent. We show why this happens and present a quantitative analysis of this phenomenon.
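The notion of ε uncertainty can be illustrated numerically on a standard toy system. The sketch below assumes the logistic map x → r x(1 − x) with r as the parameter, classifies an attractor as chaotic when its estimated Lyapunov exponent exceeds a small tolerance, and approximates "some parameter within ε gives a different attractor type" by checking only r − ε and r + ε. The map, the parameter range, the tolerance, and the sample sizes are all assumptions, and the endpoint-only test and finite iteration counts make the fitted exponent a rough estimate at best.

```python
import numpy as np

def lyapunov(r, n_transient=500, n_iter=2000, x0=0.4):
    """Estimate the Lyapunov exponent of x -> r x (1 - x) for each r in an array."""
    r = np.asarray(r, dtype=float)
    x = np.full_like(r, x0)
    for _ in range(n_transient):  # discard the approach to the attractor
        x = r * x * (1.0 - x)
    total = np.zeros_like(r)
    for _ in range(n_iter):
        x = r * x * (1.0 - x)
        total += np.log(np.abs(r * (1.0 - 2.0 * x)) + 1e-300)  # log|f'(x)|
    return total / n_iter

def is_chaotic(r, tol=1e-3):
    """Assumed classification: chaotic iff the estimated exponent exceeds tol."""
    return lyapunov(r) > tol

def uncertain_fraction(eps, r_min=3.7, r_max=3.9, n=2000, seed=0):
    """Fraction of random r whose attractor type differs at r - eps or r + eps."""
    rng = np.random.default_rng(seed)
    r = rng.uniform(r_min, r_max, n)
    base = is_chaotic(r)
    changed = (is_chaotic(r - eps) != base) | (is_chaotic(r + eps) != base)
    return changed.mean()

eps_values = np.array([1e-2, 1e-3, 1e-4])
fractions = np.array([uncertain_fraction(e) for e in eps_values])
# If f(eps) ~ eps**alpha, alpha is the slope of log f against log eps.
mask = fractions > 0
alpha = (np.polyfit(np.log(eps_values[mask]), np.log(fractions[mask]), 1)[0]
         if mask.sum() >= 2 else float("nan"))
print(fractions, alpha)
```

A careful study would refine the classification near zero exponent and sample many intermediate parameter values rather than only the two endpoints.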

18.

Sequencing and assembly of the 22-gb loblolly pine genome.

Zimin, Aleksey; Stevens, Kristian A; Crepeau, Marc W; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L; de Jong, Pieter J; Neale, David B; Salzberg, Steven L; Yorke, James A; Langley, Charles H.

Genetics ; 196(3): 875-90, 2014 Mar.

Article in English | MEDLINE | ID: mdl-24653210

ABSTRACT

Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.

Subject(s)

Plant Genome , Ovule/genetics , Pinus taeda/genetics , Genomics , Haploidy , DNA Sequence Analysis , Transcriptome

19.

Unique features of the loblolly pine (Pinus taeda L.) megagenome revealed through sequence annotation.

Wegrzyn, Jill L; Liechty, John D; Stevens, Kristian A; Wu, Le-Shin; Loopstra, Carol A; Vasquez-Gross, Hans A; Dougherty, William M; Lin, Brian Y; Zieve, Jacob J; Martínez-García, Pedro J; Holt, Carson; Yandell, Mark; Zimin, Aleksey V; Yorke, James A; Crepeau, Marc W; Puiu, Daniela; Salzberg, Steven L; Dejong, Pieter J; Mockaitis, Keithanne; Main, Doreen; Langley, Charles H; Neale, David B.

Genetics ; 196(3): 891-909, 2014 Mar.

Article in English | MEDLINE | ID: mdl-24653211

ABSTRACT

The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of pine genomes (~20-40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths among 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In-depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82% of the genome.

Subject(s)

Plant Genome , Molecular Sequence Annotation/methods , Pinus taeda/genetics , Plant DNA/analysis , Molecular Evolution , Plant Genes , Multigene Family , Phylogeny , Sequence Alignment

20.

Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies.

Neale, David B; Wegrzyn, Jill L; Stevens, Kristian A; Zimin, Aleksey V; Puiu, Daniela; Crepeau, Marc W; Cardeno, Charis; Koriabine, Maxim; Holtz-Morris, Ann E; Liechty, John D; Martínez-García, Pedro J; Vasquez-Gross, Hans A; Lin, Brian Y; Zieve, Jacob J; Dougherty, William M; Fuentes-Soriano, Sara; Wu, Le-Shin; Gilbert, Don; Marçais, Guillaume; Roberts, Michael; Holt, Carson; Yandell, Mark; Davis, John M; Smith, Katherine E; Dean, Jeffrey F D; Lorenz, W Walter; Whetten, Ross W; Sederoff, Ronald; Wheeler, Nicholas; McGuire, Patrick E; Main, Doreen; Loopstra, Carol A; Mockaitis, Keithanne; deJong, Pieter J; Yorke, James A; Salzberg, Steven L; Langley, Charles H.

Genome Biol ; 15(3): R59, 2014 Mar 04.

Article in English | MEDLINE | ID: mdl-24647006

ABSTRACT

BACKGROUND: The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. RESULTS: We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole-genome shotgun approach relying primarily on next-generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp of sequence, with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. CONCLUSIONS: In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrate a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.

Subject(s)

Contig Mapping/methods , Plant Genome , Pinus taeda/genetics , DNA Sequence Analysis/methods , Plant DNA/genetics , Haploidy
