Results 1 - 14 of 14
1.
Microb Genom ; 10(5), 2024 May.
Article in English | MEDLINE | ID: mdl-38785221

ABSTRACT

Wastewater-based surveillance (WBS) is an important epidemiological and public health tool for tracking pathogens across the scale of a building, neighbourhood, city, or region. WBS gained widespread adoption globally during the SARS-CoV-2 pandemic for estimating community infection levels by qPCR. Sequencing pathogen genes or genomes from wastewater adds information about pathogen genetic diversity, which can be used to identify viral lineages (including variants of concern) that are circulating in a local population. Capturing the genetic diversity by WBS sequencing is not trivial, as wastewater samples often contain a diverse mixture of viral lineages with real mutations and sequencing errors, which must be deconvoluted computationally from short sequencing reads. In this study we assess nine different computational tools that have recently been developed to address this challenge. We simulated 100 wastewater sequence samples consisting of SARS-CoV-2 BA.1, BA.2, and Delta lineages, in various mixtures, as well as a Delta-Omicron recombinant and a synthetic 'novel' lineage. Most tools performed well in identifying the true lineages present and estimating their relative abundances and were generally robust to variation in sequencing depth and read length. While many tools identified lineages present down to 1% frequency, results were more reliable above a 5% threshold. The presence of an unknown synthetic lineage, which represents an unclassified SARS-CoV-2 lineage, increased the error in relative abundance estimates of other lineages, but the magnitude of this effect was small for most tools. The tools also varied in how they labelled novel synthetic lineages and recombinants. While our simulated dataset represents just one of many possible use cases for these methods, we hope it helps users understand potential sources of error or bias in wastewater sequencing analysis and appreciate the commonalities and differences across methods.


Subject(s)
COVID-19 , Viral Genome , SARS-CoV-2 , Wastewater , Wastewater/virology , SARS-CoV-2/genetics , SARS-CoV-2/classification , COVID-19/virology , COVID-19/epidemiology , Humans , Computational Biology/methods , Genomics/methods , Wastewater-Based Epidemiological Monitoring , Phylogeny
2.
Cell Rep Methods ; 2(10): 100313, 2022 10 24.
Article in English | MEDLINE | ID: mdl-36159190

ABSTRACT

Wastewater surveillance has become essential for monitoring the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The quantification of SARS-CoV-2 RNA in wastewater correlates with the coronavirus disease 2019 (COVID-19) caseload in a community. However, estimating the proportions of different SARS-CoV-2 haplotypes has remained technically difficult. We present a phylogenetic imputation method for improving the SARS-CoV-2 reference database and a method for estimating the relative proportions of SARS-CoV-2 haplotypes from wastewater samples. The phylogenetic imputation method uses the global SARS-CoV-2 phylogeny and imputes based on the maximum of the posterior probability of each nucleotide. We show that the imputation method has error rates comparable to, or lower than, typical sequencing error rates, which substantially improves the reference database and allows for accurate inferences of haplotype composition. Our method for estimating relative proportions of haplotypes uses an initial step to remove unlikely haplotypes and an expectation maximization (EM) algorithm for obtaining maximum likelihood estimates of the proportions of different haplotypes in a sample. Using simulations with a reference database of >3 million SARS-CoV-2 genomes, we show that the estimated proportions reflect the true proportions given sufficiently high sequencing depth.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Haplotypes , Phylogeny , Viral RNA/genetics , Wastewater , Wastewater-Based Epidemiological Monitoring , Likelihood Functions
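
The abstract of entry 2 describes an expectation maximization (EM) algorithm that yields maximum likelihood estimates of haplotype proportions. The following is a minimal sketch of that general idea only, not the authors' implementation: the function name `em_proportions`, the per-read likelihood matrix, and the toy data are all assumptions made for illustration.

```python
def em_proportions(read_likelihoods, n_iter=200, tol=1e-9):
    """Estimate mixture proportions by EM (illustrative sketch).

    read_likelihoods[i][k] is an assumed, precomputed value of
    P(read i | haplotype k); how it is obtained is not covered here.
    """
    n_hap = len(read_likelihoods[0])
    props = [1.0 / n_hap] * n_hap  # start from a uniform mixture
    for _ in range(n_iter):
        # E-step: accumulate each read's posterior weight per haplotype.
        counts = [0.0] * n_hap
        for lik in read_likelihoods:
            weighted = [p * l for p, l in zip(props, lik)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read is incompatible with every candidate
            for k, w in enumerate(weighted):
                counts[k] += w / total
        n_used = sum(counts)
        if n_used == 0.0:
            raise ValueError("no read is compatible with any haplotype")
        # M-step: the new proportions are the normalized expected counts.
        new_props = [c / n_used for c in counts]
        converged = max(abs(a - b) for a, b in zip(props, new_props)) < tol
        props = new_props
        if converged:
            break
    return props


# Toy data: 7 reads match only haplotype A, 3 match only haplotype B,
# and 5 reads are equally compatible with both.
toy_reads = [[1.0, 0.0]] * 7 + [[0.0, 1.0]] * 3 + [[1.0, 1.0]] * 5
toy_estimate = em_proportions(toy_reads)
```

In this toy case the ambiguous reads are split according to the current proportions at each iteration, so the estimate settles near the ratio given by the unambiguous reads.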
3.
Bioinformatics ; 38(3): 663-670, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34668516

ABSTRACT

MOTIVATION: Clustering is a fundamental task in the analysis of nucleotide sequences. Despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. Traditional clustering methods have mostly focused on optimizing high speed clustering of highly similar sequences. We develop a phylogenetic clustering method which infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences. RESULTS: We describe a clustering program AncestralClust, which is developed for clustering divergent sequences. We compare this method with other state-of-the-art clustering methods using datasets of homologous sequences from different species. We show that, in divergent datasets, AncestralClust has higher accuracy and more even cluster sizes than current popular methods. AVAILABILITY AND IMPLEMENTATION: AncestralClust is an Open Source program available at https://github.com/lpipes/ancestralclust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Phylogeny , Base Sequence , Cluster Analysis
4.
Virus Evol ; 7(1): veaa098, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33500788

ABSTRACT

Human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is most closely related, by average genetic distance, to two coronaviruses isolated from bats, RaTG13 and RmYN02. However, there is a segment of high amino acid similarity between human SARS-CoV-2 and a pangolin-isolated strain, GD410721, in the receptor-binding domain (RBD) of the spike protein, a pattern that can be caused either by recombination or by convergent amino acid evolution driven by natural selection. We perform a detailed analysis of the synonymous divergence, which is less likely to be affected by selection than amino acid divergence, between human SARS-CoV-2 and related strains. We show that the synonymous divergence between the bat-derived viruses and SARS-CoV-2 is larger than between GD410721 and SARS-CoV-2 in the RBD, providing strong additional support for the recombination hypothesis. However, the synonymous divergence between the pangolin strain and SARS-CoV-2 is also relatively high, which is not consistent with a recent recombination between them; instead, it suggests a recombination into RaTG13. We also find a 14-fold increase in the dN/dS ratio from the lineage leading to SARS-CoV-2 to the strains of the current pandemic, suggesting that the vast majority of nonsynonymous mutations currently segregating within the human strains have a negative impact on viral fitness. Finally, we estimate that the time to the most recent common ancestor of SARS-CoV-2 and RaTG13 or RmYN02 based on synonymous divergence is 51.71 years (95% CI, 28.11-75.31) and 37.02 years (95% CI, 18.19-55.85), respectively.

5.
Mol Biol Evol ; 38(4): 1537-1543, 2021 04 13.
Article in English | MEDLINE | ID: mdl-33295605

ABSTRACT

The rooting of the SARS-CoV-2 phylogeny is important for understanding the origin and early spread of the virus. Previously published phylogenies have used different rootings that do not always provide consistent results. We investigate several different strategies for rooting the SARS-CoV-2 tree and provide measures of statistical uncertainty for all methods. We show that methods based on the molecular clock tend to place the root in the B clade, whereas methods based on outgroup rooting tend to place the root in the A clade. The results from the two approaches are statistically incompatible, possibly as a consequence of deviations from a molecular clock or excess back-mutations. We also show that none of the methods provide strong statistical support for the placement of the root in any particular edge of the tree. These results suggest that phylogenetic evidence alone is unlikely to identify the origin of the SARS-CoV-2 virus and we caution against strong inferences regarding the early spread of the virus based solely on such evidence.


Subject(s)
COVID-19/virology , Viral Genome , Mutation , Phylogeny , SARS-CoV-2/genetics , Algorithms , Animals , Bayes Theorem , Molecular Evolution , Humans , Likelihood Functions , Markov Chains , Genetic Models , Statistical Models , Monte Carlo Method , Missense Mutation , Viral RNA/genetics , Uncertainty
6.
Proc Natl Acad Sci U S A ; 115(41): 10398-10403, 2018 10 09.
Article in English | MEDLINE | ID: mdl-30228118

ABSTRACT

Animal domestication efforts have led to a shared spectrum of striking behavioral and morphological changes. To recapitulate this process, silver foxes have been selectively bred for tame and aggressive behaviors for more than 50 generations at the Institute for Cytology and Genetics in Novosibirsk, Russia. To understand the genetic basis and molecular mechanisms underlying the phenotypic changes, we profiled gene expression levels and coding SNP allele frequencies in two brain tissue specimens from 12 aggressive foxes and 12 tame foxes. Expression analysis revealed 146 genes in the prefrontal cortex and 33 genes in the basal forebrain that were differentially expressed, with a 5% false discovery rate (FDR). These candidates include genes in key pathways known to be critical to neurologic processing, including the serotonin and glutamate receptor pathways. In addition, 295 of the 31,000 exonic SNPs show significant allele frequency differences between the tame and aggressive populations (1% FDR), including genes with a role in neural crest cell fate determination.


Subject(s)
Aggression , Animal Behavior , Brain/metabolism , Foxes/genetics , Genome , Genetic Selection , Transcriptome , Animals , Foxes/psychology , Genomics , Male , Single Nucleotide Polymorphism , Russia
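
The abstract of entry 6 reports differentially expressed genes at a 5% false discovery rate (FDR) and SNPs at a 1% FDR, without naming the procedure used. As a hedged illustration of how an FDR threshold can be applied to a list of p-values, the sketch below uses the Benjamini-Hochberg step-up procedure; choosing that procedure is an assumption, and the function name and p-values are invented for the example.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which tests are rejected.

    Benjamini-Hochberg step-up: sort p-values, find the largest rank r
    with p_(r) <= fdr * r / m, and reject every test ranked at or below r.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten tests, in arbitrary order.
example_p = [0.039, 0.001, 0.216, 0.008, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212]
example_calls = benjamini_hochberg(example_p, fdr=0.05)
```

With these invented values only the two smallest p-values pass; 0.039 would pass an unadjusted 0.05 cutoff but not the rank-scaled threshold.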
7.
Nature ; 553(7686): 77-81, 2018 01 03.
Article in English | MEDLINE | ID: mdl-29300007

ABSTRACT

In contrast to infections with human immunodeficiency virus (HIV) in humans and simian immunodeficiency virus (SIV) in macaques, SIV infection of a natural host, sooty mangabeys (Cercocebus atys), is non-pathogenic despite high viraemia. Here we sequenced and assembled the genome of a captive sooty mangabey. We conducted genome-wide comparative analyses of transcript assemblies from C. atys and AIDS-susceptible species, such as humans and macaques, to identify candidates for host genetic factors that influence susceptibility. We identified several immune-related genes in the genome of C. atys that show substantial sequence divergence from macaques or humans. One of these sequence divergences, a C-terminal frameshift in the toll-like receptor-4 (TLR4) gene of C. atys, is associated with a blunted in vitro response to TLR-4 ligands. In addition, we found a major structural change in exons 3-4 of the immune-regulatory protein intercellular adhesion molecule 2 (ICAM-2); expression of this variant leads to reduced cell surface expression of ICAM-2. These data provide a resource for comparative genomic studies of HIV and/or SIV pathogenesis and may help to elucidate the mechanisms by which SIV-infected sooty mangabeys avoid AIDS.


Subject(s)
Acquired Immunodeficiency Syndrome/genetics , Cercocebus atys/genetics , Cercocebus atys/virology , Genetic Predisposition to Disease , Genome/genetics , Host Specificity/genetics , Simian Immunodeficiency Virus , Acquired Immunodeficiency Syndrome/virology , Amino Acid Sequence , Animals , Cell Adhesion Molecules/chemistry , Cell Adhesion Molecules/genetics , Cell Adhesion Molecules/metabolism , Cercocebus atys/immunology , Exons/genetics , Female , Frameshift Mutation/genetics , Genetic Variation , Genomics , HIV/pathogenicity , Humans , Macaca/virology , Sequence Deletion , Simian Acquired Immunodeficiency Syndrome/genetics , Simian Acquired Immunodeficiency Syndrome/virology , Simian Immunodeficiency Virus/pathogenicity , Species Specificity , Toll-Like Receptor 4/chemistry , Toll-Like Receptor 4/genetics , Toll-Like Receptor 4/immunology , Transcriptome/genetics , Whole Genome Sequencing
8.
J Comp Neurol ; 524(2): 288-308, 2016 Feb 01.
Article in English | MEDLINE | ID: mdl-26132897

ABSTRACT

The human brain and human cognitive abilities are strikingly different from those of other great apes despite relatively modest genome sequence divergence. However, little is presently known about the interspecies divergence in gene structure and transcription that might contribute to these phenotypic differences. To date, most comparative studies of gene structure in the brain have examined humans, chimpanzees, and macaque monkeys. To add to this body of knowledge, we analyze here the brain transcriptome of the western lowland gorilla (Gorilla gorilla gorilla), an African great ape species that is phylogenetically closely related to humans, but with a brain that is approximately one-third the size. Manual transcriptome curation from a sample of the planum temporale region of the neocortex revealed 12 protein-coding genes and one noncoding-RNA gene with exons in the gorilla unmatched by public transcriptome data from the orthologous human loci. These interspecies gene structure differences accounted for a total of 134 amino acids in proteins found in the gorilla that were absent from protein products of the orthologous human genes. Proteins varying in structure between human and gorilla were involved in immunity and energy metabolism, suggesting their relevance to phenotypic differences. This gorilla neocortical transcriptome comprises an empirical, not homology- or prediction-driven, resource for orthologous gene comparisons between human and gorilla. These findings provide a unique repository of the sequences and structures of thousands of genes transcribed in the gorilla brain, pointing to candidate genes that may contribute to the traits distinguishing humans from other closely related great apes.


Subject(s)
Brain/metabolism , Gene Expression/physiology , High-Throughput Nucleotide Sequencing , RNA/metabolism , Animals , Carrier Proteins/genetics , Carrier Proteins/metabolism , Gene Expression Profiling , Gorilla gorilla/anatomy & histology , Humans/anatomy & histology , Intracellular Signaling Peptides and Proteins , Molecular Models , Muscle Proteins/genetics , Muscle Proteins/metabolism , Peroxisome Proliferator-Activated Receptor Gamma Coactivator 1-alpha , Phosphoprotein Phosphatases/genetics , Phosphoprotein Phosphatases/metabolism , Phylogeny , Species Specificity , Transcription Factors/genetics , Transcription Factors/metabolism , beta 2-Glycoprotein I/genetics , beta 2-Glycoprotein I/metabolism
9.
Nucleic Acids Res ; 43(Database issue): D737-42, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25392405

ABSTRACT

The non-human primate reference transcriptome resource (NHPRTR, available online at http://nhprtr.org/) aims to generate comprehensive RNA-seq data from a wide variety of non-human primates (NHPs), from lemurs to hominids. In the 2012 Phase I of the NHPRTR project, 19 billion fragments or 3.8 terabases of transcriptome sequences were collected from pools of ~20 tissues in 15 species and subspecies. Here we describe a major expansion of NHPRTR by adding 10.1 billion fragments of tissue-specific RNA-seq data. For this effort, we selected 11 of the original 15 NHP species and subspecies and constructed total RNA libraries for the same ~15 tissues in each. The sequence quality is such that 88% of the reads align to human reference sequences, allowing us to compute the full list of expression abundance across all tissues for each species, using the reads mapped to human genes. This update also includes improved transcript annotations derived from RNA-seq data for rhesus and cynomolgus macaques, two of the most commonly used NHP models, and additional RNA-seq data compiled from related projects. Together, these comprehensive reference transcriptomes from multiple primates serve as a valuable community resource for genome annotation, gene dynamics and comparative functional analysis.


Subject(s)
Genetic Databases , Gene Expression Profiling , Primates/genetics , RNA Sequence Analysis , Animals , Internet , Macaca , Molecular Sequence Annotation , Organ Specificity , Reference Standards , Sequence Alignment/standards
10.
Concurr Comput ; 26(13): 2157-2166, 2014 Sep 10.
Article in English | MEDLINE | ID: mdl-25294974

ABSTRACT

A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported previously at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems.

11.
J Med Primatol ; 43(5): 317-28, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24810475

ABSTRACT

BACKGROUND: The genome annotations of rhesus (Macaca mulatta) and cynomolgus (Macaca fascicularis) macaques, two of the most common non-human primate animal models, are limited. METHODS: We analyzed large-scale macaque RNA-based next-generation sequencing (RNAseq) data to identify un-annotated macaque transcripts. RESULTS: For both macaque species, we uncovered thousands of novel isoforms for annotated genes and thousands of un-annotated intergenic transcripts enriched with non-coding RNAs. We also identified thousands of transcript sequences which are partially or completely 'missing' from current macaque genome assemblies. We showed that many newly identified transcripts were differentially expressed during SIV infection of rhesus macaques or during Ebola virus infection of cynomolgus macaques. CONCLUSIONS: For two important macaque species, we uncovered thousands of novel isoforms and un-annotated intergenic transcripts including coding and non-coding RNAs, polyadenylated and non-polyadenylated transcripts. This resource will greatly improve future macaque studies, as demonstrated by their applications in infectious disease studies.


Subject(s)
Ebola Hemorrhagic Fever/genetics , Macaca fascicularis , Macaca mulatta , Monkey Diseases/genetics , Simian Acquired Immunodeficiency Syndrome/genetics , Transcriptome , Animals , Ebolavirus/physiology , Ebola Hemorrhagic Fever/virology , High-Throughput Nucleotide Sequencing , India , Mauritius , Molecular Sequence Data , Monkey Diseases/virology , Untranslated RNA/genetics , Untranslated RNA/metabolism , RNA Sequence Analysis , Simian Acquired Immunodeficiency Syndrome/virology , Simian Immunodeficiency Virus/physiology
12.
Nucleic Acids Res ; 41(Database issue): D906-14, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203872

ABSTRACT

RNA-based next-generation sequencing (RNA-Seq) provides a tremendous amount of new information regarding gene and transcript structure, expression and regulation. This is particularly true for non-coding RNAs, where whole transcriptome analyses have revealed that much of the genome is transcribed and that many non-coding transcripts have widespread functionality. However, uniform resources for raw, cleaned and processed RNA-Seq data are sparse for most organisms, and this is especially true for non-human primates (NHPs). Here, we describe a large-scale RNA-Seq data and analysis infrastructure, the NHP reference transcriptome resource (http://nhprtr.org); it presently hosts data from 12 species of primates, to be expanded to 15 species/subspecies spanning great apes, old world monkeys, new world monkeys and prosimians. Data are collected for each species using pools of RNA from comparable tissues. We provide data access in advance of its deposition at NCBI, as well as browsable tracks of alignments against the human genome using the UCSC genome browser. This resource will continue to host additional RNA-Seq data, alignments and assemblies as they are generated over the coming years and provide a key resource for the annotation of NHP genomes as well as informing primate studies on evolution, reproduction, infection, immunity and pharmacology.


Subject(s)
Nucleic Acid Databases , Genomics , Primates/genetics , Transcriptome , Animals , Human Genome , Humans , Internet , Primates/metabolism , Sequence Alignment , RNA Sequence Analysis
13.
Methods Enzymol ; 454: 367-404, 2009.
Article in English | MEDLINE | ID: mdl-19216935

ABSTRACT

This work presents a new approach to the analysis of aperiodic pulsatile heteroscedastic time-series data, specifically hormone pulsatility. We have utilized growth hormone (GH) concentration time-series data as an example for the utilization of this new algorithm. While many previously published approaches used for the analysis of GH pulsatility are both subjective and cumbersome to use, AutoDecon is a nonsubjective, standardized, and completely automated algorithm. We have employed computer simulations to evaluate the true-positive, the false-positive, the false-negative, and the sensitivity percentages of several of the routinely employed algorithms when applied to GH concentration time-series data. Based on these simulations, it was concluded that this new algorithm provides a substantial improvement over the previous methods. This novel method has many direct applications in addition to hormone pulsatility, for example, to time-domain fluorescence lifetime measurements, as the mathematical forms that describe these experimental systems are both convolution integrals.


Subject(s)
Algorithms , Software
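
The abstract of entry 13 notes that hormone pulsatility and time-domain fluorescence lifetime measurements are both described by convolution integrals. As a rough sketch of what that means for hormone data, the code below convolves a secretion-rate series with an elimination curve to produce a concentration series. The single-exponential elimination model, the function name, and all numeric values are assumptions for illustration; this is the forward model only, not the AutoDecon algorithm.

```python
import math


def convolve_concentration(secretion, half_life, dt):
    """Discrete form of C(t) = integral of S(tau) * E(t - tau) d tau.

    secretion: secretion rate sampled every dt time units
    half_life: assumed single-exponential elimination half-life
    """
    k = math.log(2) / half_life  # elimination rate constant
    conc = []
    for i in range(len(secretion)):
        total = 0.0
        for j in range(i + 1):
            # Hormone secreted at step j decays for (i - j) * dt before step i.
            total += secretion[j] * math.exp(-k * (i - j) * dt) * dt
        conc.append(total)
    return conc


# One unit of hormone released in the first 10-minute interval,
# eliminated with a hypothetical 20-minute half-life.
dt = 10.0
impulse = [1.0 / dt, 0.0, 0.0, 0.0, 0.0]
response = convolve_concentration(impulse, half_life=20.0, dt=dt)
```

Deconvolution runs this relationship in reverse: given the observed concentrations, it estimates the secretion events and elimination parameters that best reproduce them.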
14.
Anal Biochem ; 381(1): 8-17, 2008 Oct 01.
Article in English | MEDLINE | ID: mdl-18639514

ABSTRACT

Hormone signaling is often pulsatile, and multiparameter deconvolution procedures have long been used to identify and characterize secretory events. However, the existing programs have serious limitations, including the subjective nature of initial peak selection, lack of statistical verification of presumed bursts, and user-unfriendliness of the application. Here we describe a novel deconvolution program, AutoDecon, which addresses these concerns. We validate AutoDecon for application to serum luteinizing hormone (LH) concentration time series using synthetic data mimicking real data from normal women and then compare the performance of AutoDecon with that of the widely employed hormone pulsatility analysis program Cluster. The sensitivity of AutoDecon is higher than that of Cluster (approximately 96% vs. 80%, P=0.001). However, AutoDecon had a higher false-positive detection rate than did Cluster (6% vs. 1%, P=0.001). Further analysis demonstrated that the pulsatility parameters recovered by AutoDecon were indistinguishable from those characterizing the synthetic data and that sampling at 5- or 10-min intervals was optimal for maximizing the sensitivity rates for LH. Accordingly, AutoDecon presents a viable nonsubjective alternative to previous pulse detection algorithms for the analysis of LH data. It is applicable to other pulsatile hormone concentration time series and many other pulsatile phenomena. The software is free and downloadable at http://mljohnson.pharm.virginia.edu/home.html.


Subject(s)
Algorithms , Luteinizing Hormone/metabolism , Biological Models , Software , Adult , Animals , False Positive Reactions , Female , Gonadotropin-Releasing Hormone/blood , Half-Life , Humans , Luteinizing Hormone/blood , Menstrual Cycle/blood , Middle Aged , Postmenopause/blood , Premenopause/blood , Reproducibility of Results , Sheep , Time Factors
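
The abstract of entry 14 compares two programs by sensitivity and false-positive detection rate on synthetic data where the true pulse times are known. The sketch below shows one way such scores could be computed; the matching tolerance, the definition of the false-positive rate as the fraction of detected pulses with no true counterpart, the function name, and the pulse times are all assumptions, since the abstract does not give the authors' definitions.

```python
def score_pulse_detection(true_times, detected_times, tolerance):
    """Match detected pulses to true pulses and summarize the errors.

    A detection is a true positive if an unmatched true pulse lies
    within `tolerance` of it; each true pulse can be matched only once.
    """
    unmatched = list(true_times)
    tp = 0
    fp = 0
    for t in sorted(detected_times):
        match = next((u for u in unmatched if abs(u - t) <= tolerance), None)
        if match is None:
            fp += 1  # detection with no true pulse nearby
        else:
            unmatched.remove(match)
            tp += 1
    fn = len(unmatched)  # true pulses that were never detected
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    false_positive_rate = fp / (tp + fp) if (tp + fp) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity,
            "false_positive_rate": false_positive_rate}


# Hypothetical pulse times in minutes.
scores = score_pulse_detection(
    true_times=[30, 120, 200, 310],
    detected_times=[28, 125, 260, 305],
    tolerance=10,
)
```

Here three of four true pulses are recovered and one of four detections is spurious, which illustrates how a method can gain sensitivity while also producing more false positives.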