Search | BVS Enfermagem Research Portal

1.

How to optimally sample a sequence for rapid analysis.

Frith, Martin C; Shaw, Jim; Spouge, John L.

Bioinformatics ; 39(2), 2023 02 03.

Article in English | MEDLINE | ID: mdl-36702468

ABSTRACT

MOTIVATION: We face an increasing flood of genetic sequence data, from diverse sources, requiring rapid computational analysis. Rapid analysis can be achieved by sampling a subset of positions in each sequence. Previous sequence-sampling methods, such as minimizers, syncmers and minimally overlapping words, were developed by heuristic intuition, and are not optimal. RESULTS: We present a sequence-sampling approach that provably optimizes sensitivity for a whole class of sequence comparison methods, for randomly evolving sequences. It is likely near-optimal for a wide range of alignment-based and alignment-free analyses. For real biological DNA, it increases specificity by avoiding simple repeats. Our approach generalizes universal hitting sets (which guarantee to sample a sequence at least once) and polar sets (which guarantee to sample a sequence at most once). This helps us understand how to do rapid sequence analysis as accurately as possible. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://gitlab.com/mcfrith/noverlap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
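
As a point of reference for the sampling idea, the sketch below implements window minimizers, one of the earlier heuristic schemes the abstract names. It is not the authors' method. The function name, the lexicographic k-mer ordering, and the leftmost tie-break are assumptions made for illustration; practical tools often order k-mers by a hash instead.

```python
def minimizer_positions(seq, k, w):
    """Return the sorted start positions of window minimizers.

    For every window of w consecutive k-mers, keep the position of the
    lexicographically smallest k-mer (leftmost on ties). Illustrative only.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []  # sequence too short to hold a single full window
    chosen = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        # Compare (k-mer, position) pairs so ties resolve to the leftmost position.
        best = min(window, key=lambda i: (seq[i:i + k], i))
        chosen.add(best)
    return sorted(chosen)


if __name__ == "__main__":
    print(minimizer_positions("ACGTACGT", k=3, w=3))
```

Only the sampled positions would then be passed to a downstream comparison, which is where the speed-up comes from.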

Subjects

Algorithms , Software , Sequence Analysis, DNA/methods

2.

Discovery of Kaposi's sarcoma herpesvirus-encoded circular RNAs and a human antiviral circular RNA.

Tagawa, Takanobu; Gao, Shaojian; Koparde, Vishal N; Gonzalez, Mileidy; Spouge, John L; Serquiña, Anna P; Lurain, Kathryn; Ramaswami, Ramya; Uldrick, Thomas S; Yarchoan, Robert; Ziegelbauer, Joseph M.

Proc Natl Acad Sci U S A ; 115(50): 12805-12810, 2018 12 11.

Article in English | MEDLINE | ID: mdl-30455306

ABSTRACT

Noncoding RNAs have substantial effects in host-virus interactions. Circular RNAs (circRNAs) are novel single-stranded noncoding RNAs which can decoy other RNAs or RNA-binding proteins to inhibit their functions. The role of circRNAs is largely unknown in the context of Kaposi's sarcoma herpesvirus (KSHV). We hypothesized that circRNAs influence viral infection by inhibiting host and/or viral factors. Transcriptome analysis of KSHV-infected primary endothelial cells and a B cell line identified human circRNAs that are differentially regulated upon infection. We confirmed the expression changes with divergent PCR primers and RNase R treatment of specific circRNAs. Ectopic expression of hsa_circ_0001400, a circRNA induced by infection, suppressed expression of key viral latent gene LANA and lytic gene RTA in KSHV de novo infections. Since human herpesviruses express noncoding RNAs like microRNAs, we searched for viral circRNAs encoded in the KSHV genome. We performed circRNA-Seq analysis with RNase R-treated, circRNA-enriched RNA from KSHV-infected cells. We identified multiple circRNAs encoded by the KSHV genome that are expressed in KSHV-infected endothelial cells and primary effusion lymphoma (PEL) cells. The KSHV circRNAs are located within ORFs of viral lytic genes, are up-regulated upon the induction of the lytic cycle, and alter cell growth. Viral circRNAs were also detected in lymph nodes from patients of KSHV-driven diseases such as PEL, Kaposi's sarcoma, and multicentric Castleman's disease. We revealed new host-virus interactions of circRNAs: human antiviral circRNAs are activated in response to KSHV infection, and viral circRNA expression is induced in the lytic phase of infection.

Subjects

Herpesvirus 8, Human/genetics , RNA/genetics , Sarcoma, Kaposi/genetics , Sarcoma, Kaposi/virology , B-Lymphocytes/virology , Castleman Disease/genetics , Castleman Disease/virology , Cell Line , Endothelial Cells/virology , Gene Expression Profiling/methods , Gene Expression Regulation, Viral/genetics , Genes, Viral/genetics , HEK293 Cells , Human Umbilical Vein Endothelial Cells , Humans , Lymphoma, Primary Effusion/genetics , Lymphoma, Primary Effusion/virology , MicroRNAs/genetics , Open Reading Frames/genetics , RNA, Circular , RNA, Viral/genetics

3.

Hemolysis Pathways during Storage of Erythrocytes and Inter-Donor Variability in Erythrocyte Morphology.

Melzak, Kathryn A; Spouge, John L; Boecker, Clemens; Kirschhöfer, Frank; Brenner-Weiss, Gerald; Bieback, Karen.

Transfus Med Hemother ; 48(1): 39-47, 2021 Feb.

Article in English | MEDLINE | ID: mdl-33708051

ABSTRACT

BACKGROUND: Red blood cells (RBCs) stored for transfusions can lyse over the course of the storage period. The lysis is traditionally assumed to occur via the formation of spiculated echinocyte forms, so that cells that appear smoother are assumed to have better storage quality. We investigate this hypothesis by comparing the morphological distribution to the hemolysis for samples from different donors. METHODS: Red cell concentrates were obtained from a regional blood bank quality control laboratory. Out of 636 units processed by the laboratory, we obtained 26 high hemolysis units and 24 low hemolysis units for assessment of RBC morphology. The association between the morphology and the hemolysis was tested with the Wilcoxon-Mann-Whitney U test. RESULTS: Samples with high stomatocyte counts (p = 0.0012) were associated with increased hemolysis, implying that cells can lyse via the formation of stomatocytes. CONCLUSION: RBCs can lyse without significant echinocyte formation. Lower degrees of spiculation are not a good indicator of low hemolysis when RBCs from different donors are compared.

4.

Patterns of conserved gp120 epitope presentation on attached HIV-1 virions.

Mengistu, Meron; Tang, Ai-Hui; Foulke, James S; Blanpied, Thomas A; Gonzalez, Mileidy W; Spouge, John L; Gallo, Robert C; Lewis, George K; DeVico, Anthony L.

Proc Natl Acad Sci U S A ; 114(46): E9893-E9902, 2017 11 14.

Article in English | MEDLINE | ID: mdl-29087304

ABSTRACT

A complete picture of HIV antigenicity during early replication is needed to elucidate the full range of options for controlling infection. Such information is frequently gained through analyses of isolated viral envelope antigens, host CD4 receptors, and cognate antibodies. However, direct examination of viral particles and virus-cell interactions is now possible via advanced microscopy techniques and reagents. Using such methods, we recently determined that CD4-induced (CD4i) transition state epitopes in the HIV surface antigen, gp120, while not exposed on free particles, rapidly become immunoreactive upon virus-cell binding. Here, we use 3D direct stochastic optical reconstruction microscopy (dSTORM) to show that certain CD4i epitopes specific to transition state structures are exposed across the surface of cell-bound virions, thus explaining their immunoreactivity. Moreover, such structures and their marker epitopes are dispersed to regions of virions distal to CD4 contact. We further show that the appearance and positioning of distal CD4i exposures is partially dependent on Gag maturation and intact matrix-gp41 interactions within the virion. Collectively, these observations provide a unique perspective of HIV during early replication. These features may define unique insights for understanding how humoral responses target virions and for developing related antiviral countermeasures.

Subjects

Epitopes/immunology , HIV Envelope Protein gp120/immunology , HIV Infections/virology , HIV-1/immunology , Virion/immunology , Virus Attachment , CD4 Antigens/metabolism , CD4 Lymphocyte Count , Cell Line , Epitopes/chemistry , HIV Antibodies/immunology , HIV Antigens/immunology , HIV Envelope Protein gp120/chemistry , HIV Envelope Protein gp41/chemistry , HIV Envelope Protein gp41/immunology , HIV Infections/immunology , HIV-1/chemistry , Humans , Virion/chemistry , Virion/metabolism

5.

MultiDomainBenchmark: a multi-domain query and subject database suite.

Carroll, Hyrum D; Spouge, John L; Gonzalez, Mileidy.

BMC Bioinformatics ; 20(1): 77, 2019 Feb 14.

Article in English | MEDLINE | ID: mdl-30764761

ABSTRACT

BACKGROUND: Genetic sequence database retrieval benchmarks play an essential role in evaluating the performance of sequence searching tools. To date, all phylogenetically diverse benchmarks known to the authors include only query sequences with single protein domains. Domains are the primary building blocks of protein structure and function. Independently, each domain can fulfill a single function, but most proteins (>80% in Metazoa) exist as multi-domain proteins. Multiple domain units combine in various arrangements or architectures to create different functions and are often under evolutionary pressures to yield new ones. Thus, it is crucial to create gold standards reflecting the multi-domain complexity of real proteins to more accurately evaluate sequence searching tools. DESCRIPTION: This work introduces MultiDomainBenchmark (MDB), a database suite of 412 curated multi-domain queries and 227,512 target sequences, representing at least 5108 species and 1123 phylogenetically divergent protein families, their relevancy annotation, and domain location. Here, we use the benchmark to evaluate the performance of two commonly used sequence searching tools, BLAST/PSI-BLAST and HMMER. Additionally, we introduce a novel classification technique for multi-domain proteins to evaluate how well an algorithm recovers a domain architecture. CONCLUSION: MDB is publicly available at http://csc.columbusstate.edu/carroll/MDB/ .

Subjects

Algorithms , Benchmarking , Databases, Protein , Proteins/chemistry , Amino Acid Sequence , Phylogeny , Protein Structure, Tertiary , Sequence Alignment

6.

An accurate approximation for the expected site frequency spectrum in a Galton-Watson process under an infinite sites mutation model.

Spouge, John L.

Theor Popul Biol ; 127: 7-15, 2019 06.

Article in English | MEDLINE | ID: mdl-30876864

ABSTRACT

If viruses or other pathogens infect a single host, the outcome of infection often hinges on the fate of the initial invaders. The initial basic reproduction number R0, the expected number of cells infected by a single infected cell, helps determine whether the initial viruses can establish a successful beachhead. To determine R0, the Kingman coalescent or continuous-time birth-and-death process can be used to infer the rate of exponential growth in an historical population. Given M sequences sampled in the present, the two models can make the inference from the site frequency spectrum (SFS), the count of mutations that appear in exactly k sequences (k = 1, 2, ..., M). In the case of viruses, however, if R0 is large and an infected cell bursts while propagating virus, the two models are suspect, because they are Markovian with only binary branching. Accordingly, this article develops an approximation for the SFS of a discrete-time branching process with synchronous generations (i.e., a Galton-Watson process). When evaluated in simulations with an asynchronous, non-Markovian model (a Bellman-Harris process) with parameters intended to mimic the bursting viral reproduction of HIV, the approximation proved superior to approximations derived from the Kingman coalescent or continuous-time birth-and-death process. This article demonstrates that in analogy to methods in human genetics, the SFS of viral sequences sampled well after latent infection can remain informative about the initial R0. Thus, it suggests the utility of analyzing the SFS of sequences derived from patient and animal trials of viral therapies, because in some cases, the initial R0 may be able to indicate subtle therapeutic progress, even in the absence of statistically significant differences in the infection of treatment and control groups.
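
The site frequency spectrum defined in this abstract can be tabulated directly from aligned sequences. The sketch below is a minimal illustration, not the article's approximation: it assumes each site is already coded as '0' (ancestral) or '1' (derived), and the function name and input layout are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(sequences):
    """Count, for k = 1..M, the sites whose mutation appears in exactly k sequences.

    `sequences` is a list of M equal-length strings over {'0', '1'},
    where '1' marks the derived (mutant) allele at a site.
    Returns a list whose entry k-1 is the number of such sites.
    """
    m = len(sequences)
    counts = Counter()
    for column in zip(*sequences):  # iterate over sites, not sequences
        k = sum(1 for allele in column if allele == "1")
        if k >= 1:  # sites with no mutation are not part of the spectrum
            counts[k] += 1
    return [counts[k] for k in range(1, m + 1)]


if __name__ == "__main__":
    print(site_frequency_spectrum(["1100", "1010", "1000"]))
```

An observed spectrum of this kind is what a model-based expectation, such as the Galton-Watson approximation the abstract describes, would be compared against.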

Subjects

Gene Frequency/genetics , Models, Genetic , Mutation , Virulence/genetics , Viruses/genetics , Basic Reproduction Number , Humans

7.

Alignment-free genome comparison enables accurate geographic sourcing of white oak DNA.

Tang, Kujin; Ren, Jie; Cronn, Richard; Erickson, David L; Milligan, Brook G; Parker-Forney, Meaghan; Spouge, John L; Sun, Fengzhu.

BMC Genomics ; 19(1): 896, 2018 Dec 10.

Article in English | MEDLINE | ID: mdl-30526482

ABSTRACT

BACKGROUND: The application of genomic data and bioinformatics for the identification of restricted or illegally-sourced natural products is urgently needed. The taxonomic identity and geographic provenance of raw and processed materials have implications in sustainable-use commercial practices, and relevance to the enforcement of laws that regulate or restrict illegally harvested materials, such as timber. Improvements in genomics make it possible to capture and sequence partial-to-complete genomes from challenging tissues, such as wood and wood products. RESULTS: In this paper, we report the success of an alignment-free genome comparison method, [Formula: see text], that differentiates geographic sources of white oak (Quercus) species with a high level of accuracy using a very small amount of genomic data. The method is robust to sequencing errors, different sequencing laboratories and sequencing platforms. CONCLUSIONS: This method offers an approach based on genome-scale data, rather than panels of pre-selected markers for specific taxa. The method provides a generalizable platform for the identification and sourcing of materials using a unified next generation sequencing and analysis framework.

Subjects

DNA, Plant/genetics , Genome, Plant , Geography , Quercus/genetics , Sequence Alignment/methods , Algorithms , High-Throughput Nucleotide Sequencing , Principal Component Analysis

8.

A closed formula relevant to 'Theory of local k-mer selection with applications to long-read alignment' by Jim Shaw and Yun William Yu.

Spouge, John L.

Bioinformatics ; 38(20): 4848-4849, 2022 10 14.

Article in English | MEDLINE | ID: mdl-36063041

Subjects

Sequence Alignment , Sequence Analysis, DNA

9.

Empirical null estimation using zero-inflated discrete mixture distributions and its application to protein domain data.

Gauran, Iris Ivy M; Park, Junyong; Lim, Johan; Park, DoHwan; Zylstra, John; Peterson, Thomas; Kann, Maricel; Spouge, John L.

Biometrics ; 74(2): 458-471, 2018 06.

Article in English | MEDLINE | ID: mdl-28940296

ABSTRACT

In recent mutation studies, analyses based on protein domain positions are gaining popularity over gene-centric approaches since the latter have limitations in considering the functional context that the position of the mutation provides. This presents a large-scale simultaneous inference problem, with hundreds of hypothesis tests to consider at the same time. This article aims to select significant mutation counts while controlling a given level of Type I error via False Discovery Rate (FDR) procedures. One main assumption is that the mutation counts follow a zero-inflated model in order to account for the true zeros in the count model and the excess zeros. The class of models considered is the Zero-inflated Generalized Poisson (ZIGP) distribution. Furthermore, we assumed that there exists a cut-off value such that smaller counts than this value are generated from the null distribution. We present several data-dependent methods to determine the cut-off value. We also consider a two-stage procedure based on screening process so that the number of mutations exceeding a certain value should be considered as significant mutations. Simulated and protein domain data sets are used to illustrate this procedure in estimation of the empirical null using a mixture of discrete distributions. Overall, while maintaining control of the FDR, the proposed two-stage testing procedure has superior empirical power.

Subjects

Biometry/methods , Data Interpretation, Statistical , Protein Domains , Statistical Distributions , DNA Mutational Analysis , Databases, Protein , Humans , Mutation Rate , Poisson Distribution

10.

Conserved signatures indicate HIV-1 transmission is under strong selection and thus is not a "stochastic" process.

Gonzalez, Mileidy; DeVico, Anthony L; Spouge, John L.

Retrovirology ; 14(1): 13, 2017 02 24.

Article in English | MEDLINE | ID: mdl-28231858

ABSTRACT

Recently, Oberle et al. published a paper in Retrovirology evaluating the question of whether selection plays a role in HIV transmission. The Oberle study found no obvious genotypic or phenotypic differences between donors and recipients of epidemiologically linked pairs from the Swiss cohort. Thus, Oberle et al. characterized HIV-1 B transmission as largely "stochastic", an imprecise and potentially misleading term. Here, we re-analyzed their data and placed them in the context of transmission data for over 20 other human and animal trials. The present study finds that the transmitted/founder (T/F) viruses from the Swiss cohort show the same non-random genetic signatures conserved in 118 HIV-1, 40 SHIV, and 12 SIV T/F viruses previously published by two independent groups. We provide alternative interpretations of the Swiss cohort data and conclude that the sequences of their donor viruses lacked variability at the specific sites where other studies were able to demonstrate genotypic selection. Oberle et al. observed no phenotypic selection in vitro, so the problem of determining the in vivo phenotypic mechanisms that cause genotypic selection in HIV remains open.

Subjects

HIV Infections , HIV-1/genetics , Animals , Genotype , Humans

11.

ALP & FALP: C++ libraries for pairwise local alignment E-values.

Sheetlin, Sergey; Park, Yonil; Frith, Martin C; Spouge, John L.

Bioinformatics ; 32(2): 304-5, 2016 Jan 15.

Article in English | MEDLINE | ID: mdl-26428291

ABSTRACT

MOTIVATION: Pairwise local alignment is an indispensable tool for molecular biologists. In real time (i.e. in about 1 s), ALP (Ascending Ladder Program) calculates the E-values for protein-protein or DNA-DNA local alignments of random sequences, for arbitrary substitution score matrix, gap costs and letter abundances; and FALP (Frameshift Ascending Ladder Program) performs a similar task, although more slowly, for frameshifting DNA-protein alignments. AVAILABILITY AND IMPLEMENTATION: To permit other C++ programmers to implement the computational efficiencies in ALP and FALP directly within their own programs, C++ source codes are available in the public domain at http://go.usa.gov/3GTSW under 'ALP' and 'FALP', along with the standalone programs ALP and FALP. CONTACT: spouge@nih.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Computational Biology/methods , DNA/chemistry , Proteins/chemistry , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Software , DNA/metabolism , Databases, Factual , Humans , Proteins/metabolism , Sequence Alignment

12.

Most of the tight positional conservation of transcription factor binding sites near the transcription start site reflects their co-localization within regulatory modules.

Acevedo-Luna, Natalia; Mariño-Ramírez, Leonardo; Halbert, Armand; Hansen, Ulla; Landsman, David; Spouge, John L.

BMC Bioinformatics ; 17(1): 479, 2016 Nov 21.

Article in English | MEDLINE | ID: mdl-27871221

ABSTRACT

BACKGROUND: Transcription factors (TFs) form complexes that bind regulatory modules (RMs) within DNA, to control specific sets of genes. Some transcription factor binding sites (TFBSs) near the transcription start site (TSS) display tight positional preferences relative to the TSS. Furthermore, near the TSS, RMs can co-localize TFBSs with each other and the TSS. The proportion of TFBS positional preferences due to TFBS co-localization within RMs is unknown, however. ChIP experiments confirm co-localization of some TFBSs genome-wide, including near the TSS, but they typically examine only a few TFs at a time, using non-physiological conditions that can vary from lab to lab. In contrast, sequence analysis can examine many TFs uniformly and methodically, broadly surveying the co-localization of TFBSs with tight positional preferences relative to the TSS. RESULTS: Our statistics found 43 significant sets of human motifs in the JASPAR TF Database with positional preferences relative to the TSS, with 38 preferences tight (±5 bp). Each set of motifs corresponded to a gene group of 135 to 3304 genes, with 42/43 (98%) gene groups independently validated by DAVID, a gene ontology database, with FDR < 0.05. Motifs corresponding to two TFBSs in a RM should co-occur more than by chance alone, enriching the intersection of the gene groups corresponding to the two TFs. Thus, a gene-group intersection systematically enriched beyond chance alone provides evidence that the two TFs participate in an RM. Of the 903 = 43*42/2 intersections of the 43 significant gene groups, we found 768/903 (85%) pairs of gene groups with significantly enriched intersections, with 564/768 (73%) intersections independently validated by DAVID with FDR < 0.05. A user-friendly web site at http://go.usa.gov/3kjsH permits biologists to explore the interaction network of our TFBSs to identify candidate subunit RMs. 
CONCLUSIONS: Gene duplication and convergent evolution within a genome provide obvious biological mechanisms for replicating an RM near the TSS that binds a particular TF subunit. Of all intersections of our 43 significant gene groups, 85% were significantly enriched, with 73% of the significant enrichments independently validated by gene ontology. The co-localization of TFBSs within RMs therefore likely explains much of the tight TFBS positional preferences near the TSS.
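
The pair count in this abstract (903 = 43*42/2) and the idea of an intersection "enriched beyond chance alone" can be illustrated with a short sketch. The abstract does not say which test the authors used. The hypergeometric upper tail below is one common choice and is an assumption, as are the function names and any gene-universe size a caller supplies.

```python
from math import comb


def n_unordered_pairs(n):
    """Number of unordered pairs among n gene groups, n*(n-1)/2."""
    return n * (n - 1) // 2


def hypergeom_upper_tail(universe, group_a, group_b, overlap):
    """P(X >= overlap) when group_b genes are drawn at random from a
    universe that contains group_a marked genes (hypergeometric model).

    Small values suggest the two gene groups intersect more than chance
    alone would predict. Assumed test, not taken from the article.
    """
    total = comb(universe, group_b)
    upper = min(group_a, group_b)
    favourable = sum(
        comb(group_a, i) * comb(universe - group_a, group_b - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    print(n_unordered_pairs(43))                 # pairs of gene groups
    print(hypergeom_upper_tail(10, 5, 5, 5))     # toy universe of 10 genes
```

With many pairs tested at once, the resulting p-values would still need a multiple-testing correction such as the FDR control the abstract mentions.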

Subjects

DNA/metabolism , Gene Expression Regulation , Regulatory Elements, Transcriptional/genetics , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Transcription Initiation Site , Binding Sites , DNA/chemistry , DNA/genetics , Gene Ontology , Humans , Protein Binding

13.

A new method for estimating species age supports the coexistence of malaria parasites and their Mammalian hosts.

Silva, Joana C; Egan, Amy; Arze, Cesar; Spouge, John L; Harris, David G.

Mol Biol Evol ; 32(5): 1354-64, 2015 May.

Article in English | MEDLINE | ID: mdl-25589738

ABSTRACT

Species in the genus Plasmodium cause malaria in humans and infect a variety of mammals and other vertebrates. Currently, estimated ages for several mammalian Plasmodium parasites differ by as much as one order of magnitude, an inaccuracy that frustrates reliable estimation of evolutionary rates of disease-related traits. We developed a novel statistical approach to dating the relative age of evolutionary lineages, based on Total Least Squares regression. We validated this lineage dating approach by applying it to the genus Drosophila. Using data from the Drosophila 12 Genomes project, our approach accurately reconstructs the age of well-established Drosophila clades, including the speciation event that led to the subgenera Drosophila and Sophophora, and age of the melanogaster species subgroup. We applied this approach to hundreds of loci from seven mammalian Plasmodium species. We demonstrate the existence of a molecular clock specific to individual Plasmodium proteins, and estimate the relative age of mammalian-infecting Plasmodium. These analyses indicate that: 1) the split between the human parasite Plasmodium vivax and P. knowlesi, from Old World monkeys, occurred 6.1 times earlier than that between P. falciparum and P. reichenowi, parasites of humans and chimpanzees, respectively; and 2) mammalian Plasmodium parasites originated 22 times earlier than the split between P. falciparum and P. reichenowi. Calibrating the absolute divergence times for Plasmodium with eukaryotic substitution rates, we show that the split between P. falciparum and P. reichenowi occurred 3.0-5.5 Ma, and that mammalian Plasmodium parasites originated over 64 Ma. Our results indicate that mammalian-infecting Plasmodium evolved contemporaneously with their hosts, with little evidence for parasite host-switching on an evolutionary scale, and provide a solid timeframe within which to place the evolution of new Plasmodium species.

Subjects

Evolution, Molecular , Host-Parasite Interactions/genetics , Malaria, Falciparum/genetics , Plasmodium falciparum/genetics , Animals , Humans , Malaria, Falciparum/parasitology , Pan troglodytes/genetics , Phylogeny , Plasmodium falciparum/pathogenicity , Plasmodium vivax/genetics , Plasmodium vivax/pathogenicity , Sequence Alignment

14.

Conserved molecular signatures in gp120 are associated with the genetic bottleneck during simian immunodeficiency virus (SIV), SIV-human immunodeficiency virus (SHIV), and HIV type 1 (HIV-1) transmission.

Gonzalez, Mileidy W; DeVico, Anthony L; Lewis, George K; Spouge, John L.

J Virol ; 89(7): 3619-29, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25589663

ABSTRACT

UNLABELLED: Human immunodeficiency virus (HIV) transmission typically results from infection by a single transmitted/founder (T/F) variant. Are T/F variants chosen uniformly at random from the donor pool, or are they selected based on advantageous traits facilitating transmission? Finding evidence for selection during transmission is of particular interest, because it would indicate that phenotypic and/or genetic properties of the viruses might be harnessed as potential vaccine targets or immunotherapies. Here, we systematically evaluated the differences between the Env proteins of simian immunodeficiency virus/simian HIV (SIV/SHIV) stock and T/F variants in search of "signature" sites of transmission. We also surveyed residue preferences in HIV at the SIV/SHIV signature sites. Four sites of gp120 showed significant selection, and an additional two sites showed a similar trend. Therefore, the six sites clearly differentiate T/F viruses from the majority of circulating variants in the stocks. The selection of SIV/SHIV could be inferred reasonably across both vaccinated and unvaccinated subjects, with infections resulting from vaginal, rectal, and intravenous routes of transmission and regardless of viral dosage. The evidence for selection in SIV and SHIV T/F variants is strong and plentiful, and in HIV the evidence is suggestive though commensurate with the availability of suitable data for analysis. Two of the signature residues are completely conserved across the SIV, SHIV, and HIV variants we examined. Five of the signature residues map to the C1 region of gp120 and one to the signal peptide. Our data raise the possibility that C1, while governing the association between gp120 and gp41, modulates transmission efficiency, replicative fitness, and/or host cell tropism at the level of virus-cell attachment and entry. IMPORTANCE: The present study finds significant evidence of selection on gp120 molecules of SIV/SHIV T/F viruses. 
The data provide ancillary evidence suggesting the same sites are under selection in HIV. Our findings suggest that the signature residues are involved in increasing the transmissibility of infecting viruses; therefore, they are potential targets for developing a vaccine or other protective measures. A recent study identified the same T/F signature motif but interpreted it as an effect of neutralization resistance. Here, we show that the T/F motif has broader functional significance beyond neutralization sensitivity, because it is present in nonimmune subjects. Also, a vaccine regimen popular in animal trials might have increased the transmission of variants with otherwise low transmission fitness. Our observations might explain why many animal vaccine trials have not faithfully predicted outcomes in human vaccine trials and suggest that current practices in vaccine design need to be reexamined accordingly.

Subjects

Conserved Sequence , HIV Envelope Protein gp120/genetics , HIV Envelope Protein gp120/metabolism , HIV Infections/transmission , Membrane Glycoproteins/genetics , Membrane Glycoproteins/metabolism , Simian Acquired Immunodeficiency Syndrome/transmission , Viral Envelope Proteins/genetics , Viral Envelope Proteins/metabolism , Animals , Female , Genotype , HIV/genetics , HIV/physiology , HIV Infections/virology , Humans , Macaca mulatta , Male , Selection, Genetic , Simian Acquired Immunodeficiency Syndrome/virology , Simian Immunodeficiency Virus/genetics , Simian Immunodeficiency Virus/physiology , Viral Tropism , Virus Attachment , Virus Replication

15.

Frameshift alignment: statistics and post-genomic applications.

Sheetlin, Sergey L; Park, Yonil; Frith, Martin C; Spouge, John L.

Bioinformatics ; 30(24): 3575-82, 2014 Dec 15.

Article in English | MEDLINE | ID: mdl-25172925

ABSTRACT

MOTIVATION: The alignment of DNA sequences to proteins, allowing for frameshifts, is a classic method in sequence analysis. It can help identify pseudogenes (which accumulate mutations), analyze raw DNA and RNA sequence data (which may have frameshift sequencing errors), investigate ribosomal frameshifts, etc. Often, however, only ad hoc approximations or simulations are available to provide the statistical significance of a frameshift alignment score. RESULTS: We describe a method to estimate statistical significance of frameshift alignments, similar to classic BLAST statistics. (BLAST presently does not permit its alignments to include frameshifts.) We also illustrate the continuing usefulness of frameshift alignment with two 'post-genomic' applications: (i) when finding pseudogenes within the human genome, frameshift alignments show that most anciently conserved non-coding human elements are recent pseudogenes with conserved ancestral genes; and (ii) when analyzing metagenomic DNA reads from polluted soil, frameshift alignments show that most alignable metagenomic reads contain frameshifts, suggesting that metagenomic analysis needs to use frameshift alignment to derive accurate results.

Subjects

Frameshift Mutation , Sequence Alignment/methods , Algorithms , Data Interpretation, Statistical , Genome, Human , Genomics , Humans , Metagenomics , Pseudogenes , Sequence Analysis, DNA , Sequence Analysis, Protein , Sequence Analysis, RNA , Software

16.

Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi.

Schoch, Conrad L; Seifert, Keith A; Huhndorf, Sabine; Robert, Vincent; Spouge, John L; Levesque, C André; Chen, Wen.

Proc Natl Acad Sci U S A ; 109(16): 6241-6, 2012 Apr 17.

Article in English | MEDLINE | ID: mdl-22454494

ABSTRACT

Six DNA regions were evaluated as potential DNA barcodes for Fungi, the second largest kingdom of eukaryotic life, by a multinational, multilaboratory consortium. The region of the mitochondrial cytochrome c oxidase subunit 1 used as the animal barcode was excluded as a potential marker, because it is difficult to amplify in fungi, often includes large introns, and can be insufficiently variable. Three subunits from the nuclear ribosomal RNA cistron were compared together with regions of three representative protein-coding genes (largest subunit of RNA polymerase II, second largest subunit of RNA polymerase II, and minichromosome maintenance protein). Although the protein-coding gene regions often had a higher percent of correct identification compared with ribosomal markers, low PCR amplification and sequencing success eliminated them as candidates for a universal fungal barcode. Among the regions of the ribosomal cistron, the internal transcribed spacer (ITS) region has the highest probability of successful identification for the broadest range of fungi, with the most clearly defined barcode gap between inter- and intraspecific variation. The nuclear ribosomal large subunit, a popular phylogenetic marker in certain groups, had superior species resolution in some taxonomic groups, such as the early diverging lineages and the ascomycete yeasts, but was otherwise slightly inferior to the ITS. The nuclear ribosomal small subunit has poor species-level resolution in fungi. ITS will be formally proposed for adoption as the primary fungal barcode marker to the Consortium for the Barcode of Life, with the possibility that supplementary barcodes may be developed for particular narrowly circumscribed taxonomic groups.

Subjects

Taxonomic DNA Barcoding/methods , Fungal DNA/genetics , Ribosomal Spacer DNA/genetics , Fungi/genetics , Cell Nucleus/genetics , Fungi/classification , Phylogeny , Polymerase Chain Reaction , Reproducibility of Results , Species Specificity

17.

Within a sample from a population, the distribution of the number of descendants of a subsample's most recent common ancestor.

Spouge, John L.

Theor Popul Biol ; 92: 51-4, 2014 Mar.

Article in English | MEDLINE | ID: mdl-24321308

ABSTRACT

Sample n individuals uniformly at random from a population, and then sample m individuals uniformly at random from the sample. Consider the most recent common ancestor (MRCA) of the subsample of m individuals. Let the subsample MRCA have j descendants in the sample (m ≤ j ≤ n). Under a Moran or coalescent model (and therefore under many other models), the probability that j = n is known. In this case, the subsample MRCA is an ancestor of every sampled individual, and the subsample and sample MRCAs are identical. The probability that j = m is also known. In this case, the subsample MRCA is an ancestor of no sampled individual outside the subsample. This article derives the complete distribution of j, enabling inferences from the corresponding p-value. The text presents hypothetical statistical applications pertinent to taxonomy (the gene flow between Neanderthals and anatomically modern humans) and medicine (the association of genetic markers with disease).
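
The sampling scheme in this abstract can be illustrated with a small simulation. The following is a minimal sketch, assuming Kingman's coalescent (a uniformly random pair of lineages merges at each step back in time); it is not the article's derivation, and the function names `simulate_j` and `j_distribution` are hypothetical. It estimates the distribution of j by Monte Carlo rather than computing it exactly.

```python
import random
from collections import Counter


def simulate_j(n, m, rng):
    """Draw j once: the number of sampled descendants of the subsample MRCA.

    Assumes Kingman's coalescent: going back in time, a uniformly random
    pair of lineages merges at each step. By exchangeability, leaves
    0..m-1 can stand in for a uniformly random subsample of size m.
    """
    if not 2 <= m <= n:
        raise ValueError("need 2 <= m <= n")
    # Each lineage is (number of sampled leaves, number of subsample leaves).
    lineages = [(1, 1 if i < m else 0) for i in range(n)]
    while True:
        a, b = rng.sample(range(len(lineages)), 2)
        size = lineages[a][0] + lineages[b][0]
        in_subsample = lineages[a][1] + lineages[b][1]
        if in_subsample == m:
            # First lineage to contain the whole subsample: this is its MRCA.
            return size
        # Remove the two merged lineages (higher index first), add their parent.
        for idx in sorted((a, b), reverse=True):
            lineages.pop(idx)
        lineages.append((size, in_subsample))


def j_distribution(n, m, trials=20000, seed=0):
    """Monte Carlo estimate of P(j = k) for k in m..n."""
    rng = random.Random(seed)
    counts = Counter(simulate_j(n, m, rng) for _ in range(trials))
    return {k: counts.get(k, 0) / trials for k in range(m, n + 1)}


if __name__ == "__main__":
    for k, p in j_distribution(n=10, m=3).items():
        print(f"P(j = {k:2d}) ~ {p:.3f}")
```

As a sanity check, the estimate for j = n can be compared with the standard coalescent probability that a subsample and the full sample share their MRCA, (m - 1)(n + 1) / ((m + 1)(n - 1)), which is 11/18 (about 0.611) for n = 10 and m = 3.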

Subjects

Population Genetics , Theoretical Models

18.

NEXT-peak: a normal-exponential two-peak model for peak-calling in ChIP-seq data.

Kim, Nak-Kyeong; Jayatillake, Rasika V; Spouge, John L.

BMC Genomics ; 14: 349, 2013 May 25.

Article in English | MEDLINE | ID: mdl-23706083

ABSTRACT

BACKGROUND: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) can locate transcription factor binding sites on a genomic scale. Although many models and programs are available to call peaks, none has dominated its competition in comparison studies. RESULTS: We propose a rigorous statistical model, the normal-exponential two-peak (NEXT-peak) model, which parallels the physical processes generating the empirical data, and which can naturally incorporate mappability information. The model therefore estimates total strength of binding (even if some binding locations do not map uniquely into a reference genome, effectively censoring them); it also assigns an error to an estimated binding location. The comparison study with existing programs on real ChIP-seq datasets (STAT1, NRSF, and ZNF143) demonstrates that the NEXT-peak model performs well both in calling peaks and locating them. The model also provides a goodness-of-fit test, to screen out spurious peaks and to infer multiple binding events in a region. CONCLUSIONS: The NEXT-peak program calls peaks on any test dataset about as accurately as any other, but provides unusual accuracy in the estimated location of the peaks it calls. NEXT-peak is based on rigorous statistics, so its model also provides a principled foundation for a more elaborate statistical analysis of ChIP-seq data.

Subjects

Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing/methods , Statistical Models , Nucleotide Motifs , Software

19.

Estimating age-stratified transmission and reproduction numbers during the early exponential phase of an epidemic: A case study with COVID-19 data.

Stanke, Zachary; Spouge, John L.

Epidemics ; 44: 100714, 2023 09.

Article in English | MEDLINE | ID: mdl-37595401

ABSTRACT

In a pending pandemic, early knowledge of age-specific disease parameters, e.g., susceptibility, infectivity, and the clinical fraction (the fraction of infections coming to clinical attention), supports targeted public health responses like school closures or sequestration of the elderly. The earlier the knowledge, the more useful it is, so the present article examines an early phase of many epidemics, exponential growth. Using age-stratified COVID-19 case counts collected in Canada, China, Israel, Italy, the Netherlands, and the United Kingdom before April 23, 2020, we present a linear analysis of the exponential phase that attempts to estimate the age-specific disease parameters given above. Some combinations of the parameters can be estimated by requiring that they change smoothly with age. The estimation yielded: (1) the case susceptibility, defined for each age-group as the product of susceptibility to infection and the clinical fraction; (2) the mean number of transmissions of infection per contact within each age-group; and (3) the reproduction number of infection within each age-group, i.e., the diagonal of the age-stratified next-generation matrix. Our restriction to data from the exponential phase indicates the combinations of epidemic parameters that are intrinsically easiest to estimate with early age-stratified case counts. For example, conclusions concerning the age-dependence of case susceptibility appeared more robust than corresponding conclusions about infectivity. Generally, the analysis produced some results consistent with conclusions confirmed much later in the COVID-19 pandemic. Notably, our analysis showed that in some countries, the reproduction number of infection within the half-decade 70-75 was unusually large compared to other half-decades. Our analysis therefore could have anticipated that without countermeasures, COVID-19 would spread rapidly once seeded in homes for the elderly.
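
The exponential phase that this abstract relies on can be illustrated with a log-linear fit. The sketch below is not the authors' age-stratified estimator; it only shows, on synthetic case counts invented for illustration, how a growth rate for each age group can be read off as the least-squares slope of log counts against time. The function names and age-group labels are hypothetical.

```python
import math


def exponential_growth_rate(days, counts):
    """Least-squares slope of log(count) against day (growth rate per day)."""
    if len(days) != len(counts) or len(days) < 2:
        raise ValueError("need at least two (day, count) pairs of equal length")
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive to take logarithms")
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("days must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    return sxy / sxx


def doubling_time(rate):
    """Days needed for counts to double at a positive exponential rate."""
    if rate <= 0:
        raise ValueError("doubling time is defined only for positive growth")
    return math.log(2) / rate


if __name__ == "__main__":
    days = list(range(10))
    # Synthetic counts (an assumption for illustration, not data from the article).
    cases_by_age = {
        "20-24": [5 * math.exp(0.20 * t) for t in days],
        "70-74": [2 * math.exp(0.20 * t) for t in days],
    }
    for group, counts in cases_by_age.items():
        r = exponential_growth_rate(days, counts)
        print(f"{group}: rate = {r:.3f}/day, doubling time = {doubling_time(r):.2f} days")
```

A fit like this recovers only the growth rate; separating susceptibility, infectivity, and the clinical fraction requires the additional structure the abstract describes, such as smoothness across age groups.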

Subjects

COVID-19 , Pandemics , Aged , Humans , COVID-19/epidemiology , Public Health , Canada , Reproduction

20.

Ancestry-attenuated effects of socioeconomic deprivation on type 2 diabetes disparities in the All of Us cohort.

Lam, Vincent; Sharma, Shivam; Gupta, Sonali; Spouge, John L; Jordan, I King; Mariño-Ramírez, Leonardo.

BMC Glob Public Health ; 12023.

Article in English | MEDLINE | ID: mdl-38045036

ABSTRACT

Background: Diabetes is a common disease with a major burden on morbidity, mortality, and productivity. Type 2 diabetes (T2D) accounts for roughly 90% of all diabetes cases in the USA and has a greater observed prevalence among those who identify as Black or Hispanic. Methods: This study aimed to assess T2D racial and ethnic disparities using the All of Us Research Program data and to measure associations between genetic ancestry (GA), socioeconomic deprivation, and T2D. We used the All of Us Researcher Workbench to analyze T2D prevalence and model its associations with GA and with individual-level (iSDI) and zip code-based (zSDI) socioeconomic deprivation indices among participant self-identified race and ethnicity (SIRE) groups. Results: The study cohort comprised 86,488 participants from the four largest SIRE groups in All of Us: Asian (n = 2,311), Black (n = 16,282), Hispanic (n = 16,966), and White (n = 50,292). SIRE groups show characteristic genetic ancestry patterns, consistent with their diverse origins, together with a continuum of ancestry fractions within and between groups. The Black and Hispanic groups show the highest levels of socioeconomic deprivation, followed by the Asian and White groups. Black participants show the highest age- and sex-adjusted T2D prevalence (21.9%), followed by the Hispanic (19.9%), Asian (15.1%), and White (14.8%) groups. Minority SIRE groups and socioeconomic deprivation, both iSDI and zSDI, are positively associated with T2D when the entire cohort is analyzed together. However, SIRE and GA both show negative interaction effects with iSDI and zSDI on T2D. Higher levels of iSDI and zSDI are negatively associated with T2D in the Black and Hispanic groups, and higher levels of iSDI and zSDI are negatively associated with T2D at high levels of African and Native American ancestry. Conclusions: Socioeconomic deprivation is associated with a higher prevalence of T2D in Black and Hispanic minority groups, compared to the majority White group. Nonetheless, socioeconomic deprivation is associated with reduced T2D risk within the Black and Hispanic groups. These results are paradoxical and have not been reported elsewhere, with possible explanations related to the nature of the All of Us data along with SIRE group differences in access to healthcare, diet, and lifestyle.
