Results 1 - 20 of 56
1.
Bioinformatics ; 39(2), 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36702468

ABSTRACT

MOTIVATION: We face an increasing flood of genetic sequence data, from diverse sources, requiring rapid computational analysis. Rapid analysis can be achieved by sampling a subset of positions in each sequence. Previous sequence-sampling methods, such as minimizers, syncmers and minimally overlapping words, were developed by heuristic intuition, and are not optimal. RESULTS: We present a sequence-sampling approach that provably optimizes sensitivity for a whole class of sequence comparison methods, for randomly evolving sequences. It is likely near-optimal for a wide range of alignment-based and alignment-free analyses. For real biological DNA, it increases specificity by avoiding simple repeats. Our approach generalizes universal hitting sets (which guarantee to sample a sequence at least once) and polar sets (which guarantee to sample a sequence at most once). This helps us understand how to do rapid sequence analysis as accurately as possible. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://gitlab.com/mcfrith/noverlap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
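
Minimizers, one of the earlier sampling schemes named above, can be sketched in a few lines. This is the classic lexicographic (w, k)-minimizer scheme, not the optimized sampling method this article proposes; the function name `minimizer_positions` and the plain lexicographic k-mer order are assumptions for illustration.

```python
def minimizer_positions(seq: str, k: int, w: int) -> list[int]:
    """Start positions sampled by the classic (w, k)-minimizer scheme.

    In every window of w consecutive k-mers, the lexicographically
    smallest k-mer is sampled (the leftmost one if several tie).
    """
    n_kmers = len(seq) - k + 1
    if k < 1 or w < 1 or n_kmers < w:
        return []  # no complete window of w k-mers exists
    sampled = set()
    for start in range(n_kmers - w + 1):
        # Compare (k-mer string, position) so ties resolve to the leftmost k-mer
        best = min(range(start, start + w), key=lambda i: (seq[i:i + k], i))
        sampled.add(best)
    return sorted(sampled)


print(minimizer_positions("ACGTACGTAC", k=3, w=3))  # [0, 1, 4, 5]
```

Every window contributes a sampled position, so the scheme guarantees at least one sample per window of w k-mers, the kind of guarantee that universal hitting sets formalize.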


Subject(s)
Algorithms , Software , Sequence Analysis, DNA/methods
2.
Proc Natl Acad Sci U S A ; 115(50): 12805-12810, 2018 Dec 11.
Article in English | MEDLINE | ID: mdl-30455306

ABSTRACT

Noncoding RNAs have substantial effects in host-virus interactions. Circular RNAs (circRNAs) are novel single-stranded noncoding RNAs that can decoy other RNAs or RNA-binding proteins to inhibit their functions. The role of circRNAs is largely unknown in the context of Kaposi's sarcoma herpesvirus (KSHV). We hypothesized that circRNAs influence viral infection by inhibiting host and/or viral factors. Transcriptome analysis of KSHV-infected primary endothelial cells and a B cell line identified human circRNAs that are differentially regulated upon infection. We confirmed the expression changes with divergent PCR primers and RNase R treatment of specific circRNAs. Ectopic expression of hsa_circ_0001400, a circRNA induced by infection, suppressed expression of the key viral latent gene LANA and the lytic gene RTA in KSHV de novo infections. Since human herpesviruses express noncoding RNAs like microRNAs, we searched for viral circRNAs encoded in the KSHV genome. We performed circRNA-Seq analysis with RNase R-treated, circRNA-enriched RNA from KSHV-infected cells. We identified multiple circRNAs encoded by the KSHV genome that are expressed in KSHV-infected endothelial cells and primary effusion lymphoma (PEL) cells. The KSHV circRNAs are located within ORFs of viral lytic genes, are up-regulated upon the induction of the lytic cycle, and alter cell growth. Viral circRNAs were also detected in lymph nodes from patients with KSHV-driven diseases such as PEL, Kaposi's sarcoma, and multicentric Castleman's disease. We revealed new host-virus interactions of circRNAs: human antiviral circRNAs are activated in response to KSHV infection, and viral circRNA expression is induced in the lytic phase of infection.


Subject(s)
Herpesvirus 8, Human/genetics , RNA/genetics , Sarcoma, Kaposi/genetics , Sarcoma, Kaposi/virology , B-Lymphocytes/virology , Castleman Disease/genetics , Castleman Disease/virology , Cell Line , Endothelial Cells/virology , Gene Expression Profiling/methods , Gene Expression Regulation, Viral/genetics , Genes, Viral/genetics , HEK293 Cells , Human Umbilical Vein Endothelial Cells , Humans , Lymphoma, Primary Effusion/genetics , Lymphoma, Primary Effusion/virology , MicroRNAs/genetics , Open Reading Frames/genetics , RNA, Circular , RNA, Viral/genetics
3.
Transfus Med Hemother ; 48(1): 39-47, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33708051

ABSTRACT

BACKGROUND: Red blood cells (RBCs) stored for transfusions can lyse over the course of the storage period. The lysis is traditionally assumed to occur via the formation of spiculated echinocyte forms, so that cells that appear smoother are assumed to have better storage quality. We investigate this hypothesis by comparing the morphological distribution to the hemolysis for samples from different donors. METHODS: Red cell concentrates were obtained from a regional blood bank quality control laboratory. Out of 636 units processed by the laboratory, we obtained 26 high hemolysis units and 24 low hemolysis units for assessment of RBC morphology. The association between the morphology and the hemolysis was tested with the Wilcoxon-Mann-Whitney U test. RESULTS: Samples with high stomatocyte counts (p = 0.0012) were associated with increased hemolysis, implying that cells can lyse via the formation of stomatocytes. CONCLUSION: RBCs can lyse without significant echinocyte formation. Lower degrees of spiculation are not a good indicator of low hemolysis when RBCs from different donors are compared.
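
The Wilcoxon-Mann-Whitney U statistic used here counts, over all cross-group pairs, how often a value from one group exceeds a value from the other. A minimal sketch, assuming ties count one half and using the large-sample normal approximation without tie or continuity correction; the per-unit stomatocyte counts are not given in the abstract, so any inputs are hypothetical. For real analyses an established implementation such as `scipy.stats.mannwhitneyu` is preferable.

```python
import math


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U_x, U_y): U_x counts pairs with x_i > y_j, ties counting 1/2."""
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    # The two statistics always sum to the number of cross-group pairs
    return u_x, len(x) * len(y) - u_x


def normal_approx_p(u: float, n1: int, n2: int) -> float:
    """Two-sided p-value from the normal approximation (no tie/continuity correction)."""
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


print(mann_whitney_u([5, 6, 7], [1, 2, 3]))  # (9.0, 0.0)
```

With the study's group sizes, `normal_approx_p(u, 26, 24)` would give the approximate two-sided p-value for an observed U between high- and low-hemolysis units.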

4.
Proc Natl Acad Sci U S A ; 114(46): E9893-E9902, 2017 Nov 14.
Article in English | MEDLINE | ID: mdl-29087304

ABSTRACT

A complete picture of HIV antigenicity during early replication is needed to elucidate the full range of options for controlling infection. Such information is frequently gained through analyses of isolated viral envelope antigens, host CD4 receptors, and cognate antibodies. However, direct examination of viral particles and virus-cell interactions is now possible via advanced microscopy techniques and reagents. Using such methods, we recently determined that CD4-induced (CD4i) transition state epitopes in the HIV surface antigen, gp120, while not exposed on free particles, rapidly become immunoreactive upon virus-cell binding. Here, we use 3D direct stochastic optical reconstruction microscopy (dSTORM) to show that certain CD4i epitopes specific to transition state structures are exposed across the surface of cell-bound virions, thus explaining their immunoreactivity. Moreover, such structures and their marker epitopes are dispersed to regions of virions distal to CD4 contact. We further show that the appearance and positioning of distal CD4i exposures is partially dependent on Gag maturation and intact matrix-gp41 interactions within the virion. Collectively, these observations provide a unique perspective on HIV during early replication. These features may provide insights for understanding how humoral responses target virions and for developing related antiviral countermeasures.


Subject(s)
Epitopes/immunology , HIV Envelope Protein gp120/immunology , HIV Infections/virology , HIV-1/immunology , Virion/immunology , Virus Attachment , CD4 Antigens/metabolism , CD4 Lymphocyte Count , Cell Line , Epitopes/chemistry , HIV Antibodies/immunology , HIV Antigens/immunology , HIV Envelope Protein gp120/chemistry , HIV Envelope Protein gp41/chemistry , HIV Envelope Protein gp41/immunology , HIV Infections/immunology , HIV-1/chemistry , Humans , Virion/chemistry , Virion/metabolism
5.
BMC Bioinformatics ; 20(1): 77, 2019 Feb 14.
Article in English | MEDLINE | ID: mdl-30764761

ABSTRACT

BACKGROUND: Genetic sequence database retrieval benchmarks play an essential role in evaluating the performance of sequence searching tools. To date, all phylogenetically diverse benchmarks known to the authors include only query sequences with single protein domains. Domains are the primary building blocks of protein structure and function. Independently, each domain can fulfill a single function, but most proteins (>80% in Metazoa) exist as multi-domain proteins. Multiple domain units combine in various arrangements or architectures to create different functions and are often under evolutionary pressures to yield new ones. Thus, it is crucial to create gold standards reflecting the multi-domain complexity of real proteins to more accurately evaluate sequence searching tools. DESCRIPTION: This work introduces MultiDomainBenchmark (MDB), a database suite of 412 curated multi-domain queries and 227,512 target sequences, representing at least 5108 species and 1123 phylogenetically divergent protein families, their relevancy annotation, and domain location. Here, we use the benchmark to evaluate the performance of two commonly used sequence searching tools, BLAST/PSI-BLAST and HMMER. Additionally, we introduce a novel classification technique for multi-domain proteins to evaluate how well an algorithm recovers a domain architecture. CONCLUSION: MDB is publicly available at http://csc.columbusstate.edu/carroll/MDB/ .


Subject(s)
Algorithms , Benchmarking , Databases, Protein , Proteins/chemistry , Amino Acid Sequence , Phylogeny , Protein Structure, Tertiary , Sequence Alignment
6.
Theor Popul Biol ; 127: 7-15, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30876864

ABSTRACT

If viruses or other pathogens infect a single host, the outcome of infection often hinges on the fate of the initial invaders. The initial basic reproduction number R0, the expected number of cells infected by a single infected cell, helps determine whether the initial viruses can establish a successful beachhead. To determine R0, the Kingman coalescent or continuous-time birth-and-death process can be used to infer the rate of exponential growth in an historical population. Given M sequences sampled in the present, the two models can make the inference from the site frequency spectrum (SFS), the count of mutations that appear in exactly k sequences (k=1,2,…,M). In the case of viruses, however, if R0 is large and an infected cell bursts while propagating virus, the two models are suspect, because they are Markovian with only binary branching. Accordingly, this article develops an approximation for the SFS of a discrete-time branching process with synchronous generations (i.e., a Galton-Watson process). When evaluated in simulations with an asynchronous, non-Markovian model (a Bellman-Harris process) with parameters intended to mimic the bursting viral reproduction of HIV, the approximation proved superior to approximations derived from the Kingman coalescent or continuous-time birth-and-death process. This article demonstrates that in analogy to methods in human genetics, the SFS of viral sequences sampled well after latent infection can remain informative about the initial R0. Thus, it suggests the utility of analyzing the SFS of sequences derived from patient and animal trials of viral therapies, because in some cases, the initial R0 may be able to indicate subtle therapeutic progress, even in the absence of statistically significant differences in the infection of treatment and control groups.
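
The site frequency spectrum defined above can be tallied directly from sequence data. A minimal sketch, assuming the sequences are aligned haplotypes already coded as 0 (ancestral) and 1 (derived) at each segregating site; the coding and the function name are illustrative assumptions, not the article's pipeline.

```python
def site_frequency_spectrum(haplotypes: list[str]) -> list[int]:
    """Return sfs where sfs[k - 1] counts sites whose derived allele ("1")
    appears in exactly k of the M sampled sequences (k = 1, ..., M)."""
    m = len(haplotypes)
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("haplotypes must be aligned to equal length")
    sfs = [0] * m
    for site in range(n_sites):
        k = sum(h[site] == "1" for h in haplotypes)
        if k > 0:  # sites carrying only the ancestral allele hold no mutation
            sfs[k - 1] += 1
    return sfs


print(site_frequency_spectrum(["1100", "1010", "1000"]))  # [2, 0, 1]
```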


Subject(s)
Gene Frequency/genetics , Models, Genetic , Mutation , Virulence/genetics , Viruses/genetics , Basic Reproduction Number , Humans
7.
BMC Genomics ; 19(1): 896, 2018 Dec 10.
Article in English | MEDLINE | ID: mdl-30526482

ABSTRACT

BACKGROUND: The application of genomic data and bioinformatics for the identification of restricted or illegally sourced natural products is urgently needed. The taxonomic identity and geographic provenance of raw and processed materials have implications for sustainable-use commercial practices, and relevance to the enforcement of laws that regulate or restrict illegally harvested materials, such as timber. Improvements in genomics make it possible to capture and sequence partial-to-complete genomes from challenging tissues, such as wood and wood products. RESULTS: In this paper, we report the success of an alignment-free genome comparison method, [Formula: see text], that distinguishes geographic sources of white oak (Quercus) species with high accuracy using a very small amount of genomic data. The method is robust to sequencing errors, different sequencing laboratories and sequencing platforms. CONCLUSIONS: This method offers an approach based on genome-scale data, rather than panels of pre-selected markers for specific taxa. The method provides a generalizable platform for the identification and sourcing of materials using a unified next-generation sequencing and analysis framework.
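
The comparison statistic itself is elided in this record ([Formula: see text]), so the sketch below does not reproduce it. As a generic illustration of alignment-free comparison only, it computes the classic D2 statistic, the number of matching k-mer pairs between two sequences, D2 = Σ_w X_w Y_w, where X_w and Y_w count word w in each sequence; the function names and the value of k are assumptions.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """D2 = sum over words w of X_w * Y_w (words absent from seq_b count as zero)."""
    counts_a, counts_b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


print(d2("ACGT", "ACGA", k=2))  # shared words AC and CG -> 2
```

Raw D2 grows with sequence length and is sensitive to base composition, which is why normalized variants of it are generally preferred in practice.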


Subject(s)
DNA, Plant/genetics , Genome, Plant , Geography , Quercus/genetics , Sequence Alignment/methods , Algorithms , High-Throughput Nucleotide Sequencing , Principal Component Analysis
9.
Biometrics ; 74(2): 458-471, 2018 Jun.
Article in English | MEDLINE | ID: mdl-28940296

ABSTRACT

In recent mutation studies, analyses based on protein domain positions are gaining popularity over gene-centric approaches, since the latter are limited in accounting for the functional context that the position of a mutation provides. This presents a large-scale simultaneous inference problem, with hundreds of hypothesis tests to consider at the same time. This article aims to select significant mutation counts while controlling a given level of Type I error via False Discovery Rate (FDR) procedures. One main assumption is that the mutation counts follow a zero-inflated model in order to account for both the true zeros in the count model and the excess zeros. The class of models considered is the Zero-inflated Generalized Poisson (ZIGP) distribution. Furthermore, we assume that there exists a cut-off value such that counts smaller than this value are generated from the null distribution. We present several data-dependent methods to determine the cut-off value. We also consider a two-stage procedure based on a screening process, in which mutation counts exceeding a certain value are considered significant. Simulated and protein domain data sets are used to illustrate this procedure in estimating the empirical null using a mixture of discrete distributions. Overall, while maintaining control of the FDR, the proposed two-stage testing procedure has superior empirical power.
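
The FDR control this article builds on can be illustrated with the standard Benjamini-Hochberg step-up procedure. This is a sketch of that generic procedure only, not the article's zero-inflated, two-stage method with an empirical null; the p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float], q: float = 0.05) -> list[bool]:
    """Return one rejection flag per hypothesis, controlling the FDR at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    # Find the largest rank r (1-based) with p_(r) <= r * q / m
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff = rank
    # Step-up: reject every hypothesis ranked at or below that cutoff
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # [True, False, False, False]
```

Because the procedure steps up from the largest qualifying rank, a hypothesis can be rejected even when its own p-value misses its individual threshold, as with `[0.03, 0.035]`, where both are rejected.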


Subject(s)
Biometry/methods , Data Interpretation, Statistical , Protein Domains , Statistical Distributions , DNA Mutational Analysis , Databases, Protein , Humans , Mutation Rate , Poisson Distribution
10.
Retrovirology ; 14(1): 13, 2017 Feb 24.
Article in English | MEDLINE | ID: mdl-28231858

ABSTRACT

Recently, Oberle et al. published a paper in Retrovirology evaluating the question of whether selection plays a role in HIV transmission. The Oberle study found no obvious genotypic or phenotypic differences between donors and recipients of epidemiologically linked pairs from the Swiss cohort. Thus, Oberle et al. characterized HIV-1 B transmission as largely "stochastic", an imprecise and potentially misleading term. Here, we re-analyzed their data and placed them in the context of transmission data for over 20 other human and animal trials. The present study finds that the transmitted/founder (T/F) viruses from the Swiss cohort show the same non-random genetic signatures conserved in 118 HIV-1, 40 SHIV, and 12 SIV T/F viruses previously published by two independent groups. We provide alternative interpretations of the Swiss cohort data and conclude that the sequences of their donor viruses lacked variability at the specific sites where other studies were able to demonstrate genotypic selection. Oberle et al. observed no phenotypic selection in vitro, so the problem of determining the in vivo phenotypic mechanisms that cause genotypic selection in HIV remains open.


Subject(s)
HIV Infections , HIV-1/genetics , Animals , Genotype , Humans
11.
Bioinformatics ; 32(2): 304-5, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26428291

ABSTRACT

MOTIVATION: Pairwise local alignment is an indispensable tool for molecular biologists. In real time (i.e. in about 1 s), ALP (Ascending Ladder Program) calculates the E-values for protein-protein or DNA-DNA local alignments of random sequences, for an arbitrary substitution score matrix, gap costs and letter abundances; and FALP (Frameshift Ascending Ladder Program) performs a similar task, although more slowly, for frameshifting DNA-protein alignments. AVAILABILITY AND IMPLEMENTATION: To permit other C++ programmers to implement the computational efficiencies in ALP and FALP directly within their own programs, C++ source code is available in the public domain at http://go.usa.gov/3GTSW under 'ALP' and 'FALP', along with the standalone programs ALP and FALP. CONTACT: spouge@nih.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
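
Programs such as ALP obtain E-values by first estimating the Gumbel parameters λ and K for the given scoring system; once those are known, the E-value of a local alignment score S between sequences of lengths m and n follows the Karlin-Altschul formula E = K m n e^(-λS). A minimal sketch, with λ and K as illustrative inputs rather than values produced by ALP, and without any finite-size (edge-effect) correction.

```python
import math


def evalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul E-value E = K * m * n * exp(-lambda * S): the expected
    number of chance local alignments scoring at least S between random
    sequences of lengths m and n (no finite-size correction)."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


# Illustrative parameters only; ALP would estimate lambda and K for the actual scoring system
LAM, K = 0.267, 0.041
print(evalue(60, 300, 10**8, LAM, K), bit_score(60, LAM, K))
```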


Subject(s)
Computational Biology/methods , DNA/chemistry , Proteins/chemistry , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Software , DNA/metabolism , Databases, Factual , Humans , Proteins/metabolism , Sequence Alignment
12.
BMC Bioinformatics ; 17(1): 479, 2016 Nov 21.
Article in English | MEDLINE | ID: mdl-27871221

ABSTRACT

BACKGROUND: Transcription factors (TFs) form complexes that bind regulatory modules (RMs) within DNA, to control specific sets of genes. Some transcription factor binding sites (TFBSs) near the transcription start site (TSS) display tight positional preferences relative to the TSS. Furthermore, near the TSS, RMs can co-localize TFBSs with each other and the TSS. The proportion of TFBS positional preferences due to TFBS co-localization within RMs is unknown, however. ChIP experiments confirm co-localization of some TFBSs genome-wide, including near the TSS, but they typically examine only a few TFs at a time, using non-physiological conditions that can vary from lab to lab. In contrast, sequence analysis can examine many TFs uniformly and methodically, broadly surveying the co-localization of TFBSs with tight positional preferences relative to the TSS. RESULTS: Our statistics found 43 significant sets of human motifs in the JASPAR TF Database with positional preferences relative to the TSS, with 38 preferences tight (±5 bp). Each set of motifs corresponded to a gene group of 135 to 3304 genes, with 42/43 (98%) gene groups independently validated by DAVID, a gene ontology database, with FDR < 0.05. Motifs corresponding to two TFBSs in a RM should co-occur more than by chance alone, enriching the intersection of the gene groups corresponding to the two TFs. Thus, a gene-group intersection systematically enriched beyond chance alone provides evidence that the two TFs participate in an RM. Of the 903 = 43*42/2 intersections of the 43 significant gene groups, we found 768/903 (85%) pairs of gene groups with significantly enriched intersections, with 564/768 (73%) intersections independently validated by DAVID with FDR < 0.05. A user-friendly web site at http://go.usa.gov/3kjsH permits biologists to explore the interaction network of our TFBSs to identify candidate subunit RMs. 
CONCLUSIONS: Gene duplication and convergent evolution within a genome provide obvious biological mechanisms for replicating an RM near the TSS that binds a particular TF subunit. Of all intersections of our 43 significant gene groups, 85% were significantly enriched, with 73% of the significant enrichments independently validated by gene ontology. The co-localization of TFBSs within RMs therefore likely explains much of the tight TFBS positional preferences near the TSS.
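
The pair count and enrichment reasoning in this record can be checked with a short calculation: 43 gene groups give C(43, 2) = 43·42/2 = 903 unordered pairs, and whether two gene groups overlap more than chance can be assessed with a hypergeometric upper tail. The abstract does not state which enrichment test the authors used, so the hypergeometric test, the function name and any gene counts here are illustrative assumptions.

```python
from math import comb


def hypergeom_upper_tail(overlap: int, size_a: int, size_b: int, universe: int) -> float:
    """P(X >= overlap), where X is the overlap between a fixed gene group of
    size_a and a uniformly random gene group of size_b drawn from universe genes."""
    total = comb(universe, size_b)
    hits = sum(
        comb(size_a, x) * comb(universe - size_a, size_b - x)
        for x in range(overlap, min(size_a, size_b) + 1)
    )
    return hits / total


# Checks on the arithmetic reported above
n_groups = 43
print(comb(n_groups, 2))          # 903 unordered pairs of gene groups
print(round(100 * 768 / 903))     # 85 (% of pairs significantly enriched)
print(round(100 * 564 / 768))     # 73 (% of enrichments validated by DAVID)
```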


Subject(s)
DNA/metabolism , Gene Expression Regulation , Regulatory Elements, Transcriptional/genetics , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Transcription Initiation Site , Binding Sites , DNA/chemistry , DNA/genetics , Gene Ontology , Humans , Protein Binding
13.
Mol Biol Evol ; 32(5): 1354-64, 2015 May.
Article in English | MEDLINE | ID: mdl-25589738

ABSTRACT

Species in the genus Plasmodium cause malaria in humans and infect a variety of mammals and other vertebrates. Currently, estimated ages for several mammalian Plasmodium parasites differ by as much as one order of magnitude, an inaccuracy that frustrates reliable estimation of evolutionary rates of disease-related traits. We developed a novel statistical approach to dating the relative age of evolutionary lineages, based on Total Least Squares regression. We validated this lineage dating approach by applying it to the genus Drosophila. Using data from the Drosophila 12 Genomes project, our approach accurately reconstructs the age of well-established Drosophila clades, including the speciation event that led to the subgenera Drosophila and Sophophora, and age of the melanogaster species subgroup. We applied this approach to hundreds of loci from seven mammalian Plasmodium species. We demonstrate the existence of a molecular clock specific to individual Plasmodium proteins, and estimate the relative age of mammalian-infecting Plasmodium. These analyses indicate that: 1) the split between the human parasite Plasmodium vivax and P. knowlesi, from Old World monkeys, occurred 6.1 times earlier than that between P. falciparum and P. reichenowi, parasites of humans and chimpanzees, respectively; and 2) mammalian Plasmodium parasites originated 22 times earlier than the split between P. falciparum and P. reichenowi. Calibrating the absolute divergence times for Plasmodium with eukaryotic substitution rates, we show that the split between P. falciparum and P. reichenowi occurred 3.0-5.5 Ma, and that mammalian Plasmodium parasites originated over 64 Ma. Our results indicate that mammalian-infecting Plasmodium evolved contemporaneously with their hosts, with little evidence for parasite host-switching on an evolutionary scale, and provide a solid timeframe within which to place the evolution of new Plasmodium species.
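
Total Least Squares, the regression underlying the dating approach above, minimizes perpendicular rather than vertical distances to the fitted line, so error in both coordinates is treated symmetrically. A minimal two-variable sketch, assuming equal error variances in x and y (orthogonal regression); it is not the article's full lineage-dating model, and the example points are hypothetical.

```python
import math


def tls_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit y = a + b*x by total (orthogonal) least squares; return (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("TLS slope is undefined when the sample covariance is zero")
    # Slope of the principal axis of the centered point cloud
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - b * mx, b


print(tls_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

Unlike ordinary least squares, swapping the roles of x and y gives the same fitted line, so the two slopes are reciprocals; this symmetry suits comparisons in which neither lineage's divergence estimate is error-free.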


Subject(s)
Evolution, Molecular , Host-Parasite Interactions/genetics , Malaria, Falciparum/genetics , Plasmodium falciparum/genetics , Animals , Humans , Malaria, Falciparum/parasitology , Pan troglodytes/genetics , Phylogeny , Plasmodium falciparum/pathogenicity , Plasmodium vivax/genetics , Plasmodium vivax/pathogenicity , Sequence Alignment
14.
J Virol ; 89(7): 3619-29, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25589663

ABSTRACT

UNLABELLED: Human immunodeficiency virus (HIV) transmission typically results from infection by a single transmitted/founder (T/F) variant. Are T/F variants chosen uniformly at random from the donor pool, or are they selected based on advantageous traits facilitating transmission? Finding evidence for selection during transmission is of particular interest, because it would indicate that phenotypic and/or genetic properties of the viruses might be harnessed as potential vaccine targets or immunotherapies. Here, we systematically evaluated the differences between the Env proteins of simian immunodeficiency virus/simian HIV (SIV/SHIV) stock and T/F variants in search of "signature" sites of transmission. We also surveyed residue preferences in HIV at the SIV/SHIV signature sites. Four sites of gp120 showed significant selection, and an additional two sites showed a similar trend. Therefore, the six sites clearly differentiate T/F viruses from the majority of circulating variants in the stocks. The selection of SIV/SHIV could be inferred reasonably across both vaccinated and unvaccinated subjects, with infections resulting from vaginal, rectal, and intravenous routes of transmission and regardless of viral dosage. The evidence for selection in SIV and SHIV T/F variants is strong and plentiful, and in HIV the evidence is suggestive though commensurate with the availability of suitable data for analysis. Two of the signature residues are completely conserved across the SIV, SHIV, and HIV variants we examined. Five of the signature residues map to the C1 region of gp120 and one to the signal peptide. Our data raise the possibility that C1, while governing the association between gp120 and gp41, modulates transmission efficiency, replicative fitness, and/or host cell tropism at the level of virus-cell attachment and entry. IMPORTANCE: The present study finds significant evidence of selection on gp120 molecules of SIV/SHIV T/F viruses. 
The data provide ancillary evidence suggesting the same sites are under selection in HIV. Our findings suggest that the signature residues are involved in increasing the transmissibility of infecting viruses; therefore, they are potential targets for developing a vaccine or other protective measures. A recent study identified the same T/F signature motif but interpreted it as an effect of neutralization resistance. Here, we show that the T/F motif has broader functional significance beyond neutralization sensitivity, because it is present in nonimmune subjects. Also, a vaccine regimen popular in animal trials might have increased the transmission of variants with otherwise low transmission fitness. Our observations might explain why many animal vaccine trials have not faithfully predicted outcomes in human vaccine trials and suggest that current practices in vaccine design need to be reexamined accordingly.


Subject(s)
Conserved Sequence , HIV Envelope Protein gp120/genetics , HIV Envelope Protein gp120/metabolism , HIV Infections/transmission , Membrane Glycoproteins/genetics , Membrane Glycoproteins/metabolism , Simian Acquired Immunodeficiency Syndrome/transmission , Viral Envelope Proteins/genetics , Viral Envelope Proteins/metabolism , Animals , Female , Genotype , HIV/genetics , HIV/physiology , HIV Infections/virology , Humans , Macaca mulatta , Male , Selection, Genetic , Simian Acquired Immunodeficiency Syndrome/virology , Simian Immunodeficiency Virus/genetics , Simian Immunodeficiency Virus/physiology , Viral Tropism , Virus Attachment , Virus Replication
15.
Bioinformatics ; 30(24): 3575-82, 2014 Dec 15.
Article in English | MEDLINE | ID: mdl-25172925

ABSTRACT

MOTIVATION: The alignment of DNA sequences to proteins, allowing for frameshifts, is a classic method in sequence analysis. It can help identify pseudogenes (which accumulate mutations), analyze raw DNA and RNA sequence data (which may have frameshift sequencing errors), investigate ribosomal frameshifts, etc. Often, however, only ad hoc approximations or simulations are available to provide the statistical significance of a frameshift alignment score. RESULTS: We describe a method to estimate statistical significance of frameshift alignments, similar to classic BLAST statistics. (BLAST presently does not permit its alignments to include frameshifts.) We also illustrate the continuing usefulness of frameshift alignment with two 'post-genomic' applications: (i) when finding pseudogenes within the human genome, frameshift alignments show that most anciently conserved non-coding human elements are recent pseudogenes with conserved ancestral genes; and (ii) when analyzing metagenomic DNA reads from polluted soil, frameshift alignments show that most alignable metagenomic reads contain frameshifts, suggesting that metagenomic analysis needs to use frameshift alignment to derive accurate results.


Subject(s)
Frameshift Mutation , Sequence Alignment/methods , Algorithms , Data Interpretation, Statistical , Genome, Human , Genomics , Humans , Metagenomics , Pseudogenes , Sequence Analysis, DNA , Sequence Analysis, Protein , Sequence Analysis, RNA , Software
16.
Proc Natl Acad Sci U S A ; 109(16): 6241-6, 2012 Apr 17.
Article in English | MEDLINE | ID: mdl-22454494

ABSTRACT

Six DNA regions were evaluated as potential DNA barcodes for Fungi, the second largest kingdom of eukaryotic life, by a multinational, multilaboratory consortium. The region of the mitochondrial cytochrome c oxidase subunit 1 used as the animal barcode was excluded as a potential marker, because it is difficult to amplify in fungi, often includes large introns, and can be insufficiently variable. Three subunits from the nuclear ribosomal RNA cistron were compared together with regions of three representative protein-coding genes (largest subunit of RNA polymerase II, second largest subunit of RNA polymerase II, and minichromosome maintenance protein). Although the protein-coding gene regions often had a higher percent of correct identification compared with ribosomal markers, low PCR amplification and sequencing success eliminated them as candidates for a universal fungal barcode. Among the regions of the ribosomal cistron, the internal transcribed spacer (ITS) region has the highest probability of successful identification for the broadest range of fungi, with the most clearly defined barcode gap between inter- and intraspecific variation. The nuclear ribosomal large subunit, a popular phylogenetic marker in certain groups, had superior species resolution in some taxonomic groups, such as the early diverging lineages and the ascomycete yeasts, but was otherwise slightly inferior to the ITS. The nuclear ribosomal small subunit has poor species-level resolution in fungi. ITS will be formally proposed for adoption as the primary fungal barcode marker to the Consortium for the Barcode of Life, with the possibility that supplementary barcodes may be developed for particular narrowly circumscribed taxonomic groups.
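
The barcode gap mentioned above separates intraspecific from interspecific variation; one simple way to quantify it is the smallest between-species distance minus the largest within-species distance. A minimal sketch, assuming pre-aligned, equal-length sequences, uncorrected p-distance, and hypothetical specimen and species labels; a model-corrected distance could be substituted for p-distance.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(seqs: dict[str, str], species: dict[str, str]) -> float:
    """Smallest interspecific distance minus largest intraspecific distance.

    A positive value means a single distance threshold would separate every
    within-species pair from every between-species pair.
    """
    intra, inter = [], []
    for s1, s2 in combinations(seqs, 2):
        d = p_distance(seqs[s1], seqs[s2])
        (intra if species[s1] == species[s2] else inter).append(d)
    if not inter:
        raise ValueError("need specimens from at least two species")
    return min(inter) - max(intra, default=0.0)


seqs = {"a1": "AAAAAAAAAA", "a2": "AAAAAAAAAT", "b1": "CCCCCAAAAA", "b2": "CCCCCAAAAT"}
species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
print(barcode_gap(seqs, species))  # about 0.4: min inter 0.5, max intra 0.1
```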


Subject(s)
DNA Barcoding, Taxonomic/methods , DNA, Fungal/genetics , DNA, Ribosomal Spacer/genetics , Fungi/genetics , Cell Nucleus/genetics , Fungi/classification , Phylogeny , Polymerase Chain Reaction , Reproducibility of Results , Species Specificity
17.
Theor Popul Biol ; 92: 51-4, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24321308

ABSTRACT

Sample n individuals uniformly at random from a population, and then sample m individuals uniformly at random from the sample. Consider the most recent common ancestor (MRCA) of the subsample of m individuals. Let the subsample MRCA have j descendants in the sample (m ⩽ j ⩽ n). Under a Moran or coalescent model (and therefore under many other models), the probability that j = n is known. In this case, the subsample MRCA is an ancestor of every sampled individual, and the subsample and sample MRCAs are identical. The probability that j = m is also known. In this case, the subsample MRCA is an ancestor of no sampled individual outside the subsample. This article derives the complete distribution of j, enabling inferences from the corresponding p-value. The text presents hypothetical statistical applications pertinent to taxonomy (the gene flow between Neanderthals and anatomically modern humans) and medicine (the association of genetic markers with disease).
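
A small Monte Carlo sketch makes this setup concrete. It assumes the Kingman coalescent, under which the genealogy's topology is built by repeatedly merging a uniformly random pair of lineages; it records j for a subsample of m of the n sampled leaves and compares the simulated P(j = n) with the closed form (n + 1)(m - 1) / [(n - 1)(m + 1)], a classical coalescent result for this probability. It is an illustration under those assumptions, not the article's derivation of the full distribution of j, and the function names are hypothetical.

```python
import random


def simulate_j(n: int, m: int, rng: random.Random) -> int:
    """One Kingman-coalescent topology on n sample leaves; return j, the number
    of sample leaves descended from the MRCA of the subsample {0, ..., m - 1}."""
    if not 2 <= m <= n:
        raise ValueError("need 2 <= m <= n")
    subsample = frozenset(range(m))  # leaves are exchangeable, so any m of them will do
    lineages = [frozenset([leaf]) for leaf in range(n)]
    while True:
        # Each coalescence merges a uniformly random pair of current lineages
        a, b = rng.sample(range(len(lineages)), 2)
        merged = lineages[a] | lineages[b]
        if subsample <= merged:
            return len(merged)  # the first clade containing the subsample is its MRCA
        for idx in sorted((a, b), reverse=True):
            lineages.pop(idx)
        lineages.append(merged)


def p_j_equals_n(n: int, m: int) -> float:
    """Closed form for P(subsample MRCA = sample MRCA) under the coalescent."""
    return (n + 1) * (m - 1) / ((n - 1) * (m + 1))


rng = random.Random(1)
n, m, trials = 10, 3, 20_000
estimate = sum(simulate_j(n, m, rng) == n for _ in range(trials)) / trials
print(estimate, p_j_equals_n(n, m))  # close, up to Monte Carlo error
```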


Subject(s)
Genetics, Population , Models, Theoretical
18.
Database (Oxford) ; 2024, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39213390

ABSTRACT

The All of Us Research Program ("All of Us") is an initiative led by the National Institutes of Health whose goal is to advance research on personalized medicine and health equity through the collection of genetic, environmental, demographic, and health data from volunteer participants who reside in the USA. The program's emphasis on recruiting a diverse participant cohort makes "All of Us" an effective platform for investigating health disparities. In this work, we analyzed participant electronic health record (EHR) data to identify the diseases and disease categories in the "All of Us" cohort for which racial and ethnic prevalence disparities can be observed. In conjunction with these analyses, we developed the US Health Disparities Browser as an interactive web application that enables users to visualize differences in race- and ethnic-group-specific prevalence estimates for 1755 different diseases: https://usdisparities.biosci.gatech.edu/. The web application features a catalog of all diseases represented in the browser, which can be sorted by overall prevalence as well as the variance in prevalence across racial and ethnic groups. The analyses outlined here provide details on the nature and extent of racial and ethnic health disparities in the "All of Us" participant cohort, and the accompanying browser can serve as a resource through which researchers can explore these disparities. Database URL: https://usdisparities.biosci.gatech.edu.


Subject(s)
Ethnicity , Health Status Disparities , Racial Groups , Female , Humans , Male , Electronic Health Records , Ethnicity/genetics , Racial Groups/genetics , United States
19.
BMC Genomics ; 14: 349, 2013 May 25.
Article in English | MEDLINE | ID: mdl-23706083

ABSTRACT

BACKGROUND: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) can locate transcription factor binding sites on a genomic scale. Although many models and programs are available to call peaks, none has clearly dominated its competitors in comparison studies. RESULTS: We propose a rigorous statistical model, the normal-exponential two-peak (NEXT-peak) model, which parallels the physical processes generating the empirical data, and which can naturally incorporate mappability information. The model therefore estimates the total strength of binding (even if some binding locations do not map uniquely into a reference genome, effectively censoring them); it also assigns an error to each estimated binding location. A comparison with existing programs on real ChIP-seq datasets (STAT1, NRSF, and ZNF143) demonstrates that the NEXT-peak model performs well both in calling peaks and in locating them. The model also provides a goodness-of-fit test, to screen out spurious peaks and to infer multiple binding events in a region. CONCLUSIONS: The NEXT-peak program calls peaks on any test dataset about as accurately as any other program, but locates the peaks it calls with unusual accuracy. NEXT-peak is based on rigorous statistics, so its model also provides a principled foundation for more elaborate statistical analyses of ChIP-seq data.
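The two-peak idea can be sketched as follows. This is a simplified illustration, not the NEXT-peak likelihood: it assumes that forward-strand read positions equal the binding site minus Y and reverse-strand positions equal the site plus Y, where Y is a normal variable plus an exponential variable (an exponentially modified Gaussian). The binding location is then estimated by maximizing the log-likelihood over integer positions. Mappability, censoring, and background reads are ignored, and the parameter values (sigma = 20, exponential rate 1/50) and function names are hypothetical.

```python
import math
import random


def emg_logpdf(y: float, sigma: float, lam: float) -> float:
    """Log density of Y = N(0, sigma^2) + Exp(rate=lam), an exponentially modified Gaussian."""
    z = (lam * sigma**2 - y) / (math.sqrt(2.0) * sigma)
    tail = max(math.erfc(z), 1e-300)  # guard against underflow far from the peak
    return math.log(lam / 2.0) + (lam / 2.0) * (lam * sigma**2 - 2.0 * y) + math.log(tail)


def simulate_reads(site, n_reads, sigma, lam, rng):
    """Assumed model: forward reads pile up upstream of the site, reverse reads downstream."""
    forward = [site - (rng.gauss(0.0, sigma) + rng.expovariate(lam)) for _ in range(n_reads)]
    reverse = [site + (rng.gauss(0.0, sigma) + rng.expovariate(lam)) for _ in range(n_reads)]
    return forward, reverse


def log_likelihood(site, forward, reverse, sigma, lam):
    """Mirror-image EMG peaks on the two strands, centred on the candidate site."""
    ll = sum(emg_logpdf(site - x, sigma, lam) for x in forward)
    ll += sum(emg_logpdf(x - site, sigma, lam) for x in reverse)
    return ll


def estimate_site(forward, reverse, sigma, lam, half_width=100):
    """Start at the midpoint of the two strand means, then refine by grid search."""
    mid = round((sum(forward) / len(forward) + sum(reverse) / len(reverse)) / 2)
    candidates = range(mid - half_width, mid + half_width + 1)
    return max(candidates, key=lambda s: log_likelihood(s, forward, reverse, sigma, lam))


SIGMA, RATE, TRUE_SITE = 20.0, 1 / 50, 1000.0
rng = random.Random(7)
forward, reverse = simulate_reads(TRUE_SITE, 500, SIGMA, RATE, rng)
estimate = estimate_site(forward, reverse, SIGMA, RATE)
print("estimated binding site:", estimate)
```

Because the exponentially modified Gaussian is log-concave, the log-likelihood in this toy model is concave in the site position, so a simple grid search around the midpoint is adequate here.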


Subject(s)
Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing/methods , Models, Statistical , Nucleotide Motifs , Software
20.
Epidemics ; 44: 100714, 2023 09.
Article in English | MEDLINE | ID: mdl-37595401

ABSTRACT

In an emerging pandemic, early knowledge of age-specific disease parameters, e.g., susceptibility, infectivity, and the clinical fraction (the fraction of infections coming to clinical attention), supports targeted public health responses such as school closures or sequestration of the elderly. The earlier the knowledge, the more useful it is, so the present article examines exponential growth, an early phase common to many epidemics. Using age-stratified COVID-19 case counts collected in Canada, China, Israel, Italy, the Netherlands, and the United Kingdom before April 23, 2020, we present a linear analysis of the exponential phase that attempts to estimate the age-specific disease parameters given above. Some combinations of the parameters can be estimated by requiring that they change smoothly with age. The estimation yielded: (1) the case susceptibility, defined for each age group as the product of susceptibility to infection and the clinical fraction; (2) the mean number of transmissions of infection per contact within each age group; and (3) the reproduction number of infection within each age group, i.e., the diagonal of the age-stratified next-generation matrix. Our restriction to data from the exponential phase indicates the combinations of epidemic parameters that are intrinsically easiest to estimate from early age-stratified case counts. For example, conclusions concerning the age dependence of case susceptibility appeared more robust than corresponding conclusions about infectivity. Generally, the analysis produced some results consistent with conclusions confirmed much later in the COVID-19 pandemic. Notably, our analysis showed that in some countries, the reproduction number of infection within the half-decade 70-75 was unusually large compared to other half-decades. Our analysis therefore could have anticipated that, without countermeasures, COVID-19 would spread rapidly once seeded in homes for the elderly.
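The next-generation matrix mentioned above can be illustrated with a small sketch. Assuming K[i][j] is the expected number of infections in age group i caused by one infected individual in age group j, the within-group reproduction numbers are the diagonal entries K[i][i], the overall reproduction number is the dominant eigenvalue of K, and in a simple generation-based linear model the age distribution of new infections approaches the corresponding eigenvector during sustained exponential growth. The matrix values and function name are hypothetical, and this is not the article's estimation procedure, which works from case counts.

```python
def dominant_eigen(matrix, iterations=10_000, tol=1e-12):
    """Power iteration for the dominant eigenvalue and eigenvector of a
    non-negative square matrix given as a list of rows."""
    size = len(matrix)
    vec = [1.0 / size] * size  # start from a uniform age profile
    value = 0.0
    for _ in range(iterations):
        # One generation of transmission: new[i] = sum_k K[i][k] * vec[k].
        nxt = [sum(matrix[i][k] * vec[k] for k in range(size)) for i in range(size)]
        value = sum(nxt)  # vec sums to 1, so this is the growth factor per generation
        nxt = [x / value for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, vec)) < tol:
            return value, nxt
        vec = nxt
    return value, vec


# Hypothetical 3-age-group next-generation matrix (illustrative values only).
K = [
    [1.2, 0.3, 0.1],
    [0.4, 0.9, 0.2],
    [0.1, 0.3, 1.5],
]
within_group = [K[i][i] for i in range(len(K))]  # diagonal: within-group reproduction numbers
r0, age_profile = dominant_eigen(K)
print("within-group R:", within_group)
print(f"overall R0 = {r0:.3f}; age profile of new infections = {[round(x, 3) for x in age_profile]}")
```

The diagonal alone does not determine R0: off-diagonal (between-group) transmission raises the dominant eigenvalue above the largest within-group value.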


Subject(s)
COVID-19 , Pandemics , Aged , Humans , COVID-19/epidemiology , Public Health , Canada , Reproduction