Results 1 - 20 of 55
1.
Article in English | MEDLINE | ID: mdl-38045036

ABSTRACT

Background: Diabetes is a common disease with a major burden on morbidity, mortality, and productivity. Type 2 diabetes (T2D) accounts for roughly 90% of all diabetes cases in the USA and has a greater observed prevalence among those who identify as Black or Hispanic. Methods: This study aimed to assess T2D racial and ethnic disparities using the All of Us Research Program data and to measure associations between genetic ancestry (GA), socioeconomic deprivation, and T2D. We used the All of Us Researcher Workbench to analyze T2D prevalence and model its associations with GA and with individual-level (iSDI) and zip code-based (zSDI) socioeconomic deprivation indices among participant self-identified race and ethnicity (SIRE) groups. Results: The study cohort comprised 86,488 participants from the four largest SIRE groups in All of Us: Asian (n = 2311), Black (n = 16,282), Hispanic (n = 16,966), and White (n = 50,292). SIRE groups show characteristic genetic ancestry patterns, consistent with their diverse origins, together with a continuum of ancestry fractions within and between groups. The Black and Hispanic groups show the highest levels of socioeconomic deprivation, followed by the Asian and White groups. Black participants show the highest age- and sex-adjusted T2D prevalence (21.9%), followed by the Hispanic (19.9%), Asian (15.1%), and White (14.8%) groups. Minority SIRE groups and socioeconomic deprivation, both iSDI and zSDI, are positively associated with T2D when the entire cohort is analyzed together. However, SIRE and GA both show negative interaction effects with iSDI and zSDI on T2D. Higher levels of iSDI and zSDI are negatively associated with T2D in the Black and Hispanic groups, and higher levels of iSDI and zSDI are negatively associated with T2D at high levels of African and Native American ancestry. Conclusions: Socioeconomic deprivation is associated with a higher prevalence of T2D in Black and Hispanic minority groups, compared to the majority White group. Nonetheless, socioeconomic deprivation is associated with reduced T2D risk within the Black and Hispanic groups. These results are paradoxical and have not been reported elsewhere, with possible explanations related to the nature of the All of Us data along with SIRE group differences in access to healthcare, diet, and lifestyle.

2.
Res Sq ; 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37790565

ABSTRACT

Background: Diabetes is a common disease with a major burden on morbidity, mortality, and productivity. Type 2 diabetes (T2D) accounts for roughly 90% of all diabetes cases in the United States and has greater observed prevalence among those who identify as Black or Hispanic. Methods: The aims of this study were to determine whether T2D racial and ethnic disparities can be observed in data from the All of Us Research Program and to measure associations of genetic ancestry (GA) and socioeconomic deprivation with T2D. The All of Us Researcher Workbench was used to calculate T2D prevalence and to model T2D associations with GA and with individual-level (iSDI) and zip code-based (zSDI) socioeconomic deprivation indices within and between participant self-identified race and ethnicity (SIRE) groups. Results: The study cohort comprised 86,488 participants from the four largest SIRE groups in All of Us: Asian (n=2,311), Black (n=16,282), Hispanic (n=16,966), and White (n=50,292). SIRE groups show characteristic genetic ancestry patterns, consistent with their diverse origins, together with a continuum of ancestry fractions within and between groups. The Black and Hispanic groups show the highest median SDI values, followed by the Asian and White groups. Black participants show the highest age- and sex-adjusted T2D prevalence (21.9%), followed by the Hispanic (19.9%), Asian (15.1%), and White (14.8%) groups. Minority SIRE groups and socioeconomic deprivation are positively associated with T2D when the entire cohort is analyzed together. However, SIRE and GA both show negative interaction effects with SDI on T2D. Higher levels of SDI are negatively associated with T2D in the Black and Hispanic groups, and higher levels of SDI are negatively associated with T2D at high levels of African and Native American ancestry. Conclusion: Socioeconomic deprivation is positively associated with the SIRE group T2D disparities observed here but negatively associated with T2D within the Black and Hispanic groups that show the highest T2D prevalence. These results are paradoxical and have not been reported elsewhere. We discuss possible explanations for this paradox related to the nature of the All of Us data along with SIRE group differences in access to healthcare, diet, and lifestyle.

3.
Epidemics ; 44: 100714, 2023 09.
Article in English | MEDLINE | ID: mdl-37595401

ABSTRACT

In a pending pandemic, early knowledge of age-specific disease parameters, e.g., susceptibility, infectivity, and the clinical fraction (the fraction of infections coming to clinical attention), supports targeted public health responses like school closures or sequestration of the elderly. The earlier the knowledge, the more useful it is, so the present article examines an early phase of many epidemics, exponential growth. Using age-stratified COVID-19 case counts collected in Canada, China, Israel, Italy, the Netherlands, and the United Kingdom before April 23, 2020, we present a linear analysis of the exponential phase that attempts to estimate the age-specific disease parameters given above. Some combinations of the parameters can be estimated by requiring that they change smoothly with age. The estimation yielded: (1) the case susceptibility, defined for each age-group as the product of susceptibility to infection and the clinical fraction; (2) the mean number of transmissions of infection per contact within each age-group; and (3) the reproduction number of infection within each age-group, i.e., the diagonal of the age-stratified next-generation matrix. Our restriction to data from the exponential phase indicates the combinations of epidemic parameters that are intrinsically easiest to estimate with early age-stratified case counts. For example, conclusions concerning the age-dependence of case susceptibility appeared more robust than corresponding conclusions about infectivity. Generally, the analysis produced some results consistent with conclusions confirmed much later in the COVID-19 pandemic. Notably, our analysis showed that in some countries, the reproduction number of infection within the half-decade 70-75 was unusually large compared to other half-decades. Our analysis therefore could have anticipated that without countermeasures, COVID-19 would spread rapidly once seeded in homes for the elderly.
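As a rough illustration of what "linear analysis of the exponential phase" can mean in its simplest form, the sketch below fits a straight line to log case counts to estimate a single growth rate. It is not the article's age-stratified method; the function name `exponential_growth_rate` and the synthetic data are assumptions made for illustration only.

```python
import math

def exponential_growth_rate(times, counts):
    """Least-squares slope of log(count) against time.

    During exponential growth, count(t) ~ c * exp(r * t), so the slope
    of the fitted line estimates the growth rate r.
    All counts must be positive.
    """
    logs = [math.log(c) for c in counts]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx

# Synthetic (not real) daily counts growing at r = 0.2 per day.
days = list(range(10))
cases = [10.0 * math.exp(0.2 * d) for d in days]
r_hat = exponential_growth_rate(days, cases)
```

With noise-free synthetic data the fit recovers the growth rate used to generate it; real case counts would also need a decision about where the exponential phase ends.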


Subjects
COVID-19, Pandemics, Aged, Humans, COVID-19/epidemiology, Public Health, Canada, Reproduction
4.
Bioinformatics ; 39(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36702468

ABSTRACT

MOTIVATION: We face an increasing flood of genetic sequence data, from diverse sources, requiring rapid computational analysis. Rapid analysis can be achieved by sampling a subset of positions in each sequence. Previous sequence-sampling methods, such as minimizers, syncmers and minimally overlapping words, were developed by heuristic intuition, and are not optimal. RESULTS: We present a sequence-sampling approach that provably optimizes sensitivity for a whole class of sequence comparison methods, for randomly evolving sequences. It is likely near-optimal for a wide range of alignment-based and alignment-free analyses. For real biological DNA, it increases specificity by avoiding simple repeats. Our approach generalizes universal hitting sets (which guarantee to sample a sequence at least once) and polar sets (which guarantee to sample a sequence at most once). This helps us understand how to do rapid sequence analysis as accurately as possible. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://gitlab.com/mcfrith/noverlap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
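For orientation, here is a minimal sketch of minimizer sampling, one of the earlier heuristic methods the abstract mentions. It is not the approach the article proposes. The function name `minimizer_positions`, the lexicographic k-mer ordering, and the leftmost tie-break are assumptions chosen for simplicity.

```python
def minimizer_positions(seq: str, k: int, w: int) -> list[int]:
    """Return sorted start positions of k-mers selected as minimizers.

    Every window of w consecutive k-mers contributes the position of its
    smallest k-mer (lexicographic order, leftmost on ties), so each
    window is sampled at least once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    chosen = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        chosen.add(best)
    return sorted(chosen)

sampled = minimizer_positions("ACGTACGT", k=3, w=3)
```

Adjacent windows often share their minimizer, which is why the sampled set is much smaller than the set of all k-mer positions.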


Subjects
Algorithms, Software, DNA Sequence Analysis/methods
6.
PLoS One ; 16(7): e0254145, 2021.
Article in English | MEDLINE | ID: mdl-34255772

ABSTRACT

In a compartmental epidemic model, the initial exponential phase reflects a fixed interaction between an infectious agent and a susceptible population in steady state, so it determines the basic reproduction number R0 on its own. After the exponential phase, dynamic complexities like societal responses muddy the practical interpretation of many estimated parameters. The computer program ARRP, already available from sequence alignment applications, automatically estimated the end of the exponential phase in COVID-19 and extracted the exponential growth rate r for 160 countries. By positing a gamma-distributed generation time, the exponential growth method then yielded R0 estimates for COVID-19 in 160 countries. The use of ARRP ensured that the R0 estimates were largely freed from any dependency outside the exponential phase. The Prem matrices quantify rates of effective contact for infectious disease. Without using any age-stratified COVID-19 data, but under strong assumptions about the homogeneity of susceptibility, infectiousness, etc., across different age-groups, the Prem contact matrices also yielded theoretical R0 estimates for COVID-19 in 152 countries, generally in quantitative conflict with the R0 estimates derived from the exponential growth method. An exploratory analysis manipulating only the Prem contact matrices reduced the conflict, suggesting that age-groups under 20 years did not promote the initial exponential growth of COVID-19 as much as other age-groups. The analysis therefore supports tentatively and tardily, but independently of age-stratified COVID-19 data, the low priority given to vaccinating younger age groups. It also supports the judicious reopening of schools. The exploratory analysis also supports the possibility of suspecting differences in epidemic spread among different age-groups, even before substantial amounts of age-stratified data become available.
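The step from a growth rate r to R0 under a gamma-distributed generation time can be sketched with the standard moment-generating-function relation R0 = 1/M(-r), which for a gamma distribution with shape a and rate b gives R0 = (1 + r/b)^a. The function name and the numerical inputs below are illustrative assumptions, not values from the article.

```python
def r0_from_growth_rate(r: float, mean_gt: float, sd_gt: float) -> float:
    """R0 from exponential growth rate r, assuming a gamma generation time.

    The gamma distribution is parameterized by its mean and standard
    deviation: shape = (mean/sd)^2, rate = mean/sd^2.
    Requires r > -rate for the formula to be defined.
    """
    shape = (mean_gt / sd_gt) ** 2
    rate = mean_gt / sd_gt ** 2
    return (1.0 + r / rate) ** shape

# Hypothetical inputs: r = 0.2 per day, generation time mean 5 d, sd 5 d.
# With sd equal to the mean the gamma is exponential and R0 = 1 + r * mean.
example_r0 = r0_from_growth_rate(0.2, 5.0, 5.0)
```

A growth rate of zero returns R0 = 1 regardless of the generation-time parameters, which is a quick sanity check on any implementation.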


Subjects
Basic Reproduction Number/statistics & numerical data, COVID-19/epidemiology, Age Factors, COVID-19/prevention & control, COVID-19/transmission, Humans, Statistical Models, Quarantine/statistics & numerical data, Vaccination/statistics & numerical data
8.
Transfus Med Hemother ; 48(1): 39-47, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33708051

ABSTRACT

BACKGROUND: Red blood cells (RBCs) stored for transfusions can lyse over the course of the storage period. The lysis is traditionally assumed to occur via the formation of spiculated echinocyte forms, so that cells that appear smoother are assumed to have better storage quality. We investigate this hypothesis by comparing the morphological distribution to the hemolysis for samples from different donors. METHODS: Red cell concentrates were obtained from a regional blood bank quality control laboratory. Out of 636 units processed by the laboratory, we obtained 26 high hemolysis units and 24 low hemolysis units for assessment of RBC morphology. The association between the morphology and the hemolysis was tested with the Wilcoxon-Mann-Whitney U test. RESULTS: Samples with high stomatocyte counts (p = 0.0012) were associated with increased hemolysis, implying that cells can lyse via the formation of stomatocytes. CONCLUSION: RBCs can lyse without significant echinocyte formation. Lower degrees of spiculation are not a good indicator of low hemolysis when RBCs from different donors are compared.
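The statistic behind the Wilcoxon-Mann-Whitney U test can be sketched by direct pair counting. The function name and the toy numbers are assumptions for illustration; the article's data are not reproduced here, and a p-value would additionally require the null distribution of U (exact or normal approximation).

```python
def mann_whitney_u(sample_x, sample_y):
    """U statistic for sample_x: the number of (x, y) pairs with x > y,
    counting ties as one half.

    U ranges from 0 to len(sample_x) * len(sample_y); values near either
    extreme suggest the two samples differ in location.
    """
    u = 0.0
    for x in sample_x:
        for y in sample_y:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Toy stomatocyte counts (hypothetical) for low- and high-hemolysis units.
low_units = [1, 2, 3]
high_units = [4, 5, 6]
u_low = mann_whitney_u(low_units, high_units)
u_high = mann_whitney_u(high_units, low_units)
```

The two U values always sum to the number of pairs, so either one carries the full information.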

9.
Algorithms Mol Biol ; 15: 17, 2020.
Article in English | MEDLINE | ID: mdl-32968428

ABSTRACT

BACKGROUND: Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, decide whether one of the counts is unusually large. Exact computation of the p-value leads to a specific algorithmic problem. Given $n$ elements $g_0, g_1, \ldots, g_{n-1}$ in a set $G$ with the closure and associative properties and a commutative product without inverses, compute the jackknife (leave-one-out) products $\bar{g}_j = g_0 g_1 \cdots g_{j-1} g_{j+1} \cdots g_{n-1}$ ($0 \le j < n$). RESULTS: This article gives a linear-time Jackknife Product algorithm. Its upward phase constructs a standard segment tree for computing segment products like $g_{i,j} = g_i g_{i+1} \cdots g_{j-1}$; its novel downward phase mirrors the upward phase while exploiting the symmetry of $g_j$ and its complement $\bar{g}_j$. The algorithm requires storage for $2n$ elements of $G$ and only about $3n$ products. In contrast, the standard segment tree algorithms require about $n$ products for construction and $\log_2 n$ products for calculating each $\bar{g}_j$, i.e., about $n \log_2 n$ products in total; and a naïve quadratic algorithm using $n-2$ element-by-element products to compute each $\bar{g}_j$ requires $n(n-2)$ products. CONCLUSIONS: In the herpesvirus application, the Jackknife Product algorithm required 15 min; standard segment tree algorithms would have taken an estimated 3 h; and the quadratic algorithm, an estimated 1 month. The Jackknife Product algorithm has many possible uses in bioinformatics and statistics.
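One way to read the upward and downward phases is sketched below: build a segment tree bottom-up, then push "complement" products top-down, where a node's complement is its parent's complement times its sibling's product. This is an interpretation of the abstract, not the authors' code. The names `jackknife_products`, `tree`, and `comp` are assumptions, and the sketch keeps two arrays for clarity rather than matching the stated storage bound.

```python
from operator import mul

def jackknife_products(elements, op):
    """Leave-one-out products under a commutative, associative `op`,
    using about 3n applications of `op` and no inverses.

    Returns a list whose j-th entry is the product of all elements
    except elements[j]. For a single element the leave-one-out product
    is empty and is reported as None.
    """
    n = len(elements)
    if n == 0:
        return []
    # Upward phase: tree[i] is the product of the leaves below node i.
    tree = [None] * n + list(elements)
    for i in range(n - 1, 0, -1):
        tree[i] = op(tree[2 * i], tree[2 * i + 1])
    # Downward phase: comp[i] is the product of all leaves NOT below node i.
    comp = [None] * (2 * n)  # comp[1] (root) stays None: empty product
    for i in range(2, 2 * n):
        sibling = tree[i ^ 1]
        parent = comp[i // 2]
        comp[i] = sibling if parent is None else op(parent, sibling)
    return comp[n:]

loo_int = jackknife_products([2, 3, 5, 7], mul)
loo_max = jackknife_products([1, 5, 3], max)
```

The `max` example shows why avoiding inverses matters: there is no way to "divide out" one element from an overall maximum, yet the tree-based scheme still works.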

10.
PLoS One ; 15(8): e0237507, 2020.
Article in English | MEDLINE | ID: mdl-32813726

ABSTRACT

DNA barcoding can identify biological species and provides an important tool in diverse applications, such as conserving species and identifying pathogens, among many others. If combined with statistical tests, DNA barcoding can focus taxonomic scrutiny onto anomalous species identifications based on morphological features. Accordingly, we put nonparametric tests into a taxonomic context to answer questions about our sequence dataset of the formal fungal barcode, the nuclear ribosomal internal transcribed spacer (ITS). For example, does DNA barcoding concur with annotated species identifications significantly better if expert taxonomists produced the annotations? Does species assignment improve significantly if sequences are restricted to lengths greater than 500 bp? Both questions require a figure of merit to measure the accuracy of species identification, typically provided by the probability of correct identification (PCI). Many articles on DNA barcoding use variants of PCI to measure the accuracy of species identification, but do not provide the variants with names, and the absence of explicit names hinders the recognition that the different variants are not comparable from study to study. We give each of four PCI variants a name and show that for fixed data they follow systematic inequalities. Despite custom, therefore, their comparison is at a minimum problematic. Some popular PCI variants are particularly vulnerable to errors in species annotation, insensitive to improvements in a barcoding pipeline, and unable to predict identification accuracy as a database grows, making them unsuitable for many purposes. Generally, the Fractional PCI has the best properties as a figure of merit for species identification. The fungal genus Ramaria provides unusual taxonomic difficulties. As a case study, it shows that a good taxonomic background can be combined with the pertinent summary statistics of molecular results to improve the identification of doubtful samples, linking both disciplines synergistically.


Subjects
Taxonomic DNA Barcoding/methods, Fungal DNA/analysis, Ribosomal Spacer DNA/analysis, Fungi/classification, Fungi/genetics, DNA Sequence Analysis/methods, Bayes Theorem, Statistical Models, Phylogeny, Species Specificity
11.
PLoS One ; 15(1): e0227127, 2020.
Article in English | MEDLINE | ID: mdl-31923263

ABSTRACT

If viruses or other pathogens infect a single host, the outcome of infection may depend on the initial basic reproduction number R0, the expected number of host cells infected by a single infected cell. This article shows that sometimes, phylogenetic models can estimate the initial R0, using only sequences sampled from the pathogenic population during its exponential growth or shortly thereafter. When evaluated by simulations mimicking the bursting viral reproduction of HIV and simultaneous sampling of HIV gp120 sequences during early viremia, the estimated R0 displayed useful accuracies in achievable experimental designs. Estimates of R0 have several potential applications to investigators interested in the progress of infection in single hosts, including: (1) timing a pathogen's movement through different microenvironments; (2) timing the change points in a pathogen's mode of spread (e.g., timing the change from cell-free spread to cell-to-cell spread, or vice versa, in an HIV infection); (3) quantifying the impact different initial microenvironments have on pathogens (e.g., in mucosal challenge with HIV, quantifying the impact that the presence or absence of mucosal infection has on R0); (4) quantifying subtle changes in infectability in therapeutic trials (either human or animal), even when therapies do not produce total sterilizing immunity; and (5) providing a variable predictive of the clinical efficacy of prophylactic therapies.


Subjects
Basic Reproduction Number, Founder Effect, Statistics as Topic/methods, Virus Diseases/epidemiology, Viruses/pathogenicity, Amino Acid Sequence, Animals, HIV Envelope Protein gp120/genetics, HIV Infections/epidemiology, Humans, Genetic Models, Phylogeny
12.
PLoS One ; 14(6): e0217625, 2019.
Article in English | MEDLINE | ID: mdl-31188853

ABSTRACT

An RNA switch triggers biological functions by toggling between two conformations. RNA switches include bacterial riboswitches, where ligand binding can stabilize a bound structure. For RNAs with only one stable structure, structural prediction usually just requires a straightforward free energy minimization, but for an RNA switch, the prediction of a less stable alternative structure is often computationally costly and even problematic. The current sampling-clustering method predicts stable and alternative structures by partitioning structures sampled from the energy landscape into two clusters, but it is very time-consuming. Instead, we predict the alternative structure of an RNA switch from conditional probability calculations within the energy landscape. First, our method excludes base pairs related to the most stable structure in the energy landscape. Then, it detects stable stems ("seeds") in the remaining landscape. Finally, it folds an alternative structure prediction around a seed. While having comparable riboswitch classification performance, the conditional-probability computations had fewer adjustable parameters, offered greater predictive flexibility, and were more than one thousand times faster than the sampling step alone in sampling-clustering predictions, the competing standard. Overall, the described approach helps traverse thermodynamically improbable energy landscapes to find biologically significant substructures and structures rapidly and effectively.


Subjects
Algorithms, Statistical Models, Nucleic Acid Conformation, Riboswitch, Base Pairing, Base Sequence, Computer Simulation, RNA Stability, Thermodynamics
13.
Theor Popul Biol ; 127: 7-15, 2019 06.
Article in English | MEDLINE | ID: mdl-30876864

ABSTRACT

If viruses or other pathogens infect a single host, the outcome of infection often hinges on the fate of the initial invaders. The initial basic reproduction number R0, the expected number of cells infected by a single infected cell, helps determine whether the initial viruses can establish a successful beachhead. To determine R0, the Kingman coalescent or continuous-time birth-and-death process can be used to infer the rate of exponential growth in an historical population. Given M sequences sampled in the present, the two models can make the inference from the site frequency spectrum (SFS), the count of mutations that appear in exactly k sequences (k=1,2,…,M). In the case of viruses, however, if R0 is large and an infected cell bursts while propagating virus, the two models are suspect, because they are Markovian with only binary branching. Accordingly, this article develops an approximation for the SFS of a discrete-time branching process with synchronous generations (i.e., a Galton-Watson process). When evaluated in simulations with an asynchronous, non-Markovian model (a Bellman-Harris process) with parameters intended to mimic the bursting viral reproduction of HIV, the approximation proved superior to approximations derived from the Kingman coalescent or continuous-time birth-and-death process. This article demonstrates that in analogy to methods in human genetics, the SFS of viral sequences sampled well after latent infection can remain informative about the initial R0. Thus, it suggests the utility of analyzing the SFS of sequences derived from patient and animal trials of viral therapies, because in some cases, the initial R0 may be able to indicate subtle therapeutic progress, even in the absence of statistically significant differences in the infection of treatment and control groups.
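The site frequency spectrum defined above can be computed directly from a presence/absence matrix of mutations. The sketch below assumes a hypothetical encoding in which each row is one sampled sequence and each column is one mutation (1 = carries the mutation, 0 = does not); the function name and the toy matrix are illustrative only.

```python
from collections import Counter

def site_frequency_spectrum(matrix):
    """Return [s_1, ..., s_M], where s_k is the number of mutations
    (columns) carried by exactly k of the M sequences (rows).

    Columns carried by no sequence are ignored.
    """
    m = len(matrix)
    column_sums = [sum(column) for column in zip(*matrix)]
    counts = Counter(column_sums)
    return [counts.get(k, 0) for k in range(1, m + 1)]

# Toy data: 3 sequences, 3 candidate mutations.
toy = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
]
sfs = site_frequency_spectrum(toy)
```

Here one mutation is a singleton, one is shared by two sequences, and none is carried by all three.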


Subjects
Gene Frequency/genetics, Genetic Models, Mutation, Virulence/genetics, Viruses/genetics, Basic Reproduction Number, Humans
14.
BMC Bioinformatics ; 20(1): 77, 2019 Feb 14.
Article in English | MEDLINE | ID: mdl-30764761

ABSTRACT

BACKGROUND: Genetic sequence database retrieval benchmarks play an essential role in evaluating the performance of sequence searching tools. To date, all phylogenetically diverse benchmarks known to the authors include only query sequences with single protein domains. Domains are the primary building blocks of protein structure and function. Independently, each domain can fulfill a single function, but most proteins (>80% in Metazoa) exist as multi-domain proteins. Multiple domain units combine in various arrangements or architectures to create different functions and are often under evolutionary pressures to yield new ones. Thus, it is crucial to create gold standards reflecting the multi-domain complexity of real proteins to more accurately evaluate sequence searching tools. DESCRIPTION: This work introduces MultiDomainBenchmark (MDB), a database suite of 412 curated multi-domain queries and 227,512 target sequences, representing at least 5108 species and 1123 phylogenetically divergent protein families, their relevancy annotation, and domain location. Here, we use the benchmark to evaluate the performance of two commonly used sequence searching tools, BLAST/PSI-BLAST and HMMER. Additionally, we introduce a novel classification technique for multi-domain proteins to evaluate how well an algorithm recovers a domain architecture. CONCLUSION: MDB is publicly available at http://csc.columbusstate.edu/carroll/MDB/ .


Subjects
Algorithms, Benchmarking, Protein Databases, Proteins/chemistry, Amino Acid Sequence, Phylogeny, Tertiary Protein Structure, Sequence Alignment
15.
BMC Genomics ; 19(1): 896, 2018 Dec 10.
Article in English | MEDLINE | ID: mdl-30526482

ABSTRACT

BACKGROUND: The application of genomic data and bioinformatics for the identification of restricted or illegally sourced natural products is urgently needed. The taxonomic identity and geographic provenance of raw and processed materials have implications in sustainable-use commercial practices, and relevance to the enforcement of laws that regulate or restrict illegally harvested materials, such as timber. Improvements in genomics make it possible to capture and sequence partial-to-complete genomes from challenging tissues, such as wood and wood products. RESULTS: In this paper, we report the success of an alignment-free genome comparison method, [Formula: see text], that differentiates geographic sources of white oak (Quercus) species with a high level of accuracy using a very small amount of genomic data. The method is robust to sequencing errors, different sequencing laboratories, and different sequencing platforms. CONCLUSIONS: This method offers an approach based on genome-scale data, rather than panels of pre-selected markers for specific taxa. The method provides a generalizable platform for the identification and sourcing of materials using a unified next generation sequencing and analysis framework.


Subjects
Plant DNA/genetics, Plant Genome, Geography, Quercus/genetics, Sequence Alignment/methods, Algorithms, High-Throughput Nucleotide Sequencing, Principal Component Analysis
16.
Proc Natl Acad Sci U S A ; 115(50): 12805-12810, 2018 12 11.
Article in English | MEDLINE | ID: mdl-30455306

ABSTRACT

Noncoding RNAs have substantial effects in host-virus interactions. Circular RNAs (circRNAs) are novel single-stranded noncoding RNAs which can decoy other RNAs or RNA-binding proteins to inhibit their functions. The role of circRNAs is largely unknown in the context of Kaposi's sarcoma herpesvirus (KSHV). We hypothesized that circRNAs influence viral infection by inhibiting host and/or viral factors. Transcriptome analysis of KSHV-infected primary endothelial cells and a B cell line identified human circRNAs that are differentially regulated upon infection. We confirmed the expression changes with divergent PCR primers and RNase R treatment of specific circRNAs. Ectopic expression of hsa_circ_0001400, a circRNA induced by infection, suppressed expression of key viral latent gene LANA and lytic gene RTA in KSHV de novo infections. Since human herpesviruses express noncoding RNAs like microRNAs, we searched for viral circRNAs encoded in the KSHV genome. We performed circRNA-Seq analysis with RNase R-treated, circRNA-enriched RNA from KSHV-infected cells. We identified multiple circRNAs encoded by the KSHV genome that are expressed in KSHV-infected endothelial cells and primary effusion lymphoma (PEL) cells. The KSHV circRNAs are located within ORFs of viral lytic genes, are up-regulated upon the induction of the lytic cycle, and alter cell growth. Viral circRNAs were also detected in lymph nodes from patients of KSHV-driven diseases such as PEL, Kaposi's sarcoma, and multicentric Castleman's disease. We revealed new host-virus interactions of circRNAs: human antiviral circRNAs are activated in response to KSHV infection, and viral circRNA expression is induced in the lytic phase of infection.


Subjects
Human Herpesvirus 8/genetics, RNA/genetics, Kaposi Sarcoma/genetics, Kaposi Sarcoma/virology, B-Lymphocytes/virology, Castleman Disease/genetics, Castleman Disease/virology, Cell Line, Endothelial Cells/virology, Gene Expression Profiling/methods, Viral Gene Expression Regulation/genetics, Viral Genes/genetics, HEK293 Cells, Human Umbilical Vein Endothelial Cells, Humans, Primary Effusion Lymphoma/genetics, Primary Effusion Lymphoma/virology, MicroRNAs/genetics, Open Reading Frames/genetics, Circular RNA, Viral RNA/genetics
17.
Biometrics ; 74(2): 458-471, 2018 06.
Article in English | MEDLINE | ID: mdl-28940296

ABSTRACT

In recent mutation studies, analyses based on protein domain positions are gaining popularity over gene-centric approaches since the latter have limitations in considering the functional context that the position of the mutation provides. This presents a large-scale simultaneous inference problem, with hundreds of hypothesis tests to consider at the same time. This article aims to select significant mutation counts while controlling a given level of Type I error via False Discovery Rate (FDR) procedures. One main assumption is that the mutation counts follow a zero-inflated model in order to account for the true zeros in the count model and the excess zeros. The class of models considered is the Zero-inflated Generalized Poisson (ZIGP) distribution. Furthermore, we assumed that there exists a cut-off value such that smaller counts than this value are generated from the null distribution. We present several data-dependent methods to determine the cut-off value. We also consider a two-stage procedure based on screening process so that the number of mutations exceeding a certain value should be considered as significant mutations. Simulated and protein domain data sets are used to illustrate this procedure in estimation of the empirical null using a mixture of discrete distributions. Overall, while maintaining control of the FDR, the proposed two-stage testing procedure has superior empirical power.
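To make the idea of FDR control concrete, the sketch below implements the standard Benjamini-Hochberg step-up procedure. This is a generic FDR method, not the article's two-stage ZIGP-based procedure; the function name and the example p-values are assumptions for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg procedure at FDR level q.

    Sort the p-values, find the largest rank k with p_(k) <= (k / m) * q,
    and reject every hypothesis whose p-value has rank <= k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for four domain positions.
decisions = benjamini_hochberg([0.01, 0.5, 0.03, 0.9], q=0.05)
```

In the example only the smallest p-value survives: 0.03 would pass an unadjusted 0.05 cut-off but fails its rank-specific threshold of 0.025.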


Subjects
Biometry/methods, Statistical Data Interpretation, Protein Domains, Statistical Distributions, DNA Mutational Analysis, Protein Databases, Humans, Mutation Rate, Poisson Distribution
18.
Proc Natl Acad Sci U S A ; 114(46): E9893-E9902, 2017 11 14.
Article in English | MEDLINE | ID: mdl-29087304

ABSTRACT

A complete picture of HIV antigenicity during early replication is needed to elucidate the full range of options for controlling infection. Such information is frequently gained through analyses of isolated viral envelope antigens, host CD4 receptors, and cognate antibodies. However, direct examination of viral particles and virus-cell interactions is now possible via advanced microscopy techniques and reagents. Using such methods, we recently determined that CD4-induced (CD4i) transition state epitopes in the HIV surface antigen, gp120, while not exposed on free particles, rapidly become immunoreactive upon virus-cell binding. Here, we use 3D direct stochastic optical reconstruction microscopy (dSTORM) to show that certain CD4i epitopes specific to transition state structures are exposed across the surface of cell-bound virions, thus explaining their immunoreactivity. Moreover, such structures and their marker epitopes are dispersed to regions of virions distal to CD4 contact. We further show that the appearance and positioning of distal CD4i exposures is partially dependent on Gag maturation and intact matrix-gp41 interactions within the virion. Collectively, these observations provide a unique perspective of HIV during early replication. These features may define unique insights for understanding how humoral responses target virions and for developing related antiviral countermeasures.


Subjects
Epitopes/immunology, HIV Envelope Protein gp120/immunology, HIV Infections/virology, HIV-1/immunology, Virion/immunology, Virus Attachment, CD4 Antigens/metabolism, CD4 Lymphocyte Count, Cell Line, Epitopes/chemistry, HIV Antibodies/immunology, HIV Antigens/immunology, HIV Envelope Protein gp120/chemistry, HIV Envelope Protein gp41/chemistry, HIV Envelope Protein gp41/immunology, HIV Infections/immunology, HIV-1/chemistry, Humans, Virion/chemistry, Virion/metabolism
19.
Retrovirology ; 14(1): 13, 2017 02 24.
Article in English | MEDLINE | ID: mdl-28231858

ABSTRACT

Recently, Oberle et al. published a paper in Retrovirology evaluating the question of whether selection plays a role in HIV transmission. The Oberle study found no obvious genotypic or phenotypic differences between donors and recipients of epidemiologically linked pairs from the Swiss cohort. Thus, Oberle et al. characterized HIV-1 B transmission as largely "stochastic", an imprecise and potentially misleading term. Here, we re-analyzed their data and placed them in the context of transmission data for over 20 other human and animal trials. The present study finds that the transmitted/founder (T/F) viruses from the Swiss cohort show the same non-random genetic signatures conserved in 118 HIV-1, 40 SHIV, and 12 SIV T/F viruses previously published by two independent groups. We provide alternative interpretations of the Swiss cohort data and conclude that the sequences of their donor viruses lacked variability at the specific sites where other studies were able to demonstrate genotypic selection. Oberle et al. observed no phenotypic selection in vitro, so the problem of determining the in vivo phenotypic mechanisms that cause genotypic selection in HIV remains open.


Subjects
HIV Infections, HIV-1/genetics, Animals, Genotype, Humans
20.
BMC Bioinformatics ; 17(1): 479, 2016 Nov 21.
Article in English | MEDLINE | ID: mdl-27871221

ABSTRACT

BACKGROUND: Transcription factors (TFs) form complexes that bind regulatory modules (RMs) within DNA, to control specific sets of genes. Some transcription factor binding sites (TFBSs) near the transcription start site (TSS) display tight positional preferences relative to the TSS. Furthermore, near the TSS, RMs can co-localize TFBSs with each other and the TSS. The proportion of TFBS positional preferences due to TFBS co-localization within RMs is unknown, however. ChIP experiments confirm co-localization of some TFBSs genome-wide, including near the TSS, but they typically examine only a few TFs at a time, using non-physiological conditions that can vary from lab to lab. In contrast, sequence analysis can examine many TFs uniformly and methodically, broadly surveying the co-localization of TFBSs with tight positional preferences relative to the TSS. RESULTS: Our statistics found 43 significant sets of human motifs in the JASPAR TF Database with positional preferences relative to the TSS, with 38 preferences tight (±5 bp). Each set of motifs corresponded to a gene group of 135 to 3304 genes, with 42/43 (98%) gene groups independently validated by DAVID, a gene ontology database, with FDR < 0.05. Motifs corresponding to two TFBSs in an RM should co-occur more than by chance alone, enriching the intersection of the gene groups corresponding to the two TFs. Thus, a gene-group intersection systematically enriched beyond chance alone provides evidence that the two TFs participate in an RM. Of the 903 = 43*42/2 intersections of the 43 significant gene groups, we found 768/903 (85%) pairs of gene groups with significantly enriched intersections, with 564/768 (73%) intersections independently validated by DAVID with FDR < 0.05. A user-friendly web site at http://go.usa.gov/3kjsH permits biologists to explore the interaction network of our TFBSs to identify candidate subunit RMs. CONCLUSIONS: Gene duplication and convergent evolution within a genome provide obvious biological mechanisms for replicating an RM near the TSS that binds a particular TF subunit. Of all intersections of our 43 significant gene groups, 85% were significantly enriched, with 73% of the significant enrichments independently validated by gene ontology. The co-localization of TFBSs within RMs therefore likely explains much of the tight TFBS positional preferences near the TSS.
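The pair count 903 = 43*42/2 and the notion of an intersection "enriched beyond chance" can be illustrated with a short sketch. The hypergeometric upper tail used here is a common way to score overlap between two gene sets and is an assumption; the abstract does not say which test the authors used. The function name and the small example numbers are hypothetical.

```python
from math import comb

# Number of unordered pairs among the 43 gene groups (from the abstract).
n_pairs = comb(43, 2)

def hypergeom_upper_tail(population, group_a, group_b, overlap):
    """P(X >= overlap) when group_b genes are drawn at random, without
    replacement, from `population` genes of which group_a are marked.

    A small value suggests the two groups overlap more than chance.
    """
    total = comb(population, group_b)
    upper = min(group_a, group_b)
    favorable = sum(
        comb(group_a, i) * comb(population - group_a, group_b - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return favorable / total

# Hypothetical toy case: 10 genes, two groups of 5, all 5 shared.
p_toy = hypergeom_upper_tail(10, 5, 5, 5)
```

In the toy case only one of the 252 equally likely draws reproduces the full overlap, so the tail probability is 1/252.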


Subjects
DNA/metabolism, Gene Expression Regulation, Transcriptional Regulatory Elements/genetics, DNA Sequence Analysis/methods, Transcription Factors/metabolism, Transcription Initiation Site, Binding Sites, DNA/chemistry, DNA/genetics, Gene Ontology, Humans, Protein Binding