Results 1 - 8 of 8
1.
PLoS Genet ; 6(1): e1000829, 2010 Jan 22.
Article in English | MEDLINE | ID: mdl-20107516

ABSTRACT

The clustering of transcription factor binding sites in developmental enhancers and the apparent preferential conservation of clustered sites have been widely interpreted as proof that spatially constrained physical interactions between transcription factors are required for regulatory function. However, we show here that selection on the composition of enhancers alone, and not their internal structure, leads to the accumulation of clustered sites with evolutionary dynamics that suggest they are preferentially conserved. We simulated the evolution of idealized enhancers from Drosophila melanogaster constrained to contain only a minimum number of binding sites for one or more factors. Under this constraint, mutations that destroy an existing binding site are tolerated only if a compensating site has emerged elsewhere in the enhancer. Overlapping sites, such as those frequently observed for the activator Bicoid and repressor Krüppel, had significantly longer evolutionary half-lives than isolated sites for the same factors. This leads to a substantially higher density of overlapping sites than expected by chance and the appearance that such sites are preferentially conserved. Because D. melanogaster (like many other species) has a bias for deletions over insertions, sites tended to become closer together over time, leading to an overall clustering of sites in the absence of any selection for clustered sites. Since this effect is strongest for the oldest sites, clustered sites also incorrectly appear to be preferentially conserved. Following speciation, sites tend to be closer together in all descendent species than in their common ancestors, violating the common assumption that shared features of species' genomes reflect their ancestral state. Finally, we show that selection on binding site composition alone recapitulates the observed number of overlapping and closely neighboring sites in real D. melanogaster enhancers. 
Thus, this study calls into question the common practice of inferring "cis-regulatory grammars" from the organization and evolutionary dynamics of developmental enhancers.
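
The composition-only constraint described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' simulation: it assumes exact matches to a single placeholder motif (`TAATCC` is an arbitrary example string here) and point mutations only, whereas the abstract describes one or more factors and a deletion-over-insertion bias. All names (`count_sites`, `evolve`, `min_sites`) are hypothetical.

```python
import random

BASES = "ACGT"

def count_sites(seq, motif):
    """Count exact occurrences of motif in seq, allowing overlaps."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)

def evolve(seq, motif, min_sites, steps, rng):
    """Apply random point mutations; reject any that leave fewer than min_sites sites.

    Assumption: selection acts only on the number of sites (composition),
    never on where the sites are located within the sequence.
    """
    seq = list(seq)
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice([b for b in BASES if b != old])
        if count_sites("".join(seq), motif) < min_sites:
            seq[pos] = old  # mutation not tolerated: revert it
    return "".join(seq)

# Usage: a toy 30-bp "enhancer" that starts with two copies of the motif.
MOTIF = "TAATCC"  # placeholder motif, chosen only for illustration
rng = random.Random(0)
start = "TAATCC" + "ACGTACGTAC" + "TAATCC" + "GGCATGCA"
final = evolve(start, MOTIF, min_sites=2, steps=500, rng=rng)
print(count_sites(start, MOTIF), count_sites(final, MOTIF))
```

Under this rule a mutation that destroys a site is accepted only when the sequence still holds at least `min_sites` copies, which is the sense in which a compensating site must already exist elsewhere.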


Subject(s)
Drosophila melanogaster/genetics , Genetic Enhancer Elements , Molecular Evolution , Animals , Base Sequence , Binding Sites , Computer Simulation , Conserved Sequence , Drosophila Proteins/chemistry , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/chemistry , Drosophila melanogaster/metabolism , Genetic Models , Protein Binding , Genetic Selection , Transcription Factors/chemistry , Transcription Factors/genetics , Transcription Factors/metabolism
2.
Syst Rev ; 10(1): 97, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33810798

ABSTRACT

BACKGROUND: Systematic reviews (SRs), or studies of studies, use a formal process to evaluate the quality of the scientific literature and to assess effectiveness from qualifying articles in order to establish consensus findings around a hypothesis. Their value is increasing as the conduct and publication of research and evaluation have expanded and the process of identifying key insights has become more time-consuming. Text analytics and machine learning (ML) techniques may help overcome this problem of scale while maintaining the level of rigor expected of SRs. METHODS: In this article, we discuss an approach that uses existing examples of SRs to build and test a method for assisting SR title and abstract pre-screening by reducing the initial pool of potential articles to those that meet the inclusion criteria. Our approach differs from previous uses of ML as an SR tool in that it incorporates ML configurations guided by previously conducted SRs, together with human confirmation of ML predictions of relevant articles during multiple iterative reviews of smaller tranches of citations. We applied the tailored method to a new SR effort to validate performance. RESULTS: In the case study, the approach showed a sensitivity (recall) in finding relevant articles during down-selection that may rival many traditional processes, and an ability to overcome most type II errors. The study achieved a sensitivity of 99.5% (213 of 214 relevant articles) while requiring human review of only 31% of the articles available for review. CONCLUSIONS: We believe this iterative method can help overcome bias in initial ML model training by having humans reinforce ML models with new and relevant information, and is an applied step towards transfer learning for ML in SR.
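
The reported sensitivity follows directly from the two counts given in the abstract. As a small worked check (the function names are illustrative, not taken from the paper):

```python
def sensitivity(found_relevant, total_relevant):
    """Sensitivity (recall): share of truly relevant articles that were found."""
    return found_relevant / total_relevant

def miss_rate(found_relevant, total_relevant):
    """Type II error rate: share of relevant articles that were missed."""
    return 1.0 - sensitivity(found_relevant, total_relevant)

# Counts reported in the abstract: 213 of 214 relevant articles found.
s = sensitivity(213, 214)
print(f"sensitivity = {s:.1%}")  # sensitivity = 99.5%
print(f"missed articles = {214 - 213}")  # missed articles = 1
```

The abstract does not state the total number of articles screened, so the 31% human-review share cannot be converted into an article count here.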


Subject(s)
Diabetes Mellitus , Machine Learning , Humans , Mass Screening , Research Design
3.
Birth Defects Res ; 112(18): 1450-1460, 2020 11.
Article in English | MEDLINE | ID: mdl-32815300

ABSTRACT

In 2016, the Centers for Disease Control and Prevention (CDC) established surveillance of pregnant women with Zika virus infection and their infants in the U.S. states, territories, and freely associated states. To identify cases of Zika-associated birth defects, subject matter experts review data reported from medical records of completed pregnancies to identify findings that meet surveillance case criteria (manual review). The volume of reported data increased over the course of the Zika virus outbreak in the Americas, challenging the resources of the surveillance system to conduct manual review. Machine learning was explored as a possible method for predicting case status. Ensemble models (using machine learning algorithms including support vector machines, logistic regression, random forests, k-nearest neighbors, gradient boosted trees, and decision trees) were developed and trained using data collected from January 2016 to October 2017. Models were developed separately on data from the U.S. states, non-Puerto Rico territories, and freely associated states (referred to as the U.S. Zika Pregnancy and Infant Registry [USZPIR]) and on data from Puerto Rico (referred to as the Zika Active Pregnancy Surveillance System [ZAPSS]) because of differences in data collection and storage methods. The machine learning models demonstrated high sensitivity for identifying cases while potentially reducing the volume of data for manual review (USZPIR: 96% sensitivity, 25% reduction in review volume; ZAPSS: 97% sensitivity, 50% reduction in review volume). Machine learning models show potential for identifying cases of Zika-associated birth defects and for reducing the volume of data for manual review, a potential benefit in other public health emergency response settings.


Subject(s)
Infectious Pregnancy Complications , Zika Virus Infection , Zika Virus , Female , Humans , Infant , Machine Learning , Population Surveillance , Pregnancy , Infectious Pregnancy Complications/epidemiology , United States/epidemiology , Zika Virus Infection/diagnosis , Zika Virus Infection/epidemiology
4.
G3 (Bethesda) ; 6(10): 3419-3430, 2016 10 13.
Article in English | MEDLINE | ID: mdl-27527791

ABSTRACT

The regulation of gene expression controls development, and changes in this regulation often contribute to phenotypic evolution. Drosophila pigmentation is a model system for studying evolutionary changes in gene regulation, with differences in expression of pigmentation genes such as yellow that correlate with divergent pigment patterns among species shown to be caused by changes in cis- and trans-regulation. Currently, much more is known about the cis-regulatory component of divergent yellow expression than the trans-regulatory component, in part because very few trans-acting regulators of yellow expression have been identified. This study aims to improve our understanding of the trans-acting control of yellow expression by combining yeast-one-hybrid and RNAi screens for transcription factors binding to yellow cis-regulatory sequences and affecting abdominal pigmentation in adults, respectively. Of the 670 transcription factors included in the yeast-one-hybrid screen, 45 showed evidence of binding to one or more sequence fragments tested from the 5' intergenic and intronic yellow sequences from D. melanogaster, D. pseudoobscura, and D. willistoni, suggesting that they might be direct regulators of yellow expression. Of the 670 transcription factors included in the yeast-one-hybrid screen, plus another TF previously shown to be genetically upstream of yellow, 125 were also tested using RNAi, and 32 showed altered abdominal pigmentation. Nine transcription factors were identified in both screens, including four nuclear receptors related to ecdysone signaling (Hr78, Hr38, Hr46, and Eip78C). This finding suggests that yellow expression might be directly controlled by nuclear receptors influenced by ecdysone during early pupal development when adult pigmentation is forming.


Subject(s)
Drosophila Proteins/genetics , Drosophila/genetics , Gene Expression Regulation , Genetic Association Studies , Pigmentation/genetics , RNA Interference , Two-Hybrid System Techniques , Animals , Drosophila/metabolism , Ecdysone/metabolism , Genetic Enhancer Elements , Genetic Association Studies/methods , Genetic Testing , Mutation , Phenotype , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism
5.
PLoS One ; 9(10): e110808, 2014.
Article in English | MEDLINE | ID: mdl-25354084

ABSTRACT

Trace quantities of contaminating DNA are widespread in the laboratory environment, but their presence has received little attention in the context of high throughput sequencing. This issue is highlighted by recent works that have rested controversial claims upon sequencing data that appear to support the presence of unexpected exogenous species. I used reads that preferentially aligned to alternate genomes to infer the distribution of potential contaminant species in a set of independent sequencing experiments. I confirmed that dilute samples are more exposed to contaminating DNA, and, focusing on four single-cell sequencing experiments, found that these contaminants appear to originate from a wide diversity of clades. Although negative control libraries prepared from 'blank' samples recovered the highest-frequency contaminants, low-frequency contaminants, which appeared to make heterogeneous contributions to samples prepared in parallel within a single experiment, were not well controlled for. I used these results to show that, despite heavy replication and plausible controls, contamination can explain all of the observations used to support a recent claim that complete genes pass from food to human blood. Contamination must be considered a potential source of signals of exogenous species in sequencing data, even if these signals are replicated in independent experiments, vary across conditions, or indicate a species which seems a priori unlikely to contaminate. Negative control libraries processed in parallel are essential to control for contaminant DNAs, but their limited ability to recover low-frequency contaminants must be recognized.


Subject(s)
High-Throughput Nucleotide Sequencing/standards , DNA Sequence Analysis/standards , DNA Contamination , Bacterial DNA/analysis , Escherichia coli/genetics , Humans
6.
PLoS One ; 8(1): e53778, 2013.
Article in English | MEDLINE | ID: mdl-23320104

ABSTRACT

The short length and high degeneracy of sites recognized by DNA-binding transcription factors limit the amount of information they can carry, and individual sites are rarely sufficient to mediate the regulation of specific targets. Computational analysis of microbial genomes has suggested that many factors function optimally when in a particular orientation and position with respect to their target promoters. To investigate this further, we developed and trained spatial models of binding site positioning and applied them to the genome of the yeast Saccharomyces cerevisiae. We found evidence of non-random organization of sites within promoters, differences in binding site density, or both for thirty-eight transcription factors. We show that these signatures allow transcription factors with substantial differences in binding site specificity to share similar promoter specificities. We illustrate how spatial information dictating the positioning and density of binding sites can in principle increase the information available to the organism for differentiating a transcription factor's true targets, and we indicate how this information could potentially be leveraged for the same purpose in bioinformatic analyses.
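
The opening claim, that short and degenerate sites carry little information, can be made concrete with the standard per-position information content of a motif. This is a generic sketch rather than the spatial model of the paper. It assumes a uniform background (2 bits maximum per DNA position) and applies no small-sample correction; the example frequencies are invented for illustration.

```python
import math

def column_information(freqs):
    """Information content in bits of one motif position.

    freqs maps each base to its frequency at this position (summing to 1).
    Assumes a uniform background, so the maximum is log2(4) = 2 bits.
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy

def motif_information(columns):
    """Total information content of a motif: the sum over its positions."""
    return sum(column_information(col) for col in columns)

# Hypothetical 3-position motif: one invariant, one two-fold degenerate,
# and one fully degenerate position.
toy_motif = [
    {"A": 1.0},
    {"A": 0.5, "G": 0.5},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
print(motif_information(toy_motif))  # 3.0
```

Each degenerate position lowers the total, and a short motif has few positions to sum over, which is why extra signals such as site position and density can add discriminating information.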


Subject(s)
Genetic Promoter Regions , Saccharomyces cerevisiae Proteins/biosynthesis , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Transcription Factors/biosynthesis , Up-Regulation/genetics , Algorithms , Binding Sites/genetics , Fungal DNA/genetics , Fungal DNA/metabolism , Gene Deletion , Fungal Genes , Markov Chains , Saccharomyces cerevisiae Proteins/genetics , Transcription Factors/genetics
7.
Pac Symp Biocomput ; : 489-500, 2008.
Article in English | MEDLINE | ID: mdl-18229710

ABSTRACT

The identification of transcription factor binding sites commonly relies on the interpretation of scores generated by a position weight matrix. These scores are presumed to reflect the affinity of the transcription factor for the bound sequence. In almost all applications, a cutoff score is chosen to distinguish between functional and non-functional binding sites. This cutoff is generally based on statistical rather than biological criteria. Furthermore, given the variety of transcription factors, it is unlikely that the use of a common statistical threshold for all transcription factors is appropriate. In order to incorporate biological information into the choice of cutoff score, we developed a simple evolutionary model that assumes that transcription factor binding sites evolve to maintain an affinity greater than some factor-specific threshold. We then compared patterns of substitution in binding sites predicted by this model at different thresholds to patterns of substitution observed at sites bound in vivo by transcription factors in S. cerevisiae. Assuming that the cutoff value that gives the best fit between the observed and predicted values will optimally distinguish functional and non-functional sites, we discovered substantial heterogeneity for appropriate cutoff values among factors. While commonly used thresholds seem appropriate for many factors, some factors appear to function at cutoffs satisfied commonly in the genome. This evidence was corroborated by local patterns of rate variation for examples of stringent and lenient p-value cutoffs. Our analysis further highlights the necessity of taking a factor-specific approach to binding site identification.
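
To make the role of the cutoff concrete, here is a minimal sketch of position weight matrix scoring with a threshold. It is a generic log-odds scan, not the evolutionary model of the paper. It assumes a uniform background and strictly positive base probabilities (for example after pseudocounts); the toy matrix, sequence, and cutoff values are all hypothetical.

```python
import math

def log_odds_pwm(prob_columns, background=0.25):
    """Convert per-position base probabilities into log2-odds weights.

    Assumes every probability is > 0 (e.g. pseudocounts already applied).
    """
    return [{base: math.log2(p / background) for base, p in col.items()}
            for col in prob_columns]

def score(pwm, site):
    """Sum the position-specific weights for a site of the same width as the PWM."""
    return sum(col[base] for col, base in zip(pwm, site))

def scan(pwm, seq, cutoff):
    """Return (position, score) for every window scoring at or above cutoff."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        s = score(pwm, seq[i:i + width])
        if s >= cutoff:
            hits.append((i, s))
    return hits

# Hypothetical 3-position matrix with consensus ACG.
toy_probs = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
pwm = log_odds_pwm(toy_probs)
strict = scan(pwm, "TTACGTT", cutoff=0.0)   # only the consensus window passes
lenient = scan(pwm, "TTACGTT", cutoff=-5.0)  # every window passes
print([pos for pos, _ in strict], len(lenient))
```

The same matrix yields very different site sets as `cutoff` moves, which is why a single statistical threshold applied to all factors may be too strict for some and too lenient for others.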


Subject(s)
Molecular Evolution , Genetic Models , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism , Binding Sites/genetics , Computational Biology , Fungal DNA/genetics , Fungal DNA/metabolism , Statistical Models , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
8.
Proc Natl Acad Sci U S A ; 100(26): 15661-5, 2003 Dec 23.
Article in English | MEDLINE | ID: mdl-14660792

ABSTRACT

Although the evolutionary significance of gene duplication has long been recognized, it remains unclear what determines gene duplicability. We find protein complexity to be an important determinant because the proportion of unduplicated genes (P) increases with the number of subunits in a protein. However, P is high (≥65%) for both monomers and multimers in yeast, but

Subject(s)
Molecular Evolution , Gene Duplication , Proteins/genetics , Dimerization , Humans , Proteins/chemistry , Species Specificity , Yeasts/genetics