Results 1 - 15 of 15
1.
PLoS Comput Biol ; 20(5): e1012132, 2024 May.
Article in English | MEDLINE | ID: mdl-38805561

ABSTRACT

Accurate models describing the relationship between genotype and phenotype are necessary in order to understand and predict how mutations to biological sequences affect the fitness and evolution of living organisms. The apparent abundance of epistasis (genetic interactions), both between and within genes, complicates this task, and how to build mechanistic models that incorporate epistatic coefficients (genetic interaction terms) remains an open question. The Walsh-Hadamard transform represents a rigorous computational framework for calculating and modeling epistatic interactions at the level of individual genotypic values (known as genetical, biological or physiological epistasis), and can therefore be used to address fundamental questions related to sequence-to-function encodings. However, one of its main limitations is that it can only accommodate two alleles (amino acid or nucleotide states) per sequence position. In this paper we provide an extension of the Walsh-Hadamard transform that allows the calculation and modeling of background-averaged epistasis (also known as ensemble epistasis) in genetic landscapes with an arbitrary number of states per position (20 for amino acids, 4 for nucleotides, etc.). We also provide a recursive formula for the inverse matrix and then derive formulae to directly extract any element of either matrix without having to rely on the computationally intensive task of constructing or inverting large matrices. Finally, we demonstrate the utility of our theory by using it to model epistasis within both simulated and empirical multiallelic fitness landscapes, revealing that both pairwise and higher-order genetic interactions are enriched between physically interacting positions.
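As a rough illustration of the classical two-allele case that the abstract takes as its starting point, the sketch below builds the Walsh-Hadamard matrix by repeated Kronecker products and applies it to a small landscape. The function names, the genotype ordering (binary index, 00, 01, 10, 11) and the 1/2^n normalisation are assumptions made for this example; sign and scaling conventions differ between papers, and this is not the multiallelic extension the authors describe.

```python
import numpy as np

def hadamard(n_sites):
    """Build the 2^n x 2^n Walsh-Hadamard matrix by repeated Kronecker products."""
    h1 = np.array([[1, 1], [1, -1]])
    h = np.array([[1]])
    for _ in range(n_sites):
        h = np.kron(h, h1)
    return h

def walsh_coefficients(phenotypes):
    """Return Walsh coefficients for a complete biallelic landscape.

    `phenotypes` must list all 2^n genotypes in binary order.
    Coefficient k involves exactly the sites whose bits are set in k.
    The 1/2^n scaling is an assumed convention for this sketch.
    """
    y = np.asarray(phenotypes, dtype=float)
    n_sites = y.size.bit_length() - 1
    if y.size != 2 ** n_sites:
        raise ValueError("landscape must contain exactly 2^n genotypes")
    return hadamard(n_sites) @ y / y.size

# Two sites, purely additive effects: the pairwise term (index 3) is zero.
additive = walsh_coefficients([0.0, 1.0, 2.0, 3.0])
# Adding 1.0 to the double mutant introduces a pairwise interaction.
epistatic = walsh_coefficients([0.0, 1.0, 2.0, 4.0])
```

In the additive landscape the last coefficient vanishes, while in the second landscape it equals 0.25 under this normalisation, which is how the transform isolates interaction terms from single-site effects.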


Subject(s)
Epistasis, Genetic ; Models, Genetic ; Epistasis, Genetic/genetics ; Computational Biology/methods ; Algorithms ; Mutation/genetics ; Genotype
2.
Nature ; 626(7999): 643-652, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38109937

ABSTRACT

Thousands of proteins have been validated genetically as therapeutic targets for human diseases [1]. However, very few have been successfully targeted, and many are considered 'undruggable'. This is particularly true for proteins that function via protein-protein interactions: direct inhibition of binding interfaces is difficult and requires the identification of allosteric sites. However, most proteins have no known allosteric sites, and a comprehensive allosteric map does not exist for any protein. Here we address this shortcoming by charting multiple global atlases of inhibitory allosteric communication in KRAS. We quantified the effects of more than 26,000 mutations on the folding of KRAS and its binding to six interaction partners. Genetic interactions in double mutants enabled us to perform biophysical measurements at scale, inferring more than 22,000 causal free energy changes. These energy landscapes quantify how mutations tune the binding specificity of a signalling protein and map the inhibitory allosteric sites for an important therapeutic target. Allosteric propagation is particularly effective across the central β-sheet of KRAS, and multiple surface pockets are genetically validated as allosterically active, including a distal pocket in the C-terminal lobe of the protein. Allosteric mutations typically inhibit binding to all tested effectors, but they can also change the binding specificity, revealing the regulatory, evolutionary and therapeutic potential to tune pathway activation. Using the approach described here, it should be possible to rapidly and comprehensively identify allosteric target sites in many proteins.


Subject(s)
Allosteric Site ; Protein Folding ; Proto-Oncogene Proteins p21(ras) ; Humans ; Allosteric Regulation/drug effects ; Allosteric Regulation/genetics ; Allosteric Site/drug effects ; Allosteric Site/genetics ; Mutation ; Protein Binding ; Proto-Oncogene Proteins p21(ras)/antagonists & inhibitors ; Proto-Oncogene Proteins p21(ras)/chemistry ; Proto-Oncogene Proteins p21(ras)/genetics ; Proto-Oncogene Proteins p21(ras)/metabolism ; Reproducibility of Results ; Substrate Specificity/drug effects ; Substrate Specificity/genetics ; Thermodynamics
3.
Nature ; 604(7904): 175-183, 2022 04.
Article in English | MEDLINE | ID: mdl-35388192

ABSTRACT

Allosteric communication between distant sites in proteins is central to biological regulation but still poorly characterized, limiting understanding, engineering and drug development [1-6]. An important reason for this is the lack of methods to comprehensively quantify allostery in diverse proteins. Here we address this shortcoming and present a method that uses deep mutational scanning to globally map allostery. The approach uses an efficient experimental design to infer en masse the causal biophysical effects of mutations by quantifying multiple molecular phenotypes (here we examine binding and protein abundance) in multiple genetic backgrounds and fitting thermodynamic models using neural networks. We apply the approach to two of the most common protein interaction domains found in humans, an SH3 domain and a PDZ domain, to produce comprehensive atlases of allosteric communication. Allosteric mutations are abundant, with a large mutational target space of network-altering 'edgetic' variants. Mutations are more likely to be allosteric closer to binding interfaces, at glycine residues and at specific residues connecting to an opposite surface within the PDZ domain. This general approach of quantifying mutational effects for multiple molecular phenotypes and in multiple genetic backgrounds should enable the energetic and allosteric landscapes of many proteins to be rapidly and comprehensively mapped.
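To make the idea of fitting thermodynamic models across genetic backgrounds more concrete, here is a minimal sketch that assumes a two-state folding model in which mutational free-energy changes add and the observed phenotype is the Boltzmann fraction of folded protein. The abstract does not specify the model form, so the two-state assumption, the function names, the temperature and the example energies are all illustrative choices rather than the authors' implementation.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def fraction_folded(delta_g, temperature_k=303.15):
    """Fraction of molecules folded under an assumed two-state model.

    delta_g is the folding free energy in kcal/mol (negative = stable).
    """
    return 1.0 / (1.0 + math.exp(delta_g / (R_KCAL * temperature_k)))

def double_mutant_fraction(dg_wt, ddg_a, ddg_b, temperature_k=303.15):
    """Assume free-energy changes of two mutations add, then map to phenotype."""
    return fraction_folded(dg_wt + ddg_a + ddg_b, temperature_k)

# Hypothetical example values: a stable wild type and two destabilising mutations.
dg_wt, ddg = -2.0, 1.5
wt = fraction_folded(dg_wt)
single = fraction_folded(dg_wt + ddg)
double = double_mutant_fraction(dg_wt, ddg, ddg)
# What the double mutant would look like if phenotypes (not energies) were additive.
expected_if_additive = wt + 2 * (single - wt)
```

Under these assumptions the double mutant is far less folded than a phenotype-additive expectation predicts, even though the energies combine additively. That non-linearity is why measuring mutations in several backgrounds can constrain the underlying free-energy changes.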


Subject(s)
Allosteric Site ; PDZ Domains ; Proteins ; Allosteric Regulation/genetics ; PDZ Domains/genetics ; Protein Binding/genetics ; Proteins/chemistry ; Thermodynamics
4.
PLoS Genet ; 17(2): e1009353, 2021 02.
Article in English | MEDLINE | ID: mdl-33524037

ABSTRACT

RNA structures are dynamic. As a consequence, mutational effects can be hard to rationalize with reference to a single static native structure. We reasoned that deep mutational scanning experiments, which couple molecular function to fitness, should capture mutational effects across multiple conformational states simultaneously. Here, we provide a proof-of-principle that this is indeed the case, using the self-splicing group I intron from Tetrahymena thermophila as a model system. We comprehensively mutagenized two 4-bp segments of the intron. These segments first come together to form the P1 extension (P1ex) helix at the 5' splice site. Following cleavage at the 5' splice site, the two halves of the helix dissociate to allow formation of an alternative helix (P10) at the 3' splice site. Using an in vivo reporter system that couples splicing activity to fitness in E. coli, we demonstrate that fitness is driven jointly by constraints on P1ex and P10 formation. We further show that patterns of epistasis can be used to infer the presence of intramolecular pleiotropy. Using a machine learning approach that allows quantification of mutational effects in a genotype-specific manner, we demonstrate that the fitness landscape can be deconvoluted to implicate P1ex or P10 as the effective genetic background in which molecular fitness is compromised or enhanced. Our results highlight deep mutational scanning as a tool to study alternative conformational states, with the capacity to provide critical insights into the structure, evolution and evolvability of RNAs as dynamic ensembles. Our findings also suggest that, in the future, deep mutational scanning approaches might help reverse-engineer multiple alternative or successive conformations from a single fitness landscape.


Subject(s)
Introns/genetics ; Mutation ; RNA Splicing ; RNA, Protozoan/genetics ; RNA/genetics ; Tetrahymena thermophila/genetics ; Base Sequence ; Evolution, Molecular ; Genetic Fitness ; Genetic Pleiotropy ; Genotype ; Kinetics ; Machine Learning ; Nucleic Acid Conformation ; RNA/chemistry ; RNA Splice Sites/genetics
5.
Elife ; 10, 2021 02 01.
Article in English | MEDLINE | ID: mdl-33522485

ABSTRACT

Plaques of the amyloid beta (Aβ) peptide are a pathological hallmark of Alzheimer's disease (AD), the most common form of dementia. Mutations in Aβ also cause familial forms of AD (fAD). Here, we use deep mutational scanning to quantify the effects of >14,000 mutations on the aggregation of Aβ. The resulting genetic landscape reveals mechanistic insights into fibril nucleation, including the importance of charge and gatekeeper residues in the disordered region outside of the amyloid core in preventing nucleation. Strikingly, unlike computational predictors and previous measurements, the empirical nucleation scores accurately identify all known dominant fAD mutations in Aβ, genetically validating that the mechanism of nucleation in a cell-based assay is likely to be very similar to the mechanism that causes the human disease. These results provide the first comprehensive atlas of how mutations alter the formation of any amyloid fibril and a resource for the interpretation of genetic variation in Aβ.


Alzheimer's disease is the most common form of dementia, affecting more than 50 million people worldwide. Despite more than 400 clinical trials, there are still no effective drugs that can prevent or treat the disease. A common target in Alzheimer's disease trials is a small protein called amyloid beta. Amyloid beta proteins are 'sticky' molecules. In the brains of people with Alzheimer's disease, they join to form first small aggregates and then long chains called fibrils, a process which is toxic to neurons. Specific mutations in the gene for amyloid beta are known to cause rare, aggressive forms of Alzheimer's disease that typically affect people in their fifties or sixties. But these are not the only mutations that can occur in amyloid beta. In principle, any part of the protein could undergo mutation. And given the size of the human population, it is likely that each of these mutations exists in someone alive today. Seuma et al. reasoned that studying these mutations could help us understand the process by which amyloid beta forms new aggregates. Using an approach called deep mutational scanning, Seuma et al. mutated each point in the protein, one at a time. This produced more than 14,000 different versions of amyloid beta. Seuma et al. then measured how quickly these mutants were able to form aggregates by introducing them into yeast cells. All the mutations known to cause early-onset Alzheimer's disease accelerated amyloid beta aggregation in the yeast. But the results also revealed previously unknown properties that control how fast aggregation occurs. In addition, they highlighted a number of positions in the amyloid beta sequence that act as 'gatekeepers'. In healthy brains, these gatekeepers prevent amyloid beta proteins from sticking together. When mutated, they drive the protein to form aggregates. 
This comprehensive dataset will help researchers understand how proteins form toxic aggregates, which could in turn help them find ways to prevent this from happening. By providing an 'atlas' of all possible amyloid beta mutations, the dataset will also help clinicians interpret any new mutations they encounter in patients. By showing whether or not a mutation speeds up aggregation, the atlas will help clinicians predict whether that mutation increases the risk of Alzheimer's disease.


Subject(s)
Alzheimer Disease/genetics ; Amyloid beta-Peptides/genetics ; Amyloid/metabolism ; Mutation ; DNA Mutational Analysis ; High-Throughput Nucleotide Sequencing ; Plasmids ; Saccharomyces cerevisiae/metabolism
6.
Genome Biol ; 21(1): 207, 2020 08 17.
Article in English | MEDLINE | ID: mdl-32799905

ABSTRACT

Deep mutational scanning (DMS) enables multiplexed measurement of the effects of thousands of variants of proteins, RNAs, and regulatory elements. Here, we present a customizable pipeline, DiMSum, that represents an end-to-end solution for obtaining variant fitness and error estimates from raw sequencing data. A key innovation of DiMSum is the use of an interpretable error model that captures the main sources of variability arising in DMS workflows, outperforming previous methods. DiMSum is available as an R/Bioconda package and provides summary reports to help researchers diagnose common DMS pathologies and take remedial steps in their analyses.


Subject(s)
DNA Mutational Analysis/methods ; Molecular Diagnostic Techniques/methods ; Mutation ; Computational Biology ; High-Throughput Nucleotide Sequencing/methods ; Models, Genetic ; Polymerase Chain Reaction ; Proteins/genetics ; Software
7.
Nat Commun ; 10(1): 4162, 2019 09 13.
Article in English | MEDLINE | ID: mdl-31519910

ABSTRACT

Insoluble protein aggregates are the hallmarks of many neurodegenerative diseases. For example, aggregates of TDP-43 occur in nearly all cases of amyotrophic lateral sclerosis (ALS). However, whether aggregates cause cellular toxicity is still not clear, even in simpler cellular systems. We reasoned that deep mutagenesis might be a powerful approach to disentangle the relationship between aggregation and toxicity. We generated >50,000 mutations in the prion-like domain (PRD) of TDP-43 and quantified their toxicity in yeast cells. Surprisingly, mutations that increase hydrophobicity and aggregation strongly decrease toxicity. In contrast, toxic variants promote the formation of dynamic liquid-like condensates. Mutations have their strongest effects in a hotspot that genetic interactions reveal to be structured in vivo, illustrating how mutagenesis can probe the in vivo structures of unstructured proteins. Our results show that aggregation of TDP-43 is not harmful but protects cells, most likely by titrating the protein away from a toxic liquid-like phase.


Subject(s)
Computational Biology/methods ; Genomics/methods ; Systems Biology/methods ; Amyotrophic Lateral Sclerosis/genetics ; Amyotrophic Lateral Sclerosis/metabolism ; Humans ; Hydrophobic and Hydrophilic Interactions ; Mutation/genetics ; Prions/genetics ; Prions/metabolism
8.
Cell Syst ; 5(5): 471-484.e4, 2017 11 22.
Article in English | MEDLINE | ID: mdl-29102610

ABSTRACT

Isogenic cells in a common environment show substantial cell-to-cell variation in gene expression, often referred to as "expression noise." Here, we use multiple single-cell RNA-sequencing datasets to identify features associated with high or low expression noise in mouse embryonic stem cells. These include the core promoter architecture of a gene, with CpG island promoters and a TATA box associated with low and high noise, respectively. High noise is also associated with "conflicting" chromatin states: the absence of transcription-associated histone modifications or the presence of repressive ones in active genes. Genes regulated by pluripotency factors through super-enhancers show high and correlated expression variability, consistent with fluctuations in the pluripotent state. Together, our results provide an integrated view of how core promoters, chromatin, regulation, and pluripotency fluctuations contribute to the variability of gene expression across individual stem cells.


Subject(s)
Embryonic Stem Cells/physiology ; Gene Expression/genetics ; Animals ; Chromatin/genetics ; CpG Islands/genetics ; Histone Code/genetics ; Histones/genetics ; Mice ; Pluripotent Stem Cells/physiology ; Promoter Regions, Genetic/genetics ; Transcription, Genetic/genetics
9.
Nature ; 544(7648): 59-64, 2017 04 06.
Article in English | MEDLINE | ID: mdl-28289288

ABSTRACT

The folding of genomic DNA from the beads-on-a-string-like structure of nucleosomes into higher-order assemblies is crucially linked to nuclear processes. Here we calculate 3D structures of entire mammalian genomes using data from a new chromosome conformation capture procedure that allows us to first image and then process single cells. The technique enables genome folding to be examined at a scale of less than 100 kb, and chromosome structures to be validated. The structures of individual topological-associated domains and loops vary substantially from cell to cell. By contrast, A and B compartments, lamina-associated domains and active enhancers and promoters are organized in a consistent way on a genome-wide basis in every cell, suggesting that they could drive chromosome and genome folding. By studying genes regulated by pluripotency factor and nucleosome remodelling deacetylase (NuRD), we illustrate how the determination of single-cell genome structure provides a new approach for investigating biological processes.


Subject(s)
Chromatin Assembly and Disassembly ; Genome ; Molecular Imaging/methods ; Nucleosomes/chemistry ; Single-Cell Analysis/methods ; Animals ; CCCTC-Binding Factor ; Cell Cycle Proteins/metabolism ; Chromatin Assembly and Disassembly/genetics ; Chromosomal Proteins, Non-Histone/metabolism ; Chromosomes, Mammalian/chemistry ; Chromosomes, Mammalian/genetics ; Chromosomes, Mammalian/metabolism ; DNA/chemistry ; DNA/genetics ; DNA/metabolism ; Enhancer Elements, Genetic ; G1 Phase ; Gene Expression Regulation ; Gene Regulatory Networks ; Genome/genetics ; Haploidy ; Mi-2 Nucleosome Remodeling and Deacetylase Complex/metabolism ; Mice ; Models, Molecular ; Molecular Conformation ; Molecular Imaging/standards ; Mouse Embryonic Stem Cells/cytology ; Mouse Embryonic Stem Cells/metabolism ; Nucleosomes/genetics ; Nucleosomes/metabolism ; Promoter Regions, Genetic ; Repressor Proteins/metabolism ; Reproducibility of Results ; Single-Cell Analysis/standards ; Cohesins
10.
Genome Res ; 25(4): 504-13, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25677180

ABSTRACT

In addition to mediating sister chromatid cohesion during the cell cycle, the cohesin complex associates with CTCF and with active gene regulatory elements to form long-range interactions between its binding sites. Genome-wide chromosome conformation capture has shown that cohesin's main role in interphase genome organization is in mediating interactions within architectural chromosome compartments, rather than specifying compartments per se. However, it remains unclear how cohesin-mediated interactions contribute to the regulation of gene expression. We have found that the binding of CTCF and cohesin is highly enriched at enhancers and in particular at enhancer arrays or "super-enhancers" in mouse thymocytes. Using local and global chromosome conformation capture, we demonstrate that enhancer elements associate not just in linear sequence, but also in 3D, and that spatial enhancer clustering is facilitated by cohesin. The conditional deletion of cohesin from noncycling thymocytes preserved enhancer position, H3K27ac, H3K4me1, and enhancer transcription, but weakened interactions between enhancers. Interestingly, ~50% of deregulated genes reside in the vicinity of enhancer elements, suggesting that cohesin regulates gene expression through spatial clustering of enhancer elements. We propose a model for cohesin-dependent gene regulation in which spatial clustering of enhancer elements acts as a unified mechanism for both enhancer-promoter "connections" and "insulation."


Subject(s)
Cell Cycle Proteins/genetics ; Chromosomal Proteins, Non-Histone/genetics ; Enhancer Elements, Genetic/genetics ; Gene Expression Regulation/genetics ; Multigene Family/genetics ; Repressor Proteins/metabolism ; Thymocytes/cytology ; Animals ; Binding Sites/genetics ; CCCTC-Binding Factor ; Cells, Cultured ; Histones/genetics ; Mice ; Promoter Regions, Genetic/genetics ; Protein Binding/genetics ; Cohesins
11.
Elife ; 3: e02626, 2014 Oct 03.
Article in English | MEDLINE | ID: mdl-25279814

ABSTRACT

As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined transcription factor (TF) binding locations of four liver-essential TFs in liver tissue from human, macaque, mouse, rat, and dog. Approximately two thirds of the TF-bound regions fell into CRMs. Less than half of the human CRMs were found as a CRM in the orthologous region of a second species. Shared CRMs were associated with liver pathways and disease loci identified by genome-wide association studies. Recurrent rare human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control.


Subject(s)
Liver/metabolism ; Mammals/metabolism ; Signal Transduction ; Transcription Factors/metabolism ; Animals ; Blood Coagulation/genetics ; Chromatin Immunoprecipitation ; Gene Regulatory Networks ; Genome-Wide Association Study ; Genomics ; Humans ; Lipid Metabolism/genetics ; Male ; Molecular Sequence Annotation ; Organ Specificity ; Phylogeny ; Polymorphism, Single Nucleotide/genetics ; Protein Binding ; Regulatory Sequences, Nucleic Acid/genetics ; Species Specificity
12.
Genome Res ; 23(12): 2066-77, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24002784

ABSTRACT

Chromosome conformation capture approaches have shown that interphase chromatin is partitioned into spatially segregated Mb-sized compartments and sub-Mb-sized topological domains. This compartmentalization is thought to facilitate the matching of genes and regulatory elements, but its precise function and mechanistic basis remain unknown. Cohesin controls chromosome topology to enable DNA repair and chromosome segregation in cycling cells. In addition, cohesin associates with active enhancers and promoters and with CTCF to form long-range interactions important for gene regulation. Although these findings suggest an important role for cohesin in genome organization, this role has not been assessed on a global scale. Unexpectedly, we find that architectural compartments are maintained in noncycling mouse thymocytes after genetic depletion of cohesin in vivo. Cohesin was, however, required for specific long-range interactions within compartments where cohesin-regulated genes reside. Cohesin depletion diminished interactions between cohesin-bound sites, whereas alternative interactions between chromatin features associated with transcriptional activation and repression became more prominent, with corresponding changes in gene expression. Our findings indicate that cohesin-mediated long-range interactions facilitate discrete gene expression states within preexisting chromosomal compartments.


Subject(s)
Cell Cycle Proteins/physiology ; Chromatin/genetics ; Chromatin/metabolism ; Chromosomal Proteins, Non-Histone/physiology ; Gene Expression Regulation ; Repressor Proteins/metabolism ; Thymocytes/metabolism ; Animals ; CCCTC-Binding Factor ; Cell Cycle/genetics ; Chromosomes, Mammalian ; DNA-Binding Proteins ; Gene Dosage ; Genome ; Linear Models ; Mice ; Nuclear Proteins/metabolism ; Phosphoproteins/metabolism ; Promoter Regions, Genetic ; Regulatory Sequences, Nucleic Acid ; Transcription Factors/metabolism ; Cohesins
13.
Genome Biol ; 14(12): R148, 2013 Dec 31.
Article in English | MEDLINE | ID: mdl-24380390

ABSTRACT

BACKGROUND: The genomic binding of CTCF is highly conserved across mammals, but the mechanisms that underlie its stability are poorly understood. One transcription factor known to functionally interact with CTCF in the context of X-chromosome inactivation is the ubiquitously expressed YY1. Because combinatorial transcription factor binding can contribute to the evolutionary stabilization of regulatory regions, we tested whether YY1 and CTCF co-binding could in part account for conservation of CTCF binding. RESULTS: Combined analysis of CTCF and YY1 binding in lymphoblastoid cell lines from seven primates, as well as in mouse and human livers, reveals extensive genome-wide co-localization specifically at evolutionarily stable CTCF-bound regions. CTCF-YY1 co-bound regions resemble regions bound by YY1 alone, as they enrich for active histone marks, RNA polymerase II and transcription factor binding. Although these highly conserved, transcriptionally active CTCF-YY1 co-bound regions are often promoter-proximal, gene-distal regions show similar molecular features. CONCLUSIONS: Our results reveal that these two ubiquitously expressed, multi-functional zinc-finger proteins collaborate in functionally active regions to stabilize one another's genome-wide binding across primate evolution.


Subject(s)
Evolution, Molecular ; Primates/genetics ; Repressor Proteins/metabolism ; YY1 Transcription Factor/metabolism ; Animals ; CCCTC-Binding Factor ; Cell Line ; Genome ; Humans ; Mice ; Repressor Proteins/chemistry
14.
Genome Res ; 22(11): 2163-75, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22780989

ABSTRACT

The cohesin protein complex contributes to transcriptional regulation in a CTCF-independent manner by colocalizing with master regulators at tissue-specific loci. The regulation of transcription involves the concerted action of multiple transcription factors (TFs) and cohesin's role in this context of combinatorial TF binding remains unexplored. To investigate cohesin-non-CTCF (CNC) binding events in vivo we mapped cohesin and CTCF, as well as a collection of tissue-specific and ubiquitous transcriptional regulators using ChIP-seq in primary mouse liver. We observe a positive correlation between the number of distinct TFs bound and the presence of CNC sites. In contrast to regions of the genome where cohesin and CTCF colocalize, CNC sites coincide with the binding of master regulators and enhancer-markers and are significantly associated with liver-specific expressed genes. We also show that cohesin presence partially explains the commonly observed discrepancy between TF motif score and ChIP signal. Evidence from these statistical analyses in wild-type cells, and comparisons to maps of TF binding in Rad21-cohesin haploinsufficient mouse liver, suggests that cohesin helps to stabilize large protein-DNA complexes. Finally, we observe that the presence of mirrored CTCF binding events at promoters and their nearby cohesin-bound enhancers is associated with elevated expression levels.


Subject(s)
Cell Cycle Proteins/metabolism ; Chromosomal Proteins, Non-Histone/metabolism ; Gene Regulatory Networks ; Transcription, Genetic ; Animals ; CCCTC-Binding Factor ; Chromatin Immunoprecipitation ; DNA-Binding Proteins ; Genome ; Haploinsufficiency ; Mice ; Mice, Inbred C57BL ; Nuclear Proteins/genetics ; Nuclear Proteins/metabolism ; Organ Specificity ; Phosphoproteins/genetics ; Phosphoproteins/metabolism ; Promoter Regions, Genetic ; Protein Binding ; Repressor Proteins/metabolism ; Sequence Analysis, DNA ; Transcription Factors/metabolism ; Up-Regulation ; Cohesins
15.
BMC Bioinformatics ; 12: 29, 2011 Jan 24.
Article in English | MEDLINE | ID: mdl-21261946

ABSTRACT

BACKGROUND: In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research. RESULTS: We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit high correlation in their expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs. CONCLUSIONS: The Indygene tool efficiently removes paralogy relationships from a given dataset and we found that such a reduction, performed prior to GSA, has the ability to generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches when applied to the reanalysis of previously published microarray datasets and suggests that the redundancy and non-independence of paralogs is an important consideration when dealing with GSA methodologies.
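The reduction step described in the abstract, in which a gene list is trimmed until no pair exceeds a similarity threshold, could be approximated with a simple greedy pass like the one below. The abstract does not say how Indygene chooses which member of a paralogous pair to retain, so the first-come-first-kept rule, the function name and the dictionary-based similarity lookup are assumptions for illustration only.

```python
def remove_paralogs(genes, similarity, threshold):
    """Greedy reduction of a gene list (hypothetical helper, not Indygene's code).

    genes:      ordered list of gene identifiers
    similarity: dict mapping frozenset({gene_a, gene_b}) to sequence similarity;
                pairs missing from the dict are treated as unrelated (0.0)
    threshold:  pairs with similarity above this value count as paralogs
    """
    kept = []
    for gene in genes:
        # Keep the gene only if it is not too similar to anything already kept.
        if all(similarity.get(frozenset((gene, other)), 0.0) <= threshold
               for other in kept):
            kept.append(gene)
    return kept

# Made-up identifiers and similarity values, purely for demonstration.
example_similarity = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("C", "D")): 0.3,
}
reduced = remove_paralogs(["A", "B", "C", "D"], example_similarity, 0.5)
```

Because the pass is greedy, the result depends on input order: here B is dropped for resembling A, which lets C stay even though it resembles B. A different retention rule could produce a different, equally valid reduced list.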


Subject(s)
Gene Expression Profiling/methods ; Oligonucleotide Array Sequence Analysis/methods ; Software ; Algorithms ; Arabidopsis Proteins/genetics ; Models, Statistical ; User-Computer Interface