ABSTRACT
Arthrogryposis multiplex congenita (AMC) is a heterogeneous group of disorders characterized by non-progressive joint contractures from birth that involve more than one part of the body. AMC has various etiologies, both genetic and environmental, depending on the specific type; for most types, however, the cause is not fully understood. We previously reported a large Israeli Arab kindred consisting of 16 patients affected with AMC neuropathic type, and mapped the locus to a 5.5 cM interval on chromosome 5qter. Using whole exome sequencing, we have now identified a homozygous pathogenic variant in the ERGIC1 gene within the previously defined linked region. ERGIC1 encodes a cycling membrane protein with a possible role in transport between the endoplasmic reticulum and the Golgi. We further show that this mutation was absent in more than 200 samples from healthy unrelated individuals of the Israeli Arab population. Thus, our findings expand the spectrum of hereditary AMC and suggest that abnormalities in protein trafficking may underlie AMC-related disorders.
Subjects
Arthrogryposis/genetics , Genetic Predisposition to Disease/genetics , Mutation , Vesicular Transport Proteins/genetics , Amino Acid Sequence , Arabs , Arthrogryposis/pathology , Base Sequence , Consanguinity , Female , Homozygote , Humans , Israel , Male , Pedigree , Exome Sequencing/methods
ABSTRACT
MOTIVATION: Genetic networks regulate key processes in living cells. Various methods have been suggested to reconstruct network architecture from gene expression data. However, most approaches are based on qualitative models that provide only rough approximations of the underlying events, and lack the quantitative aspects that are critical for understanding the proper function of biomolecular systems. RESULTS: We present fine-grained dynamical models of gene transcription and develop methods for reconstructing them from gene expression data within the framework of a generative probabilistic model. Unlike previous works, we employ quantitative transcription rates, and simultaneously estimate both the kinetic parameters that govern these rates, and the activity levels of unobserved regulators that control them. We apply our approach to expression datasets from yeast and show that we can learn the unknown regulator activity profiles, as well as the binding affinity parameters. We also introduce a novel structure learning algorithm, and demonstrate its power to accurately reconstruct the regulatory network from those datasets.
Subjects
Gene Expression Profiling/methods , Gene Expression Regulation/physiology , Models, Genetic , Proteome/metabolism , Regulatory Elements, Transcriptional/genetics , Signal Transduction/genetics , Transcription Factors/genetics , Binding Sites , Chromosome Mapping/methods , Computer Simulation , Databases, Protein , Protein Binding , Sequence Analysis, DNA/methods , Transcriptional Activation/physiology
ABSTRACT
Constantly improving gene expression profiling technologies are expected to provide understanding and insight into cancer-related cellular processes. Gene expression data are also expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. In this work we examine three sets of gene expression data measured across sets of tumor and normal clinical samples. The first set consists of 2,000 genes, measured in 62 epithelial colon samples (Alon et al., 1999). The second consists of approximately 100,000 clones, measured in 32 ovarian samples (unpublished extension of the data set described in Schummer et al. (1999)). The third set consists of approximately 7,100 genes, measured in 72 bone marrow and peripheral blood samples (Golub et al., 1999). We examine the use of scoring methods that measure the separation of tissue types (e.g., tumors from normals) using individual gene expression levels. These are then coupled with high-dimensional classification methods to assess the classification power of complete expression profiles. We present results of leave-one-out cross validation (LOOCV) experiments on the three data sets, employing a nearest neighbor classifier, SVM (Cortes and Vapnik, 1995), AdaBoost (Freund and Schapire, 1997) and a novel clustering-based classification technique. As tumor samples can differ from normal samples in their cell-type composition, we also perform LOOCV experiments using appropriately modified sets of genes, attempting to eliminate the resulting bias. We demonstrate a success rate of at least 90% in tumor versus normal classification, using sets of selected genes, with, as well as without, cellular-contamination-related members. These results are insensitive to the exact selection mechanism, over a certain range.
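The leave-one-out procedure named in this abstract can be illustrated with a minimal sketch. It assumes a plain 1-nearest-neighbor rule with Euclidean distance, which may differ from the classifier variant the authors used. The function names and the six toy expression profiles are hypothetical and serve only to make the example runnable.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_label(train, query):
    """Return the label of the training profile closest to `query`.

    `train` is a list of (profile, label) pairs.
    """
    _, label = min(train, key=lambda item: euclidean(item[0], query))
    return label


def loocv_accuracy(samples):
    """Hold out each sample once, classify it from the rest, and
    return the fraction of held-out samples labeled correctly."""
    correct = 0
    for i, (profile, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the i-th
        if nearest_neighbor_label(train, profile) == label:
            correct += 1
    return correct / len(samples)


# Made-up toy data: three "genes" per sample, NOT taken from the cited data sets.
toy_samples = [
    ([5.1, 0.9, 3.0], "tumor"),
    ([4.8, 1.1, 2.7], "tumor"),
    ([5.3, 1.0, 3.2], "tumor"),
    ([1.2, 4.0, 0.5], "normal"),
    ([0.9, 4.2, 0.7], "normal"),
    ([1.1, 3.8, 0.4], "normal"),
]

print(loocv_accuracy(toy_samples))
```

Because each sample is classified only from the remaining ones, the reported fraction estimates performance on unseen samples, which matters when there are far more genes than samples.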
Subjects
Gene Expression Profiling/statistics & numerical data , Cluster Analysis , Colonic Neoplasms/genetics , Computational Biology , Databases, Factual , Female , Humans , Leukemia/genetics , Ovarian Neoplasms/genetics , Tissue Distribution
ABSTRACT
DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a "snapshot" of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998).
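The conditional-independence structure mentioned in this abstract corresponds to the standard factorization of a Bayesian network, stated here for reference. The symbols are assumptions introduced for illustration: $X_1, \dots, X_n$ stand for the expression levels of the measured genes and $\mathrm{Pa}(X_i)$ for the parents of $X_i$ in the directed acyclic graph.

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

Each factor depends only on a variable and its parents, so a sparse graph yields a compact description of the joint distribution over many genes.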