Search | BVS Enfermería Search Portal

1.

Quantitative Activity Profile and Context Dependence of All Human 5' Splice Sites.

Wong, Mandy S; Kinney, Justin B; Krainer, Adrian R.

Mol Cell ; 71(6): 1012-1026.e3, 2018 09 20.

Article in English | MEDLINE | ID: mdl-30174293

ABSTRACT

Pre-mRNA splicing is an essential step in the expression of most human genes. Mutations at the 5' splice site (5'ss) frequently cause defective splicing and disease due to interference with the initial recognition of the exon-intron boundary by U1 small nuclear ribonucleoprotein (snRNP), a component of the spliceosome. Here, we use a massively parallel splicing assay (MPSA) in human cells to quantify the activity of all 32,768 unique 5'ss sequences (NNN/GYNNNN) in three different gene contexts. Our results reveal that although splicing efficiency is mostly governed by the 5'ss sequence, there are substantial differences in this efficiency across gene contexts. Among other uses, these MPSA measurements facilitate the prediction of 5'ss sequence variants that are likely to cause aberrant splicing. This approach provides a framework to assess potential pathogenic variants in the human genome and streamline the development of splicing-corrective therapies.
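As a quick check on the library size quoted in the abstract, the NNN/GYNNNN pattern can be enumerated directly. This is only an illustrative sketch: the function name `enumerate_5ss_library` is hypothetical, the slash is taken to mark the exon-intron boundary, and Y is read as the IUPAC code for a pyrimidine (C or T). It is not the authors' pipeline.

```python
from itertools import product

BASES = "ACGT"
PYRIMIDINES = "CT"  # IUPAC code Y, written in the DNA alphabet


def enumerate_5ss_library():
    """Yield every 9-nt sequence matching NNN/GYNNNN (hypothetical helper)."""
    for exon in product(BASES, repeat=3):               # NNN: last 3 exonic positions
        for y in PYRIMIDINES:                            # Y: second intronic position
            for tail in product(BASES, repeat=4):        # NNNN: remaining intronic positions
                yield "".join(exon) + "G" + y + "".join(tail)


library = list(enumerate_5ss_library())
print(len(library))  # 4**3 * 2 * 4**4 = 32768
```

The count factors as 64 exonic triplets, 2 choices for Y, and 256 intronic tetramers, which matches the 32,768 sequences stated above.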

Subject(s)

Alternative Splicing/genetics, RNA Splice Sites/genetics, RNA Splice Sites/physiology, Alternative Splicing/physiology, Carrier Proteins/genetics, Conserved Sequence/genetics, Exons, BRCA2 Genes, HeLa Cells, Humans, Introns, Mutation, RNA Splicing/genetics, RNA Splicing/physiology, Small Nuclear RNA/physiology, U1 Small Nuclear Ribonucleoprotein/physiology, Spliceosomes, Survival of Motor Neuron 1 Protein/genetics, Transcriptional Elongation Factors

2.

Higher-order epistasis and phenotypic prediction.

Zhou, Juannan; Wong, Mandy S; Chen, Wei-Chia; Krainer, Adrian R; Kinney, Justin B; McCandlish, David M.

Proc Natl Acad Sci U S A ; 119(39): e2204233119, 2022 09 27.

Article in English | MEDLINE | ID: mdl-36129941

ABSTRACT

Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype-phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data. This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Using these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis and reconstruct the genotype-phenotype mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and also provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA 5' splice sites, for which we also validate our model predictions via additional low-throughput experiments.
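The idea of reading epistasis from how phenotypic correlations change with mutational distance can be illustrated on a toy example. The sketch below assumes a fully observed landscape over binary sequences and a purely additive phenotype; the function names and the weights are hypothetical, and the code does not reproduce the paper's method, which handles partially observed data and Gaussian process regression.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_correlation(phenotype, length):
    """Correlation of phenotypes over all ordered sequence pairs at each Hamming distance."""
    seqs = list(phenotype)
    mean = sum(phenotype.values()) / len(seqs)
    var = sum((phenotype[s] - mean) ** 2 for s in seqs) / len(seqs)
    sums = [0.0] * (length + 1)
    counts = [0] * (length + 1)
    for a in seqs:
        for b in seqs:
            d = hamming(a, b)
            sums[d] += (phenotype[a] - mean) * (phenotype[b] - mean)
            counts[d] += 1
    return {d: sums[d] / counts[d] / var for d in range(length + 1) if counts[d]}


# Toy additive landscape (illustrative weights) over all binary sequences of length 4.
L = 4
weights = [1.0, 0.5, -0.75, 2.0]
additive = {
    "".join(s): sum(w for w, c in zip(weights, s) if c == "1")
    for s in product("01", repeat=L)
}
rho = distance_correlation(additive, L)
for d in sorted(rho):
    print(d, round(rho[d], 6))
```

For a purely additive phenotype on a binary alphabet, the correlation falls linearly with distance as 1 - 2d/L; curvature away from that line would indicate pairwise or higher-order interaction.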

Subject(s)

Genetic Epistasis, RNA Precursors, Bayes Theorem, Chromosome Mapping, Computational Biology, Genotype, Humans, Genetic Models, Mutation, Phenotype, RNA Splicing

3.

Structural and mechanistic basis of σ-dependent transcriptional pausing.

Pukhrambam, Chirangini; Molodtsov, Vadim; Kooshkbaghi, Mahdi; Tareen, Ammar; Vu, Hoa; Skalenko, Kyle S; Su, Min; Yin, Zhou; Winkelman, Jared T; Kinney, Justin B; Ebright, Richard H; Nickels, Bryce E.

Proc Natl Acad Sci U S A ; 119(23): e2201301119, 2022 06 07.

Article in English | MEDLINE | ID: mdl-35653571

ABSTRACT

In σ-dependent transcriptional pausing, the transcription initiation factor σ, translocating with RNA polymerase (RNAP), makes sequence-specific protein–DNA interactions with a promoter-like sequence element in the transcribed region, inducing pausing. It has been proposed that, in σ-dependent pausing, the RNAP active center can access off-pathway "backtracked" states that are substrates for the transcript-cleavage factors of the Gre family and on-pathway "scrunched" states that mediate pause escape. Here, using site-specific protein–DNA photocrosslinking to define positions of the RNAP trailing and leading edges and of σ relative to DNA at the λPR' promoter, we show directly that σ-dependent pausing in the absence of GreB in vitro predominantly involves a state backtracked by 2–4 bp, and σ-dependent pausing in the presence of GreB in vitro and in vivo predominantly involves a state scrunched by 2–3 bp. Analogous experiments with a library of 4^7 (~16,000) transcribed-region sequences show that the state scrunched by 2–3 bp, and only that state, is associated with the consensus sequence T_{-3}N_{-2}Y_{-1}G_{+1} (where -1 corresponds to the position of the RNA 3' end), which is identical to the consensus for pausing in initial transcription and which is related to the consensus for pausing in transcription elongation. Experiments with heteroduplex templates show that sequence information at position T_{-3} resides in the DNA nontemplate strand. A cryoelectron microscopy structure of a complex engaged in σ-dependent pausing reveals positions of DNA scrunching on the DNA nontemplate and template strands and suggests that position T_{-3} of the consensus sequence exerts its effects by facilitating scrunching.

Subject(s)

DNA-Directed RNA Polymerases, Genetic Transcription, Cryoelectron Microscopy, DNA, DNA-Directed RNA Polymerases/metabolism, Escherichia coli/genetics

4.

Field-theoretic density estimation for biological sequence space with applications to 5' splice site diversity and aneuploidy in cancer.

Chen, Wei-Chia; Zhou, Juannan; Sheltzer, Jason M; Kinney, Justin B; McCandlish, David M.

Proc Natl Acad Sci U S A ; 118(40), 2021 10 05.

Article in English | MEDLINE | ID: mdl-34599093

ABSTRACT

Density estimation in sequence space is a fundamental problem in machine learning that is also of great importance in computational biology. Due to the discrete nature and large dimensionality of sequence space, how best to estimate such probability distributions from a sample of observed sequences remains unclear. One common strategy for addressing this problem is to estimate the probability distribution using maximum entropy (i.e., calculating point estimates for some set of correlations based on the observed sequences and predicting the probability distribution that is as uniform as possible while still matching these point estimates). Building on recent advances in Bayesian field-theoretic density estimation, we present a generalization of this maximum entropy approach that provides greater expressivity in regions of sequence space where data are plentiful while still maintaining a conservative maximum entropy character in regions of sequence space where data are sparse or absent. In particular, we define a family of priors for probability distributions over sequence space with a single hyperparameter that controls the expected magnitude of higher-order correlations. This family of priors then results in a corresponding one-dimensional family of maximum a posteriori estimates that interpolate smoothly between the maximum entropy estimate and the observed sample frequencies. To demonstrate the power of this method, we use it to explore the high-dimensional geometry of the distribution of 5' splice sites found in the human genome and to understand patterns of chromosomal abnormalities across human cancers.

Subject(s)

Aneuploidy, Computational Biology/methods, Theoretical Models, Neoplasms/genetics, RNA Splice Sites, Humans, Probability

5.

Promoter-sequence determinants and structural basis of primer-dependent transcription initiation in Escherichia coli.

Skalenko, Kyle S; Li, Lingting; Zhang, Yuanchao; Vvedenskaya, Irina O; Winkelman, Jared T; Cope, Alexander L; Taylor, Deanne M; Shah, Premal; Ebright, Richard H; Kinney, Justin B; Zhang, Yu; Nickels, Bryce E.

Proc Natl Acad Sci U S A ; 118(27), 2021 07 06.

Article in English | MEDLINE | ID: mdl-34187896

ABSTRACT

Chemical modifications of RNA 5'-ends enable "epitranscriptomic" regulation, influencing multiple aspects of RNA fate. In transcription initiation, a large inventory of substrates compete with nucleoside triphosphates for use as initiating entities, providing an ab initio mechanism for altering the RNA 5'-end. In Escherichia coli cells, RNAs with a 5'-end hydroxyl are generated by use of dinucleotide RNAs as primers for transcription initiation, "primer-dependent initiation." Here, we use massively systematic transcript end readout (MASTER) to detect and quantify RNA 5'-ends generated by primer-dependent initiation for ~4^10 (~1,000,000) promoter sequences in E. coli. The results show primer-dependent initiation in E. coli involves any of the 16 possible dinucleotide primers and depends on promoter sequences in, upstream, and downstream of the primer binding site. The results yield a consensus sequence for primer-dependent initiation, Y_{TSS-2}N_{TSS-1}N_{TSS}W_{TSS+1}, where TSS is the transcription start site, N_{TSS-1}N_{TSS} is the primer binding site, Y is pyrimidine, and W is A or T. Biochemical and structure-determination studies show that the base pair (nontemplate-strand base:template-strand base) immediately upstream of the primer binding site (Y:R_{TSS-2}, where R is purine) exerts its effect through the base on the DNA template strand (R_{TSS-2}) through interchain base stacking with the RNA primer. Results from analysis of a large set of natural, chromosomally encoded E. coli promoters support the conclusions from MASTER. Our findings provide a mechanistic and structural description of how TSS-region sequence hard-codes not only the TSS position but also the potential for epitranscriptomic regulation through primer-dependent transcription initiation.

Subject(s)

DNA Primers/metabolism, Escherichia coli/genetics, Genetic Promoter Regions, Genetic Transcription Initiation, Base Sequence, Binding Sites, Bacterial Chromosomes/genetics, Bacterial Gene Expression Regulation, Messenger RNA/genetics, Messenger RNA/metabolism, Transcription Initiation Site

6.

Cell diversity and network dynamics in photosensitive human brain organoids.

Quadrato, Giorgia; Nguyen, Tuan; Macosko, Evan Z; Sherwood, John L; Min Yang, Sung; Berger, Daniel R; Maria, Natalie; Scholvin, Jorg; Goldman, Melissa; Kinney, Justin P; Boyden, Edward S; Lichtman, Jeff W; Williams, Ziv M; McCarroll, Steven A; Arlotta, Paola.

Nature ; 545(7652): 48-53, 2017 05 04.

Article in English | MEDLINE | ID: mdl-28445462

ABSTRACT

In vitro models of the developing brain such as three-dimensional brain organoids offer an unprecedented opportunity to study aspects of human brain development and disease. However, the cells generated within organoids and the extent to which they recapitulate the regional complexity, cellular diversity and circuit functionality of the brain remain undefined. Here we analyse gene expression in over 80,000 individual cells isolated from 31 human brain organoids. We find that organoids can generate a broad diversity of cells, which are related to endogenous classes, including cells from the cerebral cortex and the retina. Organoids could be developed over extended periods (more than 9 months), allowing for the establishment of relatively mature features, including the formation of dendritic spines and spontaneously active neuronal networks. Finally, neuronal activity within organoids could be controlled using light stimulation of photosensitive cells, which may offer a way to probe the functionality of human neuronal circuits using physiological sensory stimuli.

Subject(s)

Brain/cytology, Neural Pathways/physiology, Neurogenesis, Organoids/cytology, Organoids/radiation effects, Cell Line, Cell Separation, Cerebral Cortex/cytology, Cerebral Cortex/metabolism, Dendrites, Gene Expression Profiling, Humans, In Vitro Techniques, Light, Nerve Net/cytology, Nerve Net/radiation effects, Neural Pathways/cytology, Neural Pathways/radiation effects, Organ Specificity, Organoids/growth & development, Vertebrate Photoreceptor Cells/cytology, Pluripotent Stem Cells/cytology, Retina/cytology, Retina/metabolism, RNA Sequence Analysis, Single-Cell Analysis, Time Factors, Transcriptome

7.

Massively Parallel Assays and Quantitative Sequence-Function Relationships.

Kinney, Justin B; McCandlish, David M.

Annu Rev Genomics Hum Genet ; 20: 99-127, 2019 08 31.

Article in English | MEDLINE | ID: mdl-31091417

ABSTRACT

Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence-function relationships. In doing so, we touch on a diverse range of topics, including the identification of clinically relevant genomic variants, the modeling of transcription factor binding to DNA, the functional and evolutionary landscapes of proteins, and cis-regulatory mechanisms in both transcription and mRNA splicing. We further describe a unified conceptual framework and a core set of mathematical modeling strategies that studies in these diverse areas can make use of. Finally, we highlight key aspects of experimental design and mathematical modeling that are important for the results of such studies to be interpretable and reproducible.

Subject(s)

Genetic Epistasis, Genetic Association Studies, High-Throughput Nucleotide Sequencing/methods, Genetic Models, SELEX Aptamer Technique/methods, DNA/genetics, DNA/metabolism, Genotype, Humans, Mutation, Phenotype, Protein Binding, RNA Splicing, Transcription Factors/genetics, Transcription Factors/metabolism, Genetic Transcription

8.

Logomaker: beautiful sequence logos in Python.

Tareen, Ammar; Kinney, Justin B.

Bioinformatics ; 36(7): 2272-2274, 2020 04 01.

Article in English | MEDLINE | ID: mdl-31821414

ABSTRACT

SUMMARY: Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA and protein sequences, yet it is currently difficult to generate and customize such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from either a matrix-like array of numbers or a multiple-sequence alignment. Logos are rendered as native matplotlib objects that are easy to stylize and incorporate into multi-panel figures. AVAILABILITY AND IMPLEMENTATION: Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Documentation is provided at http://logomaker.readthedocs.io; source code is available at http://github.com/jbkinney/logomaker.

Subject(s)

Documentation, Software, DNA, Position-Specific Scoring Matrices

9.

Systematic approach for dissecting the molecular mechanisms of transcriptional regulation in bacteria.

Belliveau, Nathan M; Barnes, Stephanie L; Ireland, William T; Jones, Daniel L; Sweredoski, Michael J; Moradian, Annie; Hess, Sonja; Kinney, Justin B; Phillips, Rob.

Proc Natl Acad Sci U S A ; 115(21): E4796-E4805, 2018 05 22.

Article in English | MEDLINE | ID: mdl-29728462

ABSTRACT

Gene regulation is one of the most ubiquitous processes in biology. However, while the catalog of bacterial genomes continues to expand rapidly, we remain ignorant about how almost all of the genes in these genomes are regulated. At present, characterizing the molecular mechanisms by which individual regulatory sequences operate requires focused efforts using low-throughput methods. Here, we take a first step toward multipromoter dissection and show how a combination of massively parallel reporter assays, mass spectrometry, and information-theoretic modeling can be used to dissect multiple bacterial promoters in a systematic way. We show this approach on both well-studied and previously uncharacterized promoters in the enteric bacterium Escherichia coli. In all cases, we recover nucleotide-resolution models of promoter mechanism. For some promoters, including previously unannotated ones, the approach allowed us to further extract quantitative biophysical models describing input-output relationships. Given the generality of the approach presented here, it opens up the possibility of quantitatively dissecting the mechanisms of promoter function in E. coli and a wide range of other bacteria.

Subject(s)

Escherichia coli Proteins/metabolism, Escherichia coli/genetics, Bacterial Gene Expression Regulation, Bacterial Genome, Green Fluorescent Proteins/metabolism, Genetic Promoter Regions, Escherichia coli/growth & development, Escherichia coli/metabolism, Escherichia coli Proteins/genetics, Transcriptional Activation

10.

Mapping DNA sequence to transcription factor binding energy in vivo.

Barnes, Stephanie L; Belliveau, Nathan M; Ireland, William T; Kinney, Justin B; Phillips, Rob.

PLoS Comput Biol ; 15(2): e1006226, 2019 02.

Article in English | MEDLINE | ID: mdl-30716072

ABSTRACT

Despite the central importance of transcriptional regulation in biology, it has proven difficult to determine the regulatory mechanisms of individual genes, let alone entire gene networks. It is particularly difficult to decipher the biophysical mechanisms of transcriptional regulation in living cells and determine the energetic properties of binding sites for transcription factors and RNA polymerase. In this work, we present a strategy for dissecting transcriptional regulatory sequences using in vivo methods (massively parallel reporter assays) to formulate quantitative models that map a transcription factor binding site's DNA sequence to transcription factor-DNA binding energy. We use these models to predict the binding energies of transcription factor binding sites to within 1 k_BT of their measured values. We further explore how such a sequence-energy mapping relates to the mechanisms of transcriptional regulation in various promoter contexts. Specifically, we show that our models can be used to design specific induction responses, analyze the effects of amino acid mutations on DNA sequence preference, and determine how regulatory context affects a transcription factor's sequence specificity.

Subject(s)

Binding Sites/genetics, Computational Biology/methods, DNA Sequence Analysis/methods, Chromosome Mapping, DNA/chemistry, Gene Expression Regulation/genetics, Gene Regulatory Networks, Molecular Models, Genetic Promoter Regions/genetics, Protein Binding, Transcription Factors/chemistry, Transcription Factors/metabolism, Genetic Transcription/physiology

11.

Concerted activities of Mcm4, Sld3, and Dbf4 in control of origin activation and DNA replication fork progression.

Sheu, Yi-Jun; Kinney, Justin B; Stillman, Bruce.

Genome Res ; 26(3): 315-30, 2016 Mar.

Article in English | MEDLINE | ID: mdl-26733669

ABSTRACT

Eukaryotic chromosomes initiate DNA synthesis from multiple replication origins in a temporally specific manner during S phase. The replicative helicase Mcm2-7 functions in both initiation and fork progression and thus is an important target of regulation. Mcm4, a helicase subunit, possesses an unstructured regulatory domain that mediates control from multiple kinase signaling pathways, including the Dbf4-dependent Cdc7 kinase (DDK). Following replication stress in S phase, Dbf4 and Sld3, an initiation factor and essential target of Cyclin-Dependent Kinase (CDK), are targets of the checkpoint kinase Rad53 for inhibition of initiation from origins that have yet to be activated, so-called late origins. Here, whole-genome DNA replication profile analysis is used to assess under various conditions the effect of mutations that alter the Mcm4 regulatory domain and the Rad53 targets, Sld3 and Dbf4. Late origin firing occurs under genotoxic stress when the controls on Mcm4, Sld3, and Dbf4 are simultaneously eliminated. The regulatory domain of Mcm4 plays an important role in the timing of late origin firing, both in an unperturbed S phase and in dNTP limitation. Furthermore, checkpoint control of Sld3 impacts fork progression under replication stress. This effect is parallel to the role of the Mcm4 regulatory domain in monitoring fork progression. Hypomorph mutations in sld3 are suppressed by a mcm4 regulatory domain mutation. Thus, in response to cellular conditions, the functions executed by Sld3, Dbf4, and the regulatory domain of Mcm4 intersect to control origin firing and replication fork progression, thereby ensuring genome stability.

Subject(s)

Cell Cycle Proteins/metabolism, DNA Replication, DNA-Binding Proteins/metabolism, Minichromosome Maintenance Complex Component 4/metabolism, Replication Origin, Saccharomyces cerevisiae Proteins/metabolism, Alkylating Agents/pharmacology, Alleles, Checkpoint Kinase 2/metabolism, Fungal Chromosomes, Cyclin-Dependent Kinases/metabolism, DNA Replication/drug effects, Hydroxyurea/pharmacology, Minichromosome Maintenance Complex Component 4/genetics, Mutation, Phenotype, Phosphorylation, Saccharomyces cerevisiae/drug effects, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/genetics, Sequence Deletion, Signal Transduction

12.

Automated in vivo patch-clamp evaluation of extracellular multielectrode array spike recording capability.

Allen, Brian D; Moore-Kochlacs, Caroline; Bernstein, Jacob G; Kinney, Justin P; Scholvin, Jorg; Seoane, Luís F; Chronopoulos, Chris; Lamantia, Charlie; Kodandaramaiah, Suhasa B; Tegmark, Max; Boyden, Edward S.

J Neurophysiol ; 120(5): 2182-2200, 2018 11 01.

Article in English | MEDLINE | ID: mdl-29995597

ABSTRACT

Much innovation is currently aimed at improving the number, density, and geometry of electrodes on extracellular multielectrode arrays for in vivo recording of neural activity in the mammalian brain. To choose a multielectrode array configuration for a given neuroscience purpose, or to reveal design principles of future multielectrode arrays, it would be useful to have a systematic way of evaluating the spike recording capability of such arrays. We describe an automated system that performs robotic patch-clamp recording of a neuron being simultaneously recorded via an extracellular multielectrode array. By recording a patch-clamp data set from a neuron while acquiring extracellular recordings from the same neuron, we can evaluate how well the extracellular multielectrode array captures the spiking information from that neuron. To demonstrate the utility of our system, we show that it can provide data from the mammalian cortex to evaluate how the spike sorting performance of a close-packed extracellular multielectrode array is affected by bursting, which alters the shape and amplitude of spikes in a train. We also introduce an algorithmic framework to help evaluate how the number of electrodes in a multielectrode array affects spike sorting, examining how adding more electrodes yields data that can be spike sorted more easily. Our automated methodology may thus help with the evaluation of new electrode designs and configurations, providing empirical guidance on the kinds of electrodes that will be optimal for different brain regions, cell types, and species, for improving the accuracy of spike sorting. NEW & NOTEWORTHY We present an automated strategy for evaluating the spike recording performance of an extracellular multielectrode array, by enabling simultaneous recording of a neuron with both such an array and with patch clamp. We use our robot and accompanying algorithms to evaluate the performance of multielectrode arrays on supporting spike sorting.

Subject(s)

Action Potentials, Automation/methods, Patch-Clamp Techniques/methods, Visual Cortex/physiology, Animals, Automation/instrumentation, Cortical Excitability, Electrodes/standards, Electroencephalography/instrumentation, Electroencephalography/methods, Extracellular Space/physiology, Male, Mice, Inbred C57BL Mice, Neurons/physiology, Patch-Clamp Techniques/instrumentation, Visual Cortex/cytology

13.

Density Estimation on Small Data Sets.

Chen, Wei-Chia; Tareen, Ammar; Kinney, Justin B.

Phys Rev Lett ; 121(16): 160605, 2018 Oct 19.

Article in English | MEDLINE | ID: mdl-30387642

ABSTRACT

How might a smooth probability distribution be estimated with accurately quantified uncertainty from a limited amount of sampled data? Here we describe a field-theoretic approach that addresses this problem remarkably well in one dimension, providing an exact nonparametric Bayesian posterior without relying on tunable parameters or large-data approximations. Strong non-Gaussian constraints, which require a nonperturbative treatment, are found to play a major role in reducing distribution uncertainty. A software implementation of this method is provided.

14.

Equitability, mutual information, and the maximal information coefficient.

Kinney, Justin B; Atwal, Gurinder S.

Proc Natl Acad Sci U S A ; 111(9): 3354-9, 2014 Mar 04.

Article in English | MEDLINE | ID: mdl-24550517

ABSTRACT

How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical "equitability" has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to Data Processing Inequality. Mutual information, a fundamental quantity in information theory, is shown to satisfy this equitability criterion. These findings are at odds with the recent work of Reshef et al. [Reshef DN, et al. (2011) Science 334(6062):1518-1524], which proposed an alternative definition of equitability and introduced a new statistic, the "maximal information coefficient" (MIC), said to satisfy equitability in contradistinction to mutual information. These conclusions, however, were supported only with limited simulation evidence, not with mathematical arguments. Upon revisiting these claims, we prove that the mathematical definition of equitability proposed by Reshef et al. cannot be satisfied by any (nontrivial) dependence measure. We also identify artifacts in the reported simulation evidence. When these artifacts are removed, estimates of mutual information are found to be more equitable than estimates of MIC. Mutual information is also observed to have consistently higher statistical power than MIC. We conclude that estimating mutual information provides a natural (and often practical) way to equitably quantify statistical associations in large datasets.
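For readers unfamiliar with the quantities named in this abstract, the sketch below computes mutual information for a known discrete joint distribution and shows one numerical instance of the data processing inequality. It is a plug-in calculation on hand-picked toy tables, not the estimators or simulations used in the paper; all names and numbers are illustrative assumptions.

```python
from math import log2


def mutual_information(joint):
    """I[X;Y] in bits for a joint probability table joint[x][y] that sums to 1."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * log2(pxy / (px[i] * py[j]))
    return mi


# Independent variables carry no information about each other.
mi_independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])

# Y takes four values; the first two imply X=0 and the last two imply X=1.
full = [[0.4, 0.1, 0.0, 0.0], [0.0, 0.0, 0.1, 0.4]]
mi_full = mutual_information(full)

# Coarse-grain Y by merging columns {0, 2} and {1, 3}: a deterministic function of Y.
merged = [[row[0] + row[2], row[1] + row[3]] for row in full]
mi_merged = mutual_information(merged)

print(mi_independent, mi_full, mi_merged)
```

Processing Y through a function cannot increase its information about X, so `mi_merged` does not exceed `mi_full`; this is the inequality on which the abstract's self-consistency condition is built.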

Subject(s)

Statistical Data Interpretation, Information Theory, Statistics as Topic/methods, Bias, Mathematics

15.

Domain within the helicase subunit Mcm4 integrates multiple kinase signals to control DNA replication initiation and fork progression.

Sheu, Yi-Jun; Kinney, Justin B; Lengronne, Armelle; Pasero, Philippe; Stillman, Bruce.

Proc Natl Acad Sci U S A ; 111(18): E1899-908, 2014 May 06.

Article in English | MEDLINE | ID: mdl-24740181

ABSTRACT

Eukaryotic DNA synthesis initiates from multiple replication origins and progresses through bidirectional replication forks to ensure efficient duplication of the genome. Temporal control of initiation from origins and regulation of replication fork functions are important aspects for maintaining genome stability. Multiple kinase-signaling pathways are involved in these processes. The Dbf4-dependent Cdc7 kinase (DDK), cyclin-dependent kinase (CDK), and Mec1, the yeast Ataxia telangiectasia mutated/Ataxia telangiectasia mutated Rad3-related checkpoint regulator, all target the structurally disordered N-terminal serine/threonine-rich domain (NSD) of mini-chromosome maintenance subunit 4 (Mcm4), a subunit of the mini-chromosome maintenance (MCM) replicative helicase complex. Using whole-genome replication profile analysis and single-molecule DNA fiber analysis, we show that under replication stress the temporal pattern of origin activation and DNA replication fork progression are altered in cells with mutations within two separate segments of the Mcm4 NSD. The proximal segment of the NSD residing next to the DDK-docking domain mediates repression of late-origin firing by checkpoint signals because in its absence late origins become active despite an elevated DNA damage-checkpoint response. In contrast, the distal segment of the NSD at the N terminus plays no role in the temporal pattern of origin firing but has a strong influence on replication fork progression and on checkpoint signaling. Both fork progression and checkpoint response are regulated by the phosphorylation of the canonical CDK sites at the distal NSD. Together, our data suggest that the eukaryotic MCM helicase contains an intrinsic regulatory domain that integrates multiple signals to coordinate origin activation and replication fork progression under stress conditions.

Subject(s)

DNA Replication/physiology, Fungal DNA/biosynthesis, Fungal DNA/chemistry, Minichromosome Maintenance Complex Component 4/chemistry, Minichromosome Maintenance Complex Component 4/metabolism, Saccharomyces cerevisiae Proteins/chemistry, Saccharomyces cerevisiae Proteins/metabolism, Cell Cycle Checkpoints, Cell Cycle Proteins/metabolism, Cyclin-Dependent Kinases/metabolism, Fungal Genome, Intracellular Signaling Peptides and Proteins/metabolism, Minichromosome Maintenance Complex Component 4/genetics, Mutation, Nucleic Acid Conformation, Phosphorylation, Protein Serine-Threonine Kinases/metabolism, Tertiary Protein Structure, Protein Subunits, Replication Origin, Saccharomyces cerevisiae/cytology, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/genetics, Signal Transduction

16.

Closed-loop, ultraprecise, automated craniotomies.

Pak, Nikita; Siegle, Joshua H; Kinney, Justin P; Denman, Daniel J; Blanche, Timothy J; Boyden, Edward S.

J Neurophysiol ; 113(10): 3943-53, 2015 Jun 01.

Article in English | MEDLINE | ID: mdl-25855700

ABSTRACT

A large array of neuroscientific techniques, including in vivo electrophysiology, two-photon imaging, optogenetics, lesions, and microdialysis, require access to the brain through the skull. Ideally, the necessary craniotomies could be performed in a repeatable and automated fashion, without damaging the underlying brain tissue. Here we report that when drilling through the skull a stereotypical increase in conductance can be observed when the drill bit passes through the skull base. We present an architecture for a robotic device that can perform this algorithm, along with two implementations--one based on homebuilt hardware and one based on commercially available hardware--that can automatically detect such changes and create large numbers of precise craniotomies, even in a single skull. We also show that this technique can be adapted to automatically drill cranial windows several millimeters in diameter. Such robots will not only be useful for helping neuroscientists perform both small and large craniotomies more reliably but can also be used to create precisely aligned arrays of craniotomies with stereotaxic registration to standard brain atlases that would be difficult to drill by hand.

Subject(s)

Brain/surgery , Computer Systems , Craniotomy/instrumentation , Craniotomy/methods , Action Potentials , Algorithms , Animals , Brain/physiology , Mice , Tomography, X-Ray
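
The closed-loop idea in this abstract (stop drilling when conductance rises sharply) can be illustrated with a minimal sketch. The abstract does not state the detection rule, so the baseline window, the ratio threshold, and the function name `find_breakthrough` are all assumptions made here for illustration, not the authors' method.

```python
from typing import Optional, Sequence


def find_breakthrough(conductance: Sequence[float],
                      baseline_samples: int = 5,
                      ratio: float = 3.0) -> Optional[int]:
    """Return the index of the first sample above ratio * baseline, or None.

    Hypothetical rule: the baseline is the mean of the first few readings,
    and a breakthrough is any later reading that exceeds it by `ratio`.
    """
    if len(conductance) <= baseline_samples:
        return None
    baseline = sum(conductance[:baseline_samples]) / baseline_samples
    for i in range(baseline_samples, len(conductance)):
        if conductance[i] > ratio * baseline:
            return i
    return None


# Made-up readings in arbitrary units: flat at first, then a sharp rise.
readings = [1.0, 1.1, 0.9, 1.0, 1.0, 1.2, 1.1, 4.5, 6.0]
stop_index = find_breakthrough(readings)
```

In a real device the returned index would trigger retraction of the drill; a fixed ratio is only one possible criterion, and a derivative-based or filtered criterion might be more robust to noise.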

17.

Parametric inference in the large data limit using maximally informative models.

Kinney, Justin B; Atwal, Gurinder S.

Neural Comput ; 26(4): 637-53, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24479782

ABSTRACT

Motivated by data-rich experiments in transcriptional regulation and sensory neuroscience, we consider the following general problem in statistical inference: when exposed to a high-dimensional signal S, a system of interest computes a representation R of that signal, which is then observed through a noisy measurement M. From a large number of signals and measurements, we wish to infer the "filter" that maps S to R. However, the standard method for solving such problems, likelihood-based inference, requires perfect a priori knowledge of the "noise function" mapping R to M. In practice such noise functions are usually known only approximately, if at all, and using an incorrect noise function will typically bias the inferred filter. Here we show that in the large data limit, this need for a precharacterized noise function can be circumvented by searching for filters that instead maximize the mutual information I[M; R] between observed measurements and predicted representations. Moreover, if the correct filter lies within the space of filters being explored, maximizing mutual information becomes equivalent to simultaneously maximizing every dependence measure that satisfies the data processing inequality. It is important to note that maximizing mutual information will typically leave a small number of directions in parameter space unconstrained. We term these directions diffeomorphic modes and present an equation that allows these modes to be derived systematically. The presence of diffeomorphic modes reflects a fundamental and nontrivial substructure within parameter space, one that is obscured by standard likelihood-based inference.

Subject(s)

Likelihood Functions , Models, Statistical , Algorithms , Animals , Gene Regulatory Networks , Humans
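
As a toy illustration of scoring candidate filters by the mutual information I[M; R] between measurements and predicted representations, the sketch below uses a plug-in estimator on small discrete data. The data, the three candidate filters, and the estimator are assumptions chosen for illustration; the paper's setting is the large data limit with high-dimensional signals, which this example does not attempt to reproduce.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I[X; Y] in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # p(x, y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


# Made-up signals S (two binary features) and noise-free measurements M.
signals = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1), (1, 0), (0, 1)]
measurements = [0, 0, 1, 1, 0, 1, 1, 0]

# Hypothetical candidate filters mapping a signal S to a representation R.
candidate_filters = {
    "first_feature": lambda s: s[0],
    "second_feature": lambda s: s[1],
    "first_feature_flipped": lambda s: 1 - s[0],
}

scores = {name: mutual_information(measurements, [f(s) for s in signals])
          for name, f in candidate_filters.items()}
```

Here the filter reading the first feature scores highest, and its relabelled variant scores identically because mutual information is unchanged by invertible transformations of R. That invariance is a loose discrete analogue of why some directions in parameter space are left unconstrained.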

18.

Symmetry, gauge freedoms, and the interpretability of sequence-function relationships.

Posfai, Anna; McCandlish, David M; Kinney, Justin B.

bioRxiv ; 2024 Jun 24.

Article in English | MEDLINE | ID: mdl-38798625

ABSTRACT

Quantitative models that describe how biological sequences encode functional activities are ubiquitous in modern biology. One important aspect of these models is that they commonly exhibit gauge freedoms, i.e., directions in parameter space that do not affect model predictions. In physics, gauge freedoms arise when physical theories are formulated in ways that respect fundamental symmetries. However, the connections that gauge freedoms in models of sequence-function relationships have to the symmetries of sequence space have yet to be systematically studied. Here we study the gauge freedoms of models that respect a specific symmetry of sequence space: the group of position-specific character permutations. We find that gauge freedoms arise when model parameters transform under redundant irreducible matrix representations of this group. Based on this finding, we describe an "embedding distillation" procedure that enables analytic calculation of the number of independent gauge freedoms, as well as efficient computation of a sparse basis for the space of gauge freedoms. We also study how parameter transformation behavior affects parameter interpretability. We find that in many (and possibly all) nontrivial models, the ability to interpret individual model parameters as quantifying intrinsic allelic effects requires that gauge freedoms be present. This finding establishes an incompatibility between two distinct notions of parameter interpretability. Our work thus advances the understanding of symmetries, gauge freedoms, and parameter interpretability in sequence-function relationships.

Significance Statement: Gauge freedoms, i.e., directions in parameter space that do not affect model predictions, are ubiquitous in mathematical models of biological sequence-function relationships. But in contrast to theoretical physics, where gauge freedoms play a central role, little is understood about the mathematical properties of gauge freedoms in models of sequence-function relationships. Here we identify a connection between specific symmetries of sequence space and the gauge freedoms present in a large class of commonly used models for sequence-function relationships. We show that this connection can be used to perform useful mathematical computations, and we discuss the impact of model transformation properties on parameter interpretability. The results fill a major gap in the understanding of quantitative sequence-function relationships.
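
To make the counting of gauge freedoms concrete, the sketch below counts them by brute force for a one-hot additive model with a constant term. It builds the feature matrix over all sequences and subtracts its rank from the number of parameters. This is a numerical check under assumed conventions, not the "embedding distillation" procedure described in the abstract, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def one_hot_features(seq, alphabet):
    """Constant feature followed by one indicator per (position, character)."""
    feats = [1]
    for ch in seq:
        feats.extend(1 if ch == a else 0 for a in alphabet)
    return feats


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def count_gauge_freedoms(length, alphabet):
    """Number of parameters minus the rank of the feature matrix."""
    rows = [one_hot_features(s, alphabet)
            for s in product(alphabet, repeat=length)]
    return len(rows[0]) - matrix_rank(rows)


freedoms_binary = count_gauge_freedoms(3, "AC")
freedoms_dna = count_gauge_freedoms(2, "ACGT")
```

For this model class the count equals the sequence length, one freedom per position, because a constant can be moved between each position's parameters and the constant term without changing any prediction.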

19.

Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models.

Seitz, Evan E; McCandlish, David M; Kinney, Justin B; Koo, Peter K.

bioRxiv ; 2024 Mar 02.

Article in English | MEDLINE | ID: mdl-38013993

ABSTRACT

Deep neural networks (DNNs) have greatly advanced the ability to predict genome function from sequence. Interpreting genomic DNNs in terms of biological mechanisms, however, remains difficult. Here we introduce SQUID, a genomic DNN interpretability framework based on surrogate modeling. SQUID approximates genomic DNNs in user-specified regions of sequence space using surrogate models, i.e., simpler models that are mechanistically interpretable. Importantly, SQUID removes the confounding effects that nonlinearities and heteroscedastic noise in functional genomics data can have on model interpretation. Benchmarking analysis on multiple genomic DNNs shows that SQUID, when compared to established interpretability methods, identifies motifs that are more consistent across genomic loci and yields improved single-nucleotide variant-effect predictions. SQUID also supports surrogate models that quantify epistatic interactions within and between cis-regulatory elements. SQUID thus advances the ability to mechanistically interpret genomic DNNs.

20.

Gauge fixing for sequence-function relationships.

Posfai, Anna; Zhou, Juannan; McCandlish, David M; Kinney, Justin B.

bioRxiv ; 2024 Jun 24.

Article in English | MEDLINE | ID: mdl-38798671

ABSTRACT

Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.

Significance Statement: Computational biology relies heavily on mathematical models that predict biological activities from DNA, RNA, or protein sequences. Interpreting the parameters of these models, however, remains difficult. Here we address a core challenge for model interpretation: the presence of 'gauge freedoms', i.e., ways of changing model parameters without affecting model predictions. The results unify commonly used methods for eliminating gauge freedoms and show how these methods can be used to simplify complex models in localized regions of sequence space. This work thus overcomes a major obstacle in the interpretation of quantitative sequence-function relationships.
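
A minimal sketch of what "fixing the gauge" means for the simplest case, an additive model, follows. It applies the zero-sum convention, in which each position's parameters are shifted to sum to zero and the removed means are absorbed into the constant term. The parameter values and function names are made up for illustration, and the abstract does not spell out how this convention relates to the authors' family of gauges.

```python
from itertools import product


def predict(theta0, theta, seq):
    """Additive model: constant term plus one parameter per (position, character)."""
    return theta0 + sum(theta[pos][ch] for pos, ch in enumerate(seq))


def fix_zero_sum_gauge(theta0, theta):
    """Shift each position's parameters to sum to zero, absorbing the means into theta0."""
    new_theta0 = theta0
    new_theta = []
    for pos_params in theta:
        mean = sum(pos_params.values()) / len(pos_params)
        new_theta0 += mean
        new_theta.append({ch: value - mean for ch, value in pos_params.items()})
    return new_theta0, new_theta


# Hypothetical parameters for a two-position DNA model.
theta0 = 1.0
theta = [
    {"A": 2.0, "C": 0.0, "G": 1.0, "T": 1.0},
    {"A": 0.5, "C": 1.5, "G": 0.5, "T": 1.5},
]
fixed_theta0, fixed_theta = fix_zero_sum_gauge(theta0, theta)

# Largest change in prediction over all 16 sequences; gauge fixing should leave it at zero.
max_change = max(
    abs(predict(theta0, theta, seq) - predict(fixed_theta0, fixed_theta, seq))
    for seq in product("ACGT", repeat=2)
)
```

After fixing, each parameter can be read as a deviation from the position's average effect, which is one reason this convention is popular; other gauges (for example, setting a reference sequence's parameters to zero) give different but equally valid parameter values for the same predictions.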
