Results 1 - 10 of 10
1.
PLoS Comput Biol ; 19(11): e1010979, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38011281

ABSTRACT

A central challenge in population genetics is the detection of genomic footprints of selection. As machine learning tools, including convolutional neural networks (CNNs), have become more sophisticated and more broadly applied, they offer a logical next step for increasing our power to learn and detect such patterns; indeed, CNNs trained on simulated genome sequences have recently been shown to be highly effective at this task. Unlike previous approaches, which rely on human-crafted summary statistics, these methods can be applied directly to raw genomic data, allowing them to potentially learn new signatures that, if well understood, could improve the current theory of selective sweeps. Toward this end, we examine a representative CNN from the literature, paring it down to the minimal complexity needed to maintain comparable performance; this low-complexity CNN allows us to directly interpret the learned evolutionary signatures. We then validate these patterns in more complex models using metrics that evaluate feature importance. Our findings reveal that preprocessing steps, which determine how the population genetic data are presented to the model, play a central role in the learned prediction method. This results in models that mimic previously defined summary statistics; in one case, the summary statistic itself achieves similarly high accuracy. For evolutionary processes that are less well understood than selective sweeps, we hope this provides an initial framework for using CNNs in ways that go beyond simply achieving high classification performance. Instead, we propose that CNNs might be useful as tools for learning novel patterns that can translate into easy-to-implement summary statistics available to a wider community of researchers.


Subject(s)
Machine Learning , Neural Networks, Computer , Humans , Genomics , Biological Evolution , Genetics, Population
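
The abstract does not name the summary statistic the CNN came to mimic. As a hedged, generic illustration of a hand-crafted sweep statistic computed from the same kind of input a CNN receives (a binary haplotype-by-site matrix for one genomic window), the sketch below computes haplotype homozygosity, often called H1: the sum of squared frequencies of the distinct haplotypes. The function name and toy windows are hypothetical and are not taken from the paper.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes: list[str]) -> float:
    """H1: sum of squared frequencies of distinct haplotypes in one window.

    Each haplotype is one row of the binary input matrix, written as a string
    of '0' (ancestral) / '1' (derived) alleles, one character per site.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    counts = Counter(haplotypes)  # identical rows are the same haplotype
    return sum((c / n) ** 2 for c in counts.values())


# Hypothetical toy windows: a sweep tends to drag one haplotype to high frequency.
neutral_window = ["0101", "1100", "0011", "1010", "0110", "1001"]
sweep_window = ["1111", "1111", "1111", "1111", "0101", "1111"]
print(round(haplotype_homozygosity(neutral_window), 3))
print(round(haplotype_homozygosity(sweep_window), 3))
```

Here the neutral-like window has six distinct haplotypes and a low H1, while the sweep-like window is dominated by one haplotype and scores much higher; a statistic of this kind is cheap to compute and easy to share, which is the property the abstract argues CNN-derived patterns could eventually have.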
2.
bioRxiv ; 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37986808

ABSTRACT

Mapping the functional human genome and the impact of genetic variants is often limited to samples from populations of European descent. To help overcome this limitation, we measured gene expression using RNA sequencing in lymphoblastoid cell lines (LCLs) from 599 individuals from six African populations to identify novel transcripts, including those not represented in the hg38 reference genome. We used whole genomes from the 1000 Genomes Project and 164 Maasai individuals to identify 8,881 expression and 6,949 splicing quantitative trait loci (eQTLs/sQTLs), and 2,611 structural variants associated with gene expression (SV-eQTLs). We further profiled chromatin accessibility using ATAC-Seq in a subset of 100 representative individuals to identify chromatin accessibility quantitative trait loci (caQTLs) and allele-specific chromatin accessibility, and we provide predictions of the functional effect of 78.9 million variants on chromatin accessibility. Using this map of eQTLs and caQTLs, we fine-mapped GWAS signals for a range of complex diseases. Combined, this work expands global functional genomic data to identify novel transcripts, functional elements, and variants; to understand the population genetic history of molecular quantitative trait loci; and to further resolve the genetic basis of multiple human traits and diseases.
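
The abstract does not describe the association model used to call eQTLs. As a generic, hedged sketch of the core idea behind a single cis-eQTL test, the code below regresses one gene's expression on genotype dosage (0, 1, or 2 copies of the alternate allele) and reports the least-squares slope and its t statistic. Production eQTL pipelines typically add covariates (such as ancestry principal components and hidden expression factors) and multiple-testing correction; those are omitted here, and the function name and toy data are hypothetical.

```python
import math


def eqtl_slope_t(dosage: list[float], expression: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope of expression on dosage, plus its t statistic.

    dosage: alternate-allele count (0/1/2) per individual.
    expression: normalized expression of one gene in the same individuals.
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx = sum(dosage) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares, with n - 2 degrees of freedom for a simple regression.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t


# Hypothetical genotypes and expression values for six individuals.
slope, t = eqtl_slope_t([0, 0, 1, 1, 2, 2], [1.0, 1.2, 1.9, 2.1, 3.0, 2.8])
print(round(slope, 3), round(t, 2))
```

A positive slope means each additional alternate allele is associated with higher expression; in practice the t statistic would be converted to a p-value and corrected across the many variant-gene pairs tested.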

3.
PLoS Comput Biol ; 19(5): e1011175, 2023 May.
Article in English | MEDLINE | ID: mdl-37235578

ABSTRACT

Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology to 22 quantitative traits in the UK Biobank, population genetic simulations, and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing.
The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.


Subject(s)
Genome , Machine Learning , Reproducibility of Results
6.
Nat Neurosci ; 23(8): 981-991, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32514136

ABSTRACT

Salient experiences are often relived in the mind. Human neuroimaging studies suggest that such experiences drive activity patterns in visual association cortex that are subsequently reactivated during quiet waking. Nevertheless, the circuit-level consequences of such reactivations remain unclear. Here, we imaged hundreds of neurons in visual association cortex across days as mice learned a visual discrimination task. Distinct patterns of neurons were activated by different visual cues. These same patterns were subsequently reactivated during quiet waking in darkness, with higher reactivation rates during early learning and for food-predicting versus neutral cues. Reactivations involving ensembles of neurons encoding both the food cue and the reward predicted strengthening of next-day functional connectivity of participating neurons, while the converse was observed for reactivations involving ensembles encoding only the food cue. We propose that task-relevant neurons strengthen while task-irrelevant neurons weaken their dialog with the network via participation in distinct flavors of reactivation.


Subject(s)
Discrimination Learning/physiology , Neuronal Plasticity/physiology , Neurons/physiology , Visual Cortex/physiology , Visual Perception/physiology , Animals , Cues , Food , Food Deprivation/physiology , Mice , Reward
7.
Neuron ; 105(6): 1094-1111.e10, 2020 Mar 18.
Article in English | MEDLINE | ID: mdl-31955944

ABSTRACT

Interoception, the sense of internal bodily signals, is essential for physiological homeostasis, cognition, and emotions. While human insular cortex (InsCtx) is implicated in interoception, the cellular and circuit mechanisms remain unclear. We imaged mouse InsCtx neurons during two physiological deficiency states: hunger and thirst. InsCtx ongoing activity patterns reliably tracked the gradual return to homeostasis but not changes in behavior. Accordingly, while artificial induction of hunger or thirst in sated mice via activation of specific hypothalamic neurons (AgRP or SFO^GLUT) restored cue-evoked food- or water-seeking, InsCtx ongoing activity continued to reflect physiological satiety. During natural hunger or thirst, food or water cues rapidly and transiently shifted InsCtx population activity to the future satiety-related pattern. During artificial hunger or thirst, food or water cues further shifted activity beyond the current satiety-related pattern. Together with circuit-mapping experiments, these findings suggest that InsCtx integrates visceral-sensory signals of current physiological state with hypothalamus-gated amygdala inputs that signal upcoming ingestion of food or water to compute a prediction of future physiological state.


Subject(s)
Cerebral Cortex/physiology , Hunger/physiology , Interoception/physiology , Thirst/physiology , Agouti-Related Protein/metabolism , Animals , Clozapine/analogs & derivatives , Clozapine/pharmacology , Cues , Female , Hypothalamus/physiology , Male , Mice , Mice, Transgenic , Neural Pathways/physiology , Optical Imaging , Optogenetics , Subfornical Organ/physiology
8.
Nat Commun ; 9(1): 703, 2018 Feb 19.
Article in English | MEDLINE | ID: mdl-29459739

ABSTRACT

Statistical methods for identifying adaptive mutations from population genetic data face several obstacles: assessing the significance of genomic outliers, integrating correlated measures of selection into one analytic framework, and distinguishing adaptive variants from hitchhiking neutral variants. Here, we introduce SWIF(r), a probabilistic method that detects selective sweeps by learning the distributions of multiple selection statistics under different evolutionary scenarios and calculating the posterior probability of a sweep at each genomic site. SWIF(r) is trained using simulations from a user-specified demographic model and explicitly models the joint distributions of selection statistics, thereby increasing its power to both identify regions undergoing sweeps and localize adaptive mutations. Using array and exome data from 45 ‡Khomani San hunter-gatherers of southern Africa, we identify an enrichment of adaptive signals in genes associated with metabolism and obesity. SWIF(r) provides a transparent probabilistic framework for localizing beneficial mutations that is extensible to a variety of evolutionary scenarios.


Subject(s)
Adaptation, Physiological/genetics , Algorithms , Genome, Human/genetics , Models, Genetic , Mutation , Africa, Southern , Genetics, Population , Human Genome Project , Humans , Polymorphism, Single Nucleotide , Probability , Selection, Genetic
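
The abstract describes learning the distributions of several selection statistics under each evolutionary scenario and computing the posterior probability of a sweep at each site. The sketch below shows only the Bayes' rule step of that idea under a loudly stated simplification: unlike SWIF(r), which explicitly models the joint distributions of the statistics, it treats the statistics as independent given the scenario (a naive Bayes classifier with Gaussian densities). The statistic names, parameters, and priors are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def class_posteriors(observed, params, priors):
    """Posterior probability of each scenario at one genomic site via Bayes' rule.

    observed: {statistic name: value at this site}
    params:   {scenario: {statistic name: (mean, sd)}}, e.g. fit to simulations
    priors:   {scenario: prior probability}
    Naive-Bayes simplification (not SWIF(r)'s joint model): the likelihood is a
    product of one-dimensional densities, one per statistic.
    """
    joint = {}
    for scenario, prior in priors.items():
        likelihood = 1.0
        for name, value in observed.items():
            mean, sd = params[scenario][name]
            likelihood *= normal_pdf(value, mean, sd)
        joint[scenario] = prior * likelihood
    evidence = sum(joint.values())  # P(data), the normalizing constant
    return {scenario: p / evidence for scenario, p in joint.items()}


# Hypothetical statistics and parameters, chosen only for illustration.
params = {
    "neutral": {"stat_a": (0.0, 1.0), "stat_b": (0.0, 1.0)},
    "sweep": {"stat_a": (2.0, 1.0), "stat_b": (2.0, 1.0)},
}
priors = {"neutral": 0.99, "sweep": 0.01}
post = class_posteriors({"stat_a": 2.0, "stat_b": 2.0}, params, priors)
print({k: round(v, 3) for k, v in post.items()})
```

Even with both hypothetical statistics sitting exactly at their sweep means, the low prior on sweeps keeps the sweep posterior below 0.5, which illustrates why a calibrated posterior is more informative than ranking sites as genomic outliers.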
9.
Curr Opin Genet Dev ; 41: 140-149, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27743539

ABSTRACT

Human population genomic studies have repeatedly observed a decrease in heterozygosity and an increase in linkage disequilibrium with geographic distance from Africa. While multiple demographic models can generate these patterns, many studies invoke the serial founder effect model, in which populations expand from a single origin and each new population's founders represent a subset of the genetic variation in the previous population. The model assumes no admixture with archaic hominins; however, recent studies have identified loci in Homo sapiens bearing signatures of archaic introgression. These results appear to contradict the validity of analyses invoking the serial founder effect model, but we show that these two perspectives are compatible. We also propose using the serial founder effect model as a null model for determining the signature of archaic admixture in modern human genomes at different geographic and genomic scales.


Subject(s)
Genetics, Population , Genome, Human/genetics , Metagenomics , Africa , Animals , Genetic Variation , Geography , Hominidae/genetics , Humans , Linkage Disequilibrium
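
To make the serial founder effect concrete, the toy calculation below (not the paper's model) uses the standard Wright-Fisher result that sampling N diploid founders retains, in expectation, a fraction 1 - 1/(2N) of the previous population's heterozygosity. Chaining k founder events gives H_k = H_0 (1 - 1/(2N))^k, so heterozygosity declines geometrically with the number of founder events, a rough proxy for distance from the origin. Mutation, migration, archaic admixture, and drift after regrowth are all ignored, and the parameter values are hypothetical.

```python
def serial_founder_heterozygosity(h0: float, founders: int, n_events: int) -> list[float]:
    """Expected heterozygosity after 0, 1, ..., n_events serial founder events.

    Toy assumptions: each event samples `founders` diploid individuals
    (2 * founders gene copies) from the previous population; there is no
    mutation, migration, or admixture; populations regrow instantly, so
    drift between founder events is ignored.
    """
    if founders < 1 or not 0.0 <= h0 <= 1.0 or n_events < 0:
        raise ValueError("invalid parameters")
    retained = 1.0 - 1.0 / (2 * founders)  # fraction of heterozygosity kept per event
    return [h0 * retained**k for k in range(n_events + 1)]


for k, h in enumerate(serial_founder_heterozygosity(h0=0.8, founders=50, n_events=5)):
    print(k, round(h, 4))
```

Smaller founder groups shrink the retained fraction and steepen the decline; the increase in linkage disequilibrium with distance, also mentioned in the abstract, is not captured by this single-locus calculation.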
10.
Bioinformatics ; 29(22): 2844-51, 2013 Nov 15.
Article in English | MEDLINE | ID: mdl-24048353

ABSTRACT

MOTIVATION: Validation and reproducibility of results is a central and pressing issue in genomics. Several recent embarrassing incidents involving the irreproducibility of high-profile studies have illustrated the importance of this issue and the need for rigorous methods for the assessment of reproducibility. RESULTS: Here, we describe an existing statistical model that is very well suited to this problem. We explain its utility for assessing the reproducibility of validation experiments, and apply it to a genome-scale study of adenosine deaminase acting on RNA (ADAR)-mediated RNA editing in Drosophila. We also introduce a statistical method for planning validation experiments that will obtain the tightest reproducibility confidence limits, which, for a fixed total number of experiments, returns the optimal number of replicates for the study. AVAILABILITY: Downloadable software and a web service for both the analysis of data from a reproducibility study and the optimal design of these studies are provided at http://ccmbweb.ccv.brown.edu/reproducibility.html .


Subject(s)
Genomics/methods , Models, Statistical , Adenosine Deaminase , Animals , Drosophila/genetics , Genome , RNA Editing , Reproducibility of Results , Software