Results 1 - 20 of 48
1.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36342203

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) continues to expand our knowledge by facilitating the study of transcriptional heterogeneity at the level of single cells. Despite this technology's utility and success in biomedical research, technical artifacts are present in scRNA-seq data. Doublets/multiplets are a type of artifact that occurs when two or more cells are tagged by the same barcode, and therefore they appear as a single cell. Because this introduces non-existent transcriptional profiles, doublets can bias and mislead downstream analysis. To address this limitation, computational methods to annotate and remove doublets from scRNA-seq datasets are needed. RESULTS: We introduce vaeda (Variational Auto-Encoder for Doublet Annotation), a new approach for computational annotation of doublets in scRNA-seq data. Vaeda integrates a variational auto-encoder and Positive-Unlabeled learning to produce doublet scores and binary doublet calls. We apply vaeda, along with seven existing doublet annotation methods, to 16 benchmark datasets and find that vaeda performs competitively in terms of doublet scores and doublet calls. Notably, vaeda outperforms other Python-based methods for doublet annotation. Altogether, vaeda is a robust and competitive method for scRNA-seq doublet annotation and may be of particular interest in the context of Python-based workflows. AVAILABILITY AND IMPLEMENTATION: Vaeda is available at https://github.com/kostkalab/vaeda, and the version used for the results we present here is archived at zenodo (https://doi.org/10.5281/zenodo.7199783). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Biomedical Research , Software , Single-Cell Analysis/methods , Artifacts , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods
2.
Bioinformatics ; 39(39 Suppl 1): i413-i422, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387140

ABSTRACT

MOTIVATION: Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post hoc analyses, and even then, one often cannot explain the internal mechanics of highly parameterized models. Here, we introduce a deep learning architecture called totally interpretable sequence-to-function model (tiSFM). tiSFM improves upon the performance of standard multilayer convolutional models while using fewer parameters. Additionally, while tiSFM is itself technically a multilayer neural network, internal model parameters are intrinsically interpretable in terms of relevant sequence motifs. RESULTS: We analyze published open chromatin measurements across hematopoietic lineage cell-types and demonstrate that tiSFM outperforms a state-of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context-specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B-cells, and Rorc for innate lymphoid cells. tiSFM's model parameters have biologically meaningful interpretations, and we show the utility of our approach on a complex task of predicting the change in epigenetic state as a function of developmental transition. AVAILABILITY AND IMPLEMENTATION: The source code, including scripts for the analysis of key findings, can be found at https://github.com/boooooogey/ATAConv, implemented in Python.


Subject(s)
Immunity, Innate , Lymphocytes , Chromatin , B-Lymphocytes , Neural Networks, Computer , Transcription Factors
3.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37688563

ABSTRACT

SUMMARY: DNA changes that cause premature termination codons (PTCs) represent a large fraction of clinically relevant pathogenic genomic variation. Typically, PTCs induce transcript degradation by nonsense-mediated mRNA decay (NMD) and render such changes loss-of-function alleles. However, certain PTC-containing transcripts escape NMD and can exert dominant-negative or gain-of-function (DN/GOF) effects. Therefore, systematic identification of human PTC-causing variants and their susceptibility to NMD contributes to the investigation of the role of DN/GOF alleles in human disease. Here we present aenmd, a software for annotating PTC-containing transcript-variant pairs for predicted escape from NMD. aenmd is user-friendly and self-contained. It offers functionality not currently available in other methods and is based on established and experimentally validated rules for NMD escape; the software is designed to work at scale, and to integrate seamlessly with existing analysis workflows. We applied aenmd to variants in the gnomAD, Clinvar, and GWAS catalog databases and report the prevalence of human PTC-causing variants in these databases, and the subset of these variants that could exert DN/GOF effects via NMD escape. AVAILABILITY AND IMPLEMENTATION: aenmd is implemented in the R programming language. Code is available on GitHub as an R-package (github.com/kostkalab/aenmd.git), and as a containerized command-line interface (github.com/kostkalab/aenmd_cli.git).
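
As a rough illustration of rule-based NMD-escape annotation, the sketch below applies two widely cited heuristics: a PTC in the last exon, or within roughly 50 nucleotides upstream of the last exon-exon junction, is predicted to escape NMD. The function name, the transcript representation, and the 50-nt default are assumptions made for this example; they are not taken from aenmd, whose rule set and interface may differ.

```python
from typing import List


def predicts_nmd_escape(exon_lengths: List[int], ptc_pos: int,
                        junction_window: int = 50) -> bool:
    """Hypothetical helper: apply two common NMD-escape heuristics.

    exon_lengths: lengths (nt) of the transcript's exons in 5'->3' order.
    ptc_pos: 0-based transcript coordinate of the first PTC nucleotide.
    junction_window: assumed distance (nt) upstream of the last
        exon-exon junction within which a PTC is predicted to escape.
    """
    total = sum(exon_lengths)
    if not 0 <= ptc_pos < total:
        raise ValueError("ptc_pos lies outside the transcript")
    # Start coordinate of the last exon; 0 for single-exon transcripts.
    last_junction = total - exon_lengths[-1]
    if ptc_pos >= last_junction:
        return True  # rule 1: PTC lies in the last exon
    # rule 2: PTC lies close to the last exon-exon junction
    return last_junction - ptc_pos <= junction_window
```

For a hypothetical transcript with exons of 200, 300, and 150 nt, a PTC at position 100 would be predicted NMD-sensitive under these two rules, while one at position 470 or 520 would be flagged as a candidate for escape.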


Subject(s)
Codon, Nonsense , Nonsense Mediated mRNA Decay , Humans
4.
Bioinformatics ; 38(10): 2749-2756, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561207

ABSTRACT

MOTIVATION: Single-cell RNA-seq analysis has emerged as a powerful tool for understanding inter-cellular heterogeneity. Due to the inherent noise of the data, computational techniques often rely on dimensionality reduction (DR) as both a pre-processing step and an analysis tool. Ideally, DR should preserve the biological information while discarding the noise. However, if the DR is to be used directly to gain biological insight it must also be interpretable; that is, the individual dimensions of the reduction should correspond to specific biological variables such as cell-type identity or pathway activity. Maximizing biological interpretability necessitates making assumptions about the data structures, and the choice of the model is critical. RESULTS: We present a new probabilistic single-cell factor analysis model, Non-negative Independent Factor Analysis (NIFA), that incorporates different interpretability-inducing assumptions into a single modeling framework. The key advantage of our NIFA model is that it simultaneously models uni- and multi-modal latent factors, and thus isolates discrete cell-type identity and continuous pathway activity into separate components. We apply our approach to a range of datasets where cell-type identity is known, and we show that NIFA-derived factors outperform results from ICA, PCA, NMF and scCoGAPS (an NMF method designed for single-cell data) in terms of disentangling biological sources of variation. Studying an immunotherapy dataset in detail, we show that NIFA is able to reproduce and refine previous findings in a single analysis framework and enables the discovery of new clinically relevant cell states. AVAILABILITY AND IMPLEMENTATION: NIFA is an R package which is freely available at GitHub (https://github.com/wgmao/NIFA). The test dataset is archived at https://zenodo.org/record/6286646. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Factor Analysis, Statistical , Sequence Analysis, RNA , Software
5.
Genomics ; 114(1): 278-291, 2022 01.
Article in English | MEDLINE | ID: mdl-34942352

ABSTRACT

Mammalian nephrons originate from a population of nephron progenitor cells, and changes in these cells' transcriptomes contribute to the cessation of nephrogenesis, an important determinant of nephron number. To characterize microRNA (miRNA) expression and identify putative cis-regulatory regions, we collected nephron progenitor cells from mouse kidneys at embryonic day 14.5 and postnatal day zero and assayed small RNA expression and transposase-accessible chromatin. We detect expression of 1104 miRNA (114 with expression changes), and 46,374 chromatin accessible regions (2103 with changes in accessibility). Genome-wide, our data highlight processes like cellular differentiation, cell migration, extracellular matrix interactions, and developmental signaling pathways. Furthermore, they identify new candidate cis-regulatory elements for Eya1 and Pax8, both genes with a role in nephron progenitor cell differentiation. Finally, we associate expression-changing miRNAs, including let-7-5p, miR-125b-5p, miR-181a-2-3p, and miR-9-3p, with candidate cis-regulatory elements and target genes. These analyses highlight new putative cis-regulatory loci for miRNA in nephron progenitors.


Subject(s)
Chromatin , MicroRNAs , Animals , Cell Differentiation/genetics , Chromatin/genetics , Chromatin/metabolism , Kidney/metabolism , Mammals/genetics , Mice , MicroRNAs/genetics , MicroRNAs/metabolism , Nephrons/metabolism , Stem Cells
6.
Bioinformatics ; 36(4): 1150-1158, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31501871

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) technologies enable the study of transcriptional heterogeneity at the resolution of individual cells and have an increasing impact on biomedical research. However, it is known that these methods sometimes wrongly consider two or more cells as single cells, and that a number of so-called doublets is present in the output of such experiments. Treating doublets as single cells in downstream analyses can severely bias a study's conclusions, and therefore computational strategies for the identification of doublets are needed. RESULTS: With scds, we propose two new approaches for in silico doublet identification: Co-expression based doublet scoring (cxds) and binary classification based doublet scoring (bcds). The co-expression based approach, cxds, utilizes binarized (absence/presence) gene expression data and, employing a binomial model for the co-expression of pairs of genes, yields interpretable doublet annotations. bcds, on the other hand, uses a binary classification approach to discriminate artificial doublets from original data. We apply our methods and existing computational doublet identification approaches to four datasets with experimental doublet annotations and find that our methods perform at least as well as the state of the art, at comparably little computational cost. We observe appreciable differences between methods and across datasets and that no approach dominates all others. In summary, scds presents a scalable, competitive approach that allows for doublet annotation of datasets with thousands of cells in a matter of seconds. AVAILABILITY AND IMPLEMENTATION: scds is implemented as a Bioconductor R package (doi: 10.18129/B9.bioc.scds). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
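
The idea of training a classifier against artificial doublets can be illustrated with a small sketch of the simulation step. Here an artificial doublet is formed by summing the count vectors of two randomly chosen cells; that construction, the function name, and the toy data are assumptions for illustration rather than a description of how scds is implemented.

```python
import random
from typing import List


def simulate_doublets(counts: List[List[int]], n_doublets: int,
                      seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: build artificial doublets from a cells x genes matrix.

    Each artificial doublet is the gene-wise sum of two distinct cells
    drawn at random (an assumed, commonly used construction).
    """
    if len(counts) < 2:
        raise ValueError("need at least two cells")
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(range(len(counts)), 2)  # two distinct cell indices
        doublets.append([x + y for x, y in zip(counts[a], counts[b])])
    return doublets


# Toy matrix: 4 cells x 3 genes.
cells = [[5, 0, 1], [0, 7, 0], [2, 2, 2], [0, 0, 9]]
artificial = simulate_doublets(cells, n_doublets=3)

# Label original cells 0 and artificial doublets 1 for a binary classifier.
features = cells + artificial
labels = [0] * len(cells) + [1] * len(artificial)
```

A classifier trained on `features` and `labels` could then score the original cells, with cells that resemble the artificial doublets receiving higher doublet scores.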


Subject(s)
RNA , Software , Base Sequence , Sequence Analysis, RNA , Single-Cell Analysis
7.
FASEB J ; 34(4): 5782-5799, 2020 04.
Article in English | MEDLINE | ID: mdl-32141129

ABSTRACT

Low nephron number results in an increased risk of developing hypertension and chronic kidney disease. Intrauterine growth restriction is associated with a nephron deficit in humans, and is commonly caused by placental insufficiency, which results in fetal hypoxia. The underlying mechanisms by which hypoxia impacts kidney development are poorly understood. microRNA-210 is the most consistently induced microRNA in hypoxia and is known to promote cell survival in a hypoxic environment. In this study, the role of microRNA-210 in kidney development was evaluated using a global microRNA-210 knockout mouse. A male-specific 35% nephron deficit in microRNA-210 knockout mice was observed. Wnt/β-catenin signaling, a pathway crucial for nephron differentiation, was misregulated in male kidneys with increased expression of the canonical Wnt target lymphoid enhancer binding factor 1. This coincided with increased expression of caspase-8-associated protein 2, a known microRNA-210 target and apoptosis signal transducer. Together, these data are consistent with a sex-specific requirement for microRNA-210 in kidney development.


Subject(s)
Cell Differentiation , Hypoxia/physiopathology , MicroRNAs/genetics , Nephrons/cytology , Organogenesis , Animals , Apoptosis , Female , Male , Mice , Mice, Knockout , Nephrons/metabolism
8.
Am J Respir Crit Care Med ; 201(8): 934-945, 2020 04 15.
Article in English | MEDLINE | ID: mdl-31834999

ABSTRACT

Rationale: The role of FSTL-1 (follistatin-like 1) in lung homeostasis is unknown. Objectives: We aimed to define the impact of FSTL-1 attenuation on lung structure and function and to identify FSTL-1-regulated transcriptional pathways in the lung. Further, we aimed to analyze the association of FSTL-1 SNPs with lung disease. Methods: FSTL-1 hypomorphic (FSTL-1 Hypo) mice underwent lung morphometry, pulmonary function testing, and micro-computed tomography. Fstl1 expression was determined in wild-type lung cell populations from three independent research groups. RNA sequencing of wild-type and FSTL-1 Hypo mice identified FSTL-1-regulated gene expression, followed by validation and mechanistic in vitro examination. FSTL1 SNP analysis was performed in the COPDGene (Genetic Epidemiology of Chronic Obstructive Pulmonary Disease) cohort. Measurements and Main Results: FSTL-1 Hypo mice developed spontaneous emphysema, independent of smoke exposure. Fstl1 is highly expressed in the lung by mesenchymal and endothelial cells but not immune cells. RNA sequencing of whole lung identified 33 FSTL-1-regulated genes, including Nr4a1, an orphan nuclear hormone receptor that negatively regulates NF-κB (nuclear factor-κB) signaling. In vitro, recombinant FSTL-1 treatment of macrophages attenuated NF-κB p65 phosphorylation in an Nr4a1-dependent manner. Within the COPDGene cohort, several SNPs in the FSTL1 region corresponded to chronic obstructive pulmonary disease and lung function. Conclusions: This work identifies a novel role for FSTL-1 protecting against emphysema development independent of smoke exposure. This FSTL-1-deficient emphysema implicates regulation of immune tolerance in lung macrophages through Nr4a1. Further study of the mechanisms involving FSTL-1 in lung homeostasis, immune regulation, and NF-κB signaling may provide additional insight into the pathophysiology of emphysema and inflammatory lung diseases.


Subject(s)
Follistatin-Related Proteins/genetics , Lung/diagnostic imaging , Pulmonary Emphysema/genetics , Smoke/adverse effects , Animals , Endothelial Cells/metabolism , Follistatin-Related Proteins/pharmacology , Gene Expression Regulation , Gene Knockdown Techniques , Humans , In Vitro Techniques , Lung/metabolism , Macrophages/drug effects , Macrophages/metabolism , Mice , Mutation , Nuclear Receptor Subfamily 4, Group A, Member 1/drug effects , Nuclear Receptor Subfamily 4, Group A, Member 1/metabolism , Phosphorylation/drug effects , Polymorphism, Single Nucleotide , Positron Emission Tomography Computed Tomography , Pulmonary Disease, Chronic Obstructive/genetics , Pulmonary Emphysema/diagnostic imaging , Pulmonary Emphysema/metabolism , Single Photon Emission Computed Tomography Computed Tomography , Nicotiana , Transcription Factor RelA/drug effects , Transcription Factor RelA/metabolism , X-Ray Microtomography
9.
Bioinformatics ; 35(14): i596-i604, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510670

ABSTRACT

MOTIVATION: MicroRNAs (miRNAs) are important non-coding post-transcriptional regulators that are involved in many biological processes and human diseases. Individual miRNAs may regulate hundreds of genes, giving rise to a complex gene regulatory network in which transcripts carrying miRNA binding sites act as competing endogenous RNAs (ceRNAs). Several methods for the analysis of ceRNA interactions exist, but these often do not adjust for statistical confounders or address the problem that more than one miRNA interacts with a target transcript. RESULTS: We present SPONGE, a method for the fast construction of ceRNA networks. SPONGE uses 'multiple sensitivity correlation', a newly defined measure for which we can estimate a distribution under a null hypothesis. SPONGE can accurately quantify the contribution of multiple miRNAs to a ceRNA interaction with a probabilistic model that addresses previously neglected confounding factors and allows fast P-value calculation, thus outperforming existing approaches. We applied SPONGE to paired miRNA and gene expression data from The Cancer Genome Atlas for studying global effects of miRNA-mediated cross-talk. Our results highlight already established and novel protein-coding and non-coding ceRNAs which could serve as biomarkers in cancer. AVAILABILITY AND IMPLEMENTATION: SPONGE is available as an R/Bioconductor package (doi: 10.18129/B9.bioc.SPONGE). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
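
For intuition, the single-miRNA case of sensitivity correlation can be written as the difference between the plain correlation of two transcripts and their partial correlation given the miRNA. The sketch below computes that quantity in plain Python; the function names and toy data are illustrative assumptions, and it does not reproduce SPONGE's multi-miRNA measure or its null model.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sensitivity_correlation(g1: Sequence[float], g2: Sequence[float],
                            mirna: Sequence[float]) -> float:
    """cor(g1, g2) minus the partial correlation of g1 and g2 given one miRNA."""
    r12 = pearson(g1, g2)
    r1m = pearson(g1, mirna)
    r2m = pearson(g2, mirna)
    partial = (r12 - r1m * r2m) / math.sqrt((1 - r1m ** 2) * (1 - r2m ** 2))
    return r12 - partial


# Toy data (made up): both transcripts fall as the miRNA rises.
mirna = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene1 = [9.1, 8.0, 7.2, 5.9, 5.1, 3.8]
gene2 = [8.8, 8.1, 6.9, 6.2, 4.8, 4.1]
score = sensitivity_correlation(gene1, gene2, mirna)
```

A clearly positive `score` indicates that much of the co-expression between the two transcripts disappears once the shared miRNA is accounted for, which is the pattern expected of a ceRNA pair.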


Subject(s)
Neoplasms , RNA/genetics , Gene Regulatory Networks , Humans
10.
Nature ; 513(7517): 195-201, 2014 Sep 11.
Article in English | MEDLINE | ID: mdl-25209798

ABSTRACT

Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ∼5 million years ago, coincident with major geographical changes in southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.


Subject(s)
Genome/genetics , Hylobates/classification , Hylobates/genetics , Karyotype , Phylogeny , Animals , Evolution, Molecular , Hominidae/classification , Hominidae/genetics , Humans , Molecular Sequence Data , Retroelements/genetics , Selection, Genetic , Transcription Termination, Genetic
11.
J Am Soc Nephrol ; 30(7): 1192-1205, 2019 07.
Article in English | MEDLINE | ID: mdl-31142573

ABSTRACT

BACKGROUND: Nephron progenitors, the cell population that gives rise to the functional unit of the kidney, are metabolically active and self-renew under glycolytic conditions. A switch from glycolysis to mitochondrial respiration drives these cells toward differentiation, but the mechanisms that control this switch are poorly defined. Studies have demonstrated that kidney formation is highly dependent on oxygen concentration, which is largely regulated by von Hippel-Lindau (VHL; a protein component of a ubiquitin ligase complex) and hypoxia-inducible factors (a family of transcription factors activated by hypoxia). METHODS: To explore VHL as a regulator defining nephron progenitor self-renewal versus differentiation, we bred Six2-TGCtg mice with VHLlox/lox mice to generate mice with a conditional deletion of VHL from Six2+ nephron progenitors. We used histologic, immunofluorescence, RNA sequencing, and metabolic assays to characterize kidneys from these mice and controls during development and up to postnatal day 21. RESULTS: By embryonic day 15.5, kidneys of nephron progenitor cell-specific VHL knockout mice begin to exhibit reduced maturation of nephron progenitors. Compared with controls, VHL knockout kidneys are smaller and developmentally delayed by postnatal day 1, and have about half the number of glomeruli at postnatal day 21. VHL knockout nephron progenitors also exhibit persistent Six2 and Wt1 expression, as well as decreased mitochondrial respiration and prolonged reliance on glycolysis. CONCLUSIONS: Our findings identify a novel role for VHL in mediating nephron progenitor differentiation through metabolic regulation, and suggest that VHL is required for normal kidney development.


Subject(s)
Nephrons/cytology , Stem Cells/cytology , Von Hippel-Lindau Tumor Suppressor Protein/physiology , Animals , Cell Differentiation , Gene Expression Regulation , Glycolysis , Homeodomain Proteins/physiology , Mice , Mitochondria/metabolism , Transcription Factors/physiology
12.
BMC Genomics ; 20(1): 511, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31221079

ABSTRACT

BACKGROUND: Non-coding gene regulatory enhancers are essential to transcription in mammalian cells. As a result, a large variety of experimental and computational strategies have been developed to identify cis-regulatory enhancer sequences. Given the differences in the biological signals assayed, some variation in the enhancers identified by different methods is expected; however, the concordance of enhancers identified by different methods has not been comprehensively evaluated. This is critically needed, since in practice, most studies consider enhancers identified by only a single method. Here, we compare enhancer sets from eleven representative strategies in four biological contexts. RESULTS: All sets we evaluated overlap significantly more than expected by chance; however, there is significant dissimilarity in their genomic, evolutionary, and functional characteristics, both at the element and base-pair level, within each context. The disagreement is sufficient to influence interpretation of candidate SNPs from GWAS studies, and to lead to disparate conclusions about enhancer and disease mechanisms. Most regions identified as enhancers are supported by only one method, and we find limited evidence that regions identified by multiple methods are better candidates than those identified by a single method. As a result, we cannot recommend the use of any single enhancer identification strategy in all settings. CONCLUSIONS: Our results highlight the inherent complexity of enhancer biology and identify an important challenge to mapping the genetic architecture of complex disease. Greater appreciation of how the diverse enhancer identification strategies in use today relate to the dynamic activity of gene regulatory regions is needed to enable robust and reproducible results.


Subject(s)
Enhancer Elements, Genetic , Cell Line , Databases, Genetic , Evolution, Molecular , Gene Expression Regulation , Genomics , Humans , Liver/metabolism , Molecular Sequence Annotation , Myocardium/metabolism
13.
Am J Physiol Renal Physiol ; 316(5): F993-F1005, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30838872

ABSTRACT

We have previously demonstrated that loss of miR-17~92 in nephron progenitors in a mouse model results in renal hypodysplasia and chronic kidney disease. Clinically, decreased congenital nephron endowment because of renal hypodysplasia is associated with an increased risk of hypertension and chronic kidney disease, and this is at least partly dependent on the self-renewal of nephron progenitors. Here, we present evidence for a novel molecular mechanism regulating the self-renewal of nephron progenitors and congenital nephron endowment by the highly conserved miR-17~92 cluster. Whole transcriptome sequencing revealed that nephron progenitors lacking this cluster demonstrated increased Cftr expression. We showed that one member of the cluster, miR-19b, is sufficient to repress Cftr expression in vitro and that perturbation of Cftr activity in nephron progenitors results in impaired proliferation. Together, these data suggest that miR-19b regulates Cftr expression in nephron progenitors, with this interaction playing a role in appropriate nephron progenitor self-renewal during kidney development to generate normal nephron endowment.


Subject(s)
Cystic Fibrosis Transmembrane Conductance Regulator/metabolism , MicroRNAs/metabolism , Nephrons/metabolism , Stem Cells/metabolism , Animals , Cell Movement , Cell Proliferation , Cell Self Renewal , Cells, Cultured , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Gene Expression Regulation, Developmental , Mice, Inbred C57BL , Mice, Knockout , MicroRNAs/genetics , Nephrons/embryology , Organogenesis , Signal Transduction
14.
Mol Biol Evol ; 35(8): 2034-2045, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29897475

ABSTRACT

Some of the fastest evolving regions of the human genome are conserved noncoding elements with many human-specific DNA substitutions. These human accelerated regions (HARs) are enriched near regulatory genes, and several HARs function as developmental enhancers. To investigate if this evolutionary signature is unique to humans, we quantified evidence of accelerated substitutions in conserved genomic elements across multiple lineages and applied this approach simultaneously to the genomes of five apes: human, chimpanzee, gorilla, orangutan, and gibbon. We find roughly similar numbers and genomic distributions of lineage-specific accelerated regions (linARs) in all five apes. In particular, apes share an enrichment of linARs in regulatory DNA near genes involved in development, especially transcription factors and other regulators. Many developmental loci harbor clusters of nonoverlapping linARs from multiple apes, suggesting that accelerated evolution in each species affected distinct regulatory elements that control a shared set of developmental pathways. Our statistical tests distinguish between GC-biased and unbiased accelerated substitution rates, allowing us to quantify the roles of different evolutionary forces in creating linARs. We find evidence of GC-biased gene conversion in each ape, but unbiased acceleration consistent with positive selection or loss of constraint is more common in all five lineages. It therefore appears that similar evolutionary processes created independent accelerated regions in the genomes of different apes, and that these lineage-specific changes to conserved noncoding sequences may have differentially altered expression of a core set of developmental genes across ape evolution.


Subject(s)
Evolution, Molecular , Hominidae/genetics , Algorithms , Animals , Computer Simulation , Gene Conversion , Hominidae/growth & development , Humans , Models, Genetic , Selection, Genetic
15.
Bioinformatics ; 34(13): i79-i88, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29950006

ABSTRACT

Motivation: Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around discovery or characterization of types or sub-types of cells, and therefore, obtaining accurate cell-cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches exist that explicitly address this task. Furthermore, abundance and type of noise present in scRNA-seq datasets suggest that application of generic methods, or of methods developed for bulk RNA-seq data, is likely suboptimal. Results: Here, we present RAFSIL, a random forest based approach to learn cell-cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure, where feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks like dimension reduction, visualization and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach yielding a useful tool that improves the analysis of scRNA-seq data. Availability and implementation: The RAFSIL R package is available at www.kostkalab.net/software.html. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Cluster Analysis
16.
PLoS Genet ; 12(2): e1005788, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26862916

ABSTRACT

Elimination of the proliferating germline extends lifespan in C. elegans. This phenomenon provides a unique platform to understand how complex metazoans retain metabolic homeostasis when challenged with major physiological perturbations. Here, we demonstrate that two conserved transcription regulators essential for the longevity of germline-less adults, DAF-16/FOXO3A and TCER-1/TCERG1, concurrently enhance the expression of multiple genes involved in lipid synthesis and breakdown, and that both gene classes promote longevity. Lipidomic analyses revealed that key lipogenic processes, including de novo fatty acid synthesis, triglyceride production, desaturation and elongation, are augmented upon germline removal. Our data suggest that lipid anabolic and catabolic pathways are coordinately augmented in response to germline loss, and this metabolic shift helps preserve lipid homeostasis. DAF-16 and TCER-1 also perform essential inhibitory functions in germline-ablated animals. TCER-1 inhibits the somatic gene-expression program that facilitates reproduction and represses anti-longevity genes, whereas DAF-16 impedes ribosome biogenesis. Additionally, we discovered that TCER-1 is critical for optimal fertility in normal adults, suggesting that the protein acts as a switch supporting reproductive fitness or longevity depending on the presence or absence of the germline. Collectively, our data offer insights into how organisms adapt to changes in reproductive status, by utilizing the activating and repressive functions of transcription factors and coordinating fat production and degradation.


Subject(s)
Adaptation, Physiological , Caenorhabditis elegans Proteins/metabolism , Caenorhabditis elegans/physiology , Forkhead Transcription Factors/metabolism , Germ Cells/metabolism , Homeostasis , Lipid Metabolism , Peptide Elongation Factors/metabolism , Animals , Diet , Down-Regulation/genetics , Fatty Acids/metabolism , Fertility/genetics , Gene Expression Regulation, Developmental , Longevity , Mutation/genetics , Protein Biosynthesis/genetics , Receptors, Notch/metabolism , Reproduction , Transcriptome/genetics , Triglycerides/metabolism , Up-Regulation/genetics
18.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subject(s)
Molecular Evolution , Human Genome/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Genetic Selection/genetics , Sequence Alignment , DNA Sequence Analysis
19.
PLoS Genet ; 9(8): e1003684, 2013.
Article in English | MEDLINE | ID: mdl-23966869

ABSTRACT

GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available.


Subject(s)
Molecular Evolution , Gene Conversion/genetics , Pan troglodytes/genetics , Phylogeny , Genetic Selection , Animals , Base Sequence , Chromosome Mapping , Genome , Humans , Mammals , Theoretical Models , Genetic Recombination , Sequence Alignment
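
The abstract of this record describes gBGC as a fixation bias favoring G/C over A/T alleles. One common way to express such a bias is to scale neutral substitution rates by the standard population-genetic fixation factor B / (1 - exp(-B)), where B is a scaled bias strength. The sketch below illustrates that idea only; the function names and the parameter `b` are hypothetical, and whether phastBias uses exactly this parameterization is an assumption, not something stated in the abstract.

```python
import math

WEAK = {"A", "T"}     # "weak" bases (two hydrogen bonds)
STRONG = {"G", "C"}   # "strong" bases (three hydrogen bonds)


def fixation_factor(b):
    """Relative fixation rate of an allele with scaled bias b versus a neutral one.

    Uses b / (1 - exp(-b)); the limit as b approaches 0 is 1 (neutral).
    """
    if abs(b) < 1e-12:
        return 1.0
    return b / (1.0 - math.exp(-b))


def biased_rate(neutral_rate, src, dst, b):
    """Scale a neutral substitution rate for a hypothetical gBGC strength b.

    Weak-to-strong changes are favored, strong-to-weak changes are disfavored,
    and changes within a class (A<->T, G<->C) are left unchanged.
    """
    if src in WEAK and dst in STRONG:
        return neutral_rate * fixation_factor(b)
    if src in STRONG and dst in WEAK:
        return neutral_rate * fixation_factor(-b)
    return neutral_rate


if __name__ == "__main__":
    for src, dst in [("A", "G"), ("G", "A"), ("A", "T")]:
        print(src, "->", dst, biased_rate(1.0, src, dst, b=3.0))
```

Under this parameterization the ratio of the weak-to-strong factor to the strong-to-weak factor equals exp(b), so a single parameter controls the asymmetry between the two directions.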
20.
Bioinformatics ; 30(17): i408-14, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-25161227

ABSTRACT

MOTIVATION: Methylation of CpG dinucleotides is a prevalent epigenetic modification that is required for proper development in vertebrates. Genome-wide DNA methylation assays have become increasingly common, and this has enabled characterization of DNA methylation in distinct stages across differentiating cellular lineages. Changes in CpG methylation are essential to cellular differentiation; however, current methods for modeling methylation dynamics do not account for the dependency structure between precursor and dependent cell types. RESULTS: We developed a continuous-time Markov chain approach, based on the observation that changes in methylation state over tissue differentiation can be modeled similarly to DNA nucleotide changes over evolutionary time. This model explicitly takes precursor to descendant relationships into account and enables inference of CpG methylation dynamics. To illustrate our method, we analyzed a high-resolution methylation map of the differentiation of mouse stem cells into several blood cell types. Our model can successfully infer unobserved CpG methylation states from observations at the same sites in related cell types (90% correct), and this approach more accurately reconstructs missing data than imputation based on neighboring CpGs (84% correct). Additionally, the single CpG resolution of our methylation dynamics estimates enabled us to show that DNA sequence context of CpG sites is informative about methylation dynamics across tissue differentiation. Finally, we identified genomic regions with clusters of highly dynamic CpGs and present a likely functional example. Our work establishes a framework for inference and modeling that is well suited to DNA methylation data, and our success suggests that other methods for analyzing DNA nucleotide substitutions will also translate to the modeling of epigenetic phenomena. AVAILABILITY AND IMPLEMENTATION: Source code is available at www.kostkalab.net/software.


Subject(s)
DNA Methylation , Genetic Models , Animals , Base Sequence , Cell Differentiation/genetics , CpG Islands , DNA/chemistry , DNA/metabolism , Genomics , Markov Chains , Mice , Phylogeny
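
The abstract of this record describes modeling methylation changes along a differentiation tree with a continuous-time Markov chain and inferring unobserved states from related cell types. A minimal sketch of that idea follows, assuming a two-state chain (0 = unmethylated, 1 = methylated), a single precursor with two descendant cell types, and a stationary prior on the precursor. The rate parameters `gain` and `loss`, the branch lengths, and the function names are all hypothetical; the authors' actual model and software may differ.

```python
import math


def transition_matrix(gain, loss, t):
    """Closed-form P(t) for a two-state CTMC.

    gain: rate of 0 -> 1 (methylation), loss: rate of 1 -> 0 (demethylation).
    Returns a 2x2 list where P[i][j] is the probability of state j after
    time t given state i at time 0.
    """
    total = gain + loss
    pi_m = gain / total            # stationary probability of methylated
    pi_u = loss / total            # stationary probability of unmethylated
    decay = math.exp(-total * t)
    return [
        [pi_u + pi_m * decay, pi_m * (1.0 - decay)],
        [pi_u * (1.0 - decay), pi_m + pi_u * decay],
    ]


def infer_sibling(observed, gain, loss, t_obs, t_missing):
    """Posterior over the unobserved descendant's state at one CpG site.

    Sums over the unknown precursor state, given the state observed in
    the sibling cell type. Returns [P(unmethylated), P(methylated)].
    """
    total = gain + loss
    prior = [loss / total, gain / total]   # assumed stationary precursor prior
    p_obs = transition_matrix(gain, loss, t_obs)
    p_mis = transition_matrix(gain, loss, t_missing)
    joint = [0.0, 0.0]
    for root in (0, 1):
        weight = prior[root] * p_obs[root][observed]
        for state in (0, 1):
            joint[state] += weight * p_mis[root][state]
    norm = joint[0] + joint[1]
    return [joint[0] / norm, joint[1] / norm]


if __name__ == "__main__":
    # Sibling cell type is methylated at this site; branches are short.
    print(infer_sibling(observed=1, gain=1.0, loss=1.0, t_obs=0.1, t_missing=0.1))
```

With short branches the posterior stays close to the sibling's observed state, and it relaxes toward the stationary distribution as the branches lengthen, which is the behavior one would expect when borrowing information across related cell types.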