Results 1 - 20 of 48
1.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36342203

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) continues to expand our knowledge by facilitating the study of transcriptional heterogeneity at the level of single cells. Despite this technology's utility and success in biomedical research, technical artifacts are present in scRNA-seq data. Doublets/multiplets are a type of artifact that occurs when two or more cells are tagged by the same barcode, and therefore they appear as a single cell. Because this introduces non-existent transcriptional profiles, doublets can bias and mislead downstream analysis. To address this limitation, computational methods to annotate and remove doublets from scRNA-seq datasets are needed. RESULTS: We introduce vaeda (Variational Auto-Encoder for Doublet Annotation), a new approach for computational annotation of doublets in scRNA-seq data. Vaeda integrates a variational auto-encoder and Positive-Unlabeled learning to produce doublet scores and binary doublet calls. We apply vaeda, along with seven existing doublet annotation methods, to 16 benchmark datasets and find that vaeda performs competitively in terms of doublet scores and doublet calls. Notably, vaeda outperforms other Python-based methods for doublet annotation. Altogether, vaeda is a robust and competitive method for scRNA-seq doublet annotation and may be of particular interest in the context of Python-based workflows. AVAILABILITY AND IMPLEMENTATION: Vaeda is available at https://github.com/kostkalab/vaeda, and the version used for the results we present here is archived at Zenodo (https://doi.org/10.5281/zenodo.7199783). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Biomedical Research , Software , Single-Cell Analysis/methods , Artifacts , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods
2.
Bioinformatics ; 39(39 Suppl 1): i413-i422, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387140

ABSTRACT

MOTIVATION: Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post hoc analyses, and even then, one often cannot explain the internal mechanics of highly parameterized models. Here, we introduce a deep learning architecture called totally interpretable sequence-to-function model (tiSFM). tiSFM improves upon the performance of standard multilayer convolutional models while using fewer parameters. Additionally, while tiSFM is itself technically a multilayer neural network, internal model parameters are intrinsically interpretable in terms of relevant sequence motifs. RESULTS: We analyze published open chromatin measurements across hematopoietic lineage cell-types and demonstrate that tiSFM outperforms a state-of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context-specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B-cells, and Rorc for innate lymphoid cells. tiSFM's model parameters have biologically meaningful interpretations, and we show the utility of our approach on a complex task of predicting the change in epigenetic state as a function of developmental transition. AVAILABILITY AND IMPLEMENTATION: The source code, including scripts for the analysis of key findings, can be found at https://github.com/boooooogey/ATAConv, implemented in Python.


Subjects
Immunity, Innate , Lymphocytes , Chromatin , B-Lymphocytes , Neural Networks, Computer , Transcription Factors
3.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37688563

ABSTRACT

SUMMARY: DNA changes that cause premature termination codons (PTCs) represent a large fraction of clinically relevant pathogenic genomic variation. Typically, PTCs induce transcript degradation by nonsense-mediated mRNA decay (NMD) and render such changes loss-of-function alleles. However, certain PTC-containing transcripts escape NMD and can exert dominant-negative or gain-of-function (DN/GOF) effects. Therefore, systematic identification of human PTC-causing variants and their susceptibility to NMD contributes to the investigation of the role of DN/GOF alleles in human disease. Here we present aenmd, software for annotating PTC-containing transcript-variant pairs for predicted escape from NMD. aenmd is user-friendly and self-contained. It offers functionality not currently available in other methods and is based on established and experimentally validated rules for NMD escape; the software is designed to work at scale, and to integrate seamlessly with existing analysis workflows. We applied aenmd to variants in the gnomAD, ClinVar, and GWAS Catalog databases and report the prevalence of human PTC-causing variants in these databases, and the subset of these variants that could exert DN/GOF effects via NMD escape. AVAILABILITY AND IMPLEMENTATION: aenmd is implemented in the R programming language. Code is available on GitHub as an R-package (github.com/kostkalab/aenmd.git), and as a containerized command-line interface (github.com/kostkalab/aenmd_cli.git).
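
To illustrate what rule-based annotation of NMD escape can look like, here is a minimal Python sketch. It is not aenmd's code or interface. The abstract does not list its rules, so the five positional rules and the thresholds below (50, 150, and 407 nucleotides) are assumptions taken from commonly cited rules in the NMD literature. The function name `nmd_escape_rules` and the coordinate convention (0-based position in the spliced transcript) are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def nmd_escape_rules(exon_lengths, ptc_pos,
                     junction_window=50, start_window=150, long_exon=407):
    """Evaluate illustrative NMD-escape rules for one transcript-variant pair.

    exon_lengths: lengths of the exons, in transcript order (assumed input).
    ptc_pos: 0-based position of the PTC in the spliced transcript.
    All thresholds are assumptions, not values taken from the abstract.
    """
    ends = list(accumulate(exon_lengths))      # exclusive end offset of each exon
    if not 0 <= ptc_pos < ends[-1]:
        raise ValueError("ptc_pos lies outside the transcript")
    n = len(exon_lengths)
    idx = bisect_right(ends, ptc_pos)          # index of the exon holding the PTC
    return {
        "single_exon": n == 1,
        "last_exon": idx == n - 1,
        # PTC close to the 3' end of the penultimate exon
        "near_last_junction": n > 1 and idx == n - 2
                              and ends[idx] - ptc_pos <= junction_window,
        "near_start": ptc_pos < start_window,
        "long_exon": exon_lengths[idx] > long_exon,
    }


def predicted_to_escape(exon_lengths, ptc_pos):
    """A pair is flagged as a candidate for NMD escape if any rule applies."""
    return any(nmd_escape_rules(exon_lengths, ptc_pos).values())


if __name__ == "__main__":
    exons = [200, 300, 120]                    # hypothetical three-exon transcript
    for pos in (100, 250, 480, 550):
        print(pos, predicted_to_escape(exons, pos))
```

Returning one flag per rule, rather than a single verdict, lets downstream analyses choose which rules to trust.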


Subjects
Codon, Nonsense , Nonsense Mediated mRNA Decay , Humans
4.
Bioinformatics ; 38(10): 2749-2756, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561207

ABSTRACT

MOTIVATION: Single-cell RNA-seq analysis has emerged as a powerful tool for understanding inter-cellular heterogeneity. Due to the inherent noise of the data, computational techniques often rely on dimensionality reduction (DR) as both a pre-processing step and an analysis tool. Ideally, DR should preserve the biological information while discarding the noise. However, if the DR is to be used directly to gain biological insight it must also be interpretable; that is, the individual dimensions of the reduction should correspond to specific biological variables such as cell-type identity or pathway activity. Maximizing biological interpretability necessitates making assumptions about the data structures, and the choice of the model is critical. RESULTS: We present a new probabilistic single-cell factor analysis model, Non-negative Independent Factor Analysis (NIFA), that incorporates different interpretability-inducing assumptions into a single modeling framework. The key advantage of our NIFA model is that it simultaneously models uni- and multi-modal latent factors, and thus isolates discrete cell-type identity and continuous pathway activity into separate components. We apply our approach to a range of datasets where cell-type identity is known, and we show that NIFA-derived factors outperform results from ICA, PCA, NMF and scCoGAPS (an NMF method designed for single-cell data) in terms of disentangling biological sources of variation. Studying an immunotherapy dataset in detail, we show that NIFA is able to reproduce and refine previous findings in a single analysis framework and enables the discovery of new clinically relevant cell states. AVAILABILITY AND IMPLEMENTATION: NIFA is an R package which is freely available at GitHub (https://github.com/wgmao/NIFA). The test dataset is archived at https://zenodo.org/record/6286646. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Factor Analysis, Statistical , Sequence Analysis, RNA , Software
5.
Genomics ; 114(1): 278-291, 2022 01.
Article in English | MEDLINE | ID: mdl-34942352

ABSTRACT

Mammalian nephrons originate from a population of nephron progenitor cells, and changes in these cells' transcriptomes contribute to the cessation of nephrogenesis, an important determinant of nephron number. To characterize microRNA (miRNA) expression and identify putative cis-regulatory regions, we collected nephron progenitor cells from mouse kidneys at embryonic day 14.5 and postnatal day zero and assayed small RNA expression and transposase-accessible chromatin. We detect expression of 1104 miRNA (114 with expression changes), and 46,374 chromatin accessible regions (2103 with changes in accessibility). Genome-wide, our data highlight processes like cellular differentiation, cell migration, extracellular matrix interactions, and developmental signaling pathways. Furthermore, they identify new candidate cis-regulatory elements for Eya1 and Pax8, both genes with a role in nephron progenitor cell differentiation. Finally, we associate expression-changing miRNAs, including let-7-5p, miR-125b-5p, miR-181a-2-3p, and miR-9-3p, with candidate cis-regulatory elements and target genes. These analyses highlight new putative cis-regulatory loci for miRNA in nephron progenitors.


Subjects
Chromatin , MicroRNAs , Animals , Cell Differentiation/genetics , Chromatin/genetics , Chromatin/metabolism , Kidney/metabolism , Mammals/genetics , Mice , MicroRNAs/genetics , MicroRNAs/metabolism , Nephrons/metabolism , Stem Cells
6.
Bioinformatics ; 36(4): 1150-1158, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31501871

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) technologies enable the study of transcriptional heterogeneity at the resolution of individual cells and have an increasing impact on biomedical research. However, it is known that these methods sometimes wrongly consider two or more cells as single cells, and that a number of so-called doublets is present in the output of such experiments. Treating doublets as single cells in downstream analyses can severely bias a study's conclusions, and therefore computational strategies for the identification of doublets are needed. RESULTS: With scds, we propose two new approaches for in silico doublet identification: Co-expression based doublet scoring (cxds) and binary classification based doublet scoring (bcds). The co-expression based approach, cxds, utilizes binarized (absence/presence) gene expression data and, employing a binomial model for the co-expression of pairs of genes, yields interpretable doublet annotations. bcds, on the other hand, uses a binary classification approach to discriminate artificial doublets from original data. We apply our methods and existing computational doublet identification approaches to four datasets with experimental doublet annotations and find that our methods perform at least as well as the state of the art, at comparably little computational cost. We observe appreciable differences between methods and across datasets and that no approach dominates all others. In summary, scds presents a scalable, competitive approach that allows for doublet annotation of datasets with thousands of cells in a matter of seconds. AVAILABILITY AND IMPLEMENTATION: scds is implemented as a Bioconductor R package (doi: 10.18129/B9.bioc.scds). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
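
The classification-based idea mentioned above (discriminating artificial doublets from original data) depends on being able to generate artificial doublets. The Python sketch below shows one common way to do this, which is to sum the count vectors of two randomly chosen cells. It is an illustration only, not the scds implementation. The function names, the list-of-lists data layout, and the 0/1 labelling are assumptions made for this example.

```python
import random


def simulate_doublets(counts, n_doublets, seed=0):
    """Create artificial doublets by adding the counts of two distinct cells.

    counts: list of per-cell count vectors (one list of gene counts per cell).
    Returns the doublet profiles and the (i, j) index pairs they came from.
    """
    rng = random.Random(seed)
    doublets, pairs = [], []
    for _ in range(n_doublets):
        i, j = rng.sample(range(len(counts)), 2)   # two different cells
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
        pairs.append((i, j))
    return doublets, pairs


def build_training_set(counts, n_doublets, seed=0):
    """Label original cells 0 and artificial doublets 1 for a binary classifier."""
    doublets, _ = simulate_doublets(counts, n_doublets, seed)
    profiles = [list(c) for c in counts] + doublets
    labels = [0] * len(counts) + [1] * len(doublets)
    return profiles, labels


if __name__ == "__main__":
    cells = [[5, 0, 1], [0, 7, 2], [3, 3, 0]]      # toy counts: 3 cells x 3 genes
    X, y = build_training_set(cells, n_doublets=2)
    print(len(X), y)
```

Any binary classifier can then be trained on these profiles and labels, and the predicted probability for each original cell serves as a doublet score. Because the original cells are treated as singlets although some are real doublets, the training labels are noisy by construction.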


Subjects
RNA , Software , Base Sequence , Sequence Analysis, RNA , Single-Cell Analysis
7.
FASEB J ; 34(4): 5782-5799, 2020 04.
Article in English | MEDLINE | ID: mdl-32141129

ABSTRACT

Low nephron number results in an increased risk of developing hypertension and chronic kidney disease. Intrauterine growth restriction is associated with a nephron deficit in humans, and is commonly caused by placental insufficiency, which results in fetal hypoxia. The underlying mechanisms by which hypoxia impacts kidney development are poorly understood. microRNA-210 is the most consistently induced microRNA in hypoxia and is known to promote cell survival in a hypoxic environment. In this study, the role of microRNA-210 in kidney development was evaluated using a global microRNA-210 knockout mouse. A male-specific 35% nephron deficit in microRNA-210 knockout mice was observed. Wnt/ß-catenin signaling, a pathway crucial for nephron differentiation, was misregulated in male kidneys with increased expression of the canonical Wnt target lymphoid enhancer binding factor 1. This coincided with increased expression of caspase-8-associated protein 2, a known microRNA-210 target and apoptosis signal transducer. Together, these data are consistent with a sex-specific requirement for microRNA-210 in kidney development.


Subjects
Cell Differentiation , Hypoxia/physiopathology , MicroRNAs/genetics , Nephrons/cytology , Organogenesis , Animals , Apoptosis , Female , Male , Mice , Mice, Knockout , Nephrons/metabolism
8.
Am J Respir Crit Care Med ; 201(8): 934-945, 2020 04 15.
Article in English | MEDLINE | ID: mdl-31834999

ABSTRACT

Rationale: The role of FSTL-1 (follistatin-like 1) in lung homeostasis is unknown. Objectives: We aimed to define the impact of FSTL-1 attenuation on lung structure and function and to identify FSTL-1-regulated transcriptional pathways in the lung. Further, we aimed to analyze the association of FSTL-1 SNPs with lung disease. Methods: FSTL-1 hypomorphic (FSTL-1 Hypo) mice underwent lung morphometry, pulmonary function testing, and micro-computed tomography. Fstl1 expression was determined in wild-type lung cell populations from three independent research groups. RNA sequencing of wild-type and FSTL-1 Hypo mice identified FSTL-1-regulated gene expression, followed by validation and mechanistic in vitro examination. FSTL1 SNP analysis was performed in the COPDGene (Genetic Epidemiology of Chronic Obstructive Pulmonary Disease) cohort. Measurements and Main Results: FSTL-1 Hypo mice developed spontaneous emphysema, independent of smoke exposure. Fstl1 is highly expressed in the lung by mesenchymal and endothelial cells but not immune cells. RNA sequencing of whole lung identified 33 FSTL-1-regulated genes, including Nr4a1, an orphan nuclear hormone receptor that negatively regulates NF-κB (nuclear factor-κB) signaling. In vitro, recombinant FSTL-1 treatment of macrophages attenuated NF-κB p65 phosphorylation in an Nr4a1-dependent manner. Within the COPDGene cohort, several SNPs in the FSTL1 region corresponded to chronic obstructive pulmonary disease and lung function. Conclusions: This work identifies a novel role for FSTL-1 protecting against emphysema development independent of smoke exposure. This FSTL-1-deficient emphysema implicates regulation of immune tolerance in lung macrophages through Nr4a1. Further study of the mechanisms involving FSTL-1 in lung homeostasis, immune regulation, and NF-κB signaling may provide additional insight into the pathophysiology of emphysema and inflammatory lung diseases.


Subjects
Follistatin-Related Proteins/genetics , Lung/diagnostic imaging , Pulmonary Emphysema/genetics , Smoke/adverse effects , Animals , Endothelial Cells/metabolism , Follistatin-Related Proteins/pharmacology , Gene Expression Regulation , Gene Knockdown Techniques , Humans , In Vitro Techniques , Lung/metabolism , Macrophages/drug effects , Macrophages/metabolism , Mice , Mutation , Nuclear Receptor Subfamily 4, Group A, Member 1/drug effects , Nuclear Receptor Subfamily 4, Group A, Member 1/metabolism , Phosphorylation/drug effects , Polymorphism, Single Nucleotide , Positron Emission Tomography Computed Tomography , Pulmonary Disease, Chronic Obstructive/genetics , Pulmonary Emphysema/diagnostic imaging , Pulmonary Emphysema/metabolism , Single Photon Emission Computed Tomography Computed Tomography , Nicotiana , Transcription Factor RelA/drug effects , Transcription Factor RelA/metabolism , X-Ray Microtomography
9.
Bioinformatics ; 35(14): i596-i604, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510670

ABSTRACT

MOTIVATION: MicroRNAs (miRNAs) are important non-coding post-transcriptional regulators that are involved in many biological processes and human diseases. Individual miRNAs may regulate hundreds of genes, giving rise to a complex gene regulatory network in which transcripts carrying miRNA binding sites act as competing endogenous RNAs (ceRNAs). Several methods for the analysis of ceRNA interactions exist, but these often do not adjust for statistical confounders or address the problem that more than one miRNA interacts with a target transcript. RESULTS: We present SPONGE, a method for the fast construction of ceRNA networks. SPONGE uses 'multiple sensitivity correlation', a newly defined measure for which we can estimate a distribution under a null hypothesis. SPONGE can accurately quantify the contribution of multiple miRNAs to a ceRNA interaction with a probabilistic model that addresses previously neglected confounding factors and allows fast P-value calculation, thus outperforming existing approaches. We applied SPONGE to paired miRNA and gene expression data from The Cancer Genome Atlas for studying global effects of miRNA-mediated cross-talk. Our results highlight already established and novel protein-coding and non-coding ceRNAs which could serve as biomarkers in cancer. AVAILABILITY AND IMPLEMENTATION: SPONGE is available as an R/Bioconductor package (doi: 10.18129/B9.bioc.SPONGE). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
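
The abstract names 'multiple sensitivity correlation' without defining it. As a rough illustration, the Python sketch below computes the simpler single-miRNA quantity. It assumes the usual definition of sensitivity correlation as the Pearson correlation of two genes minus their partial correlation given the miRNA. This is not SPONGE's code: the multiple-miRNA version conditions on several miRNAs at once and is not reproduced here, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sensitivity_correlation(gene1, gene2, mirna):
    """Return (scor, cor, pcor) for two genes and ONE shared miRNA.

    pcor is the first-order partial correlation of the genes given the miRNA;
    scor = cor - pcor is large when the miRNA explains the gene-gene correlation.
    """
    r12 = pearson(gene1, gene2)
    r1m = pearson(gene1, mirna)
    r2m = pearson(gene2, mirna)
    pcor = (r12 - r1m * r2m) / sqrt((1 - r1m ** 2) * (1 - r2m ** 2))
    return r12 - pcor, r12, pcor


if __name__ == "__main__":
    # Toy data: both genes are repressed by the same miRNA, plus small noise.
    mirna = [1, 2, 3, 4, 5, 6]
    noise1 = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
    noise2 = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5]
    gene1 = [-m + e for m, e in zip(mirna, noise1)]
    gene2 = [-m + e for m, e in zip(mirna, noise2)]
    print(sensitivity_correlation(gene1, gene2, mirna))
```

In the toy data the two genes correlate strongly only because they share a regulator. Conditioning on the miRNA removes that correlation, so the sensitivity correlation is close to the raw correlation.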


Subjects
Neoplasms , RNA/genetics , Gene Regulatory Networks , Humans
10.
Nature ; 513(7517): 195-201, 2014 Sep 11.
Article in English | MEDLINE | ID: mdl-25209798

ABSTRACT

Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ∼5 million years ago, coincident with major geographical changes in southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.


Subjects
Genome/genetics , Hylobates/classification , Hylobates/genetics , Karyotype , Phylogeny , Animals , Evolution, Molecular , Hominidae/classification , Hominidae/genetics , Humans , Molecular Sequence Data , Retroelements/genetics , Selection, Genetic , Transcription Termination, Genetic
11.
J Am Soc Nephrol ; 30(7): 1192-1205, 2019 07.
Article in English | MEDLINE | ID: mdl-31142573

ABSTRACT

BACKGROUND: Nephron progenitors, the cell population that gives rise to the functional unit of the kidney, are metabolically active and self-renew under glycolytic conditions. A switch from glycolysis to mitochondrial respiration drives these cells toward differentiation, but the mechanisms that control this switch are poorly defined. Studies have demonstrated that kidney formation is highly dependent on oxygen concentration, which is largely regulated by von Hippel-Lindau (VHL; a protein component of a ubiquitin ligase complex) and hypoxia-inducible factors (a family of transcription factors activated by hypoxia). METHODS: To explore VHL as a regulator defining nephron progenitor self-renewal versus differentiation, we bred Six2-TGCtg mice with VHLlox/lox mice to generate mice with a conditional deletion of VHL from Six2+ nephron progenitors. We used histologic, immunofluorescence, RNA sequencing, and metabolic assays to characterize kidneys from these mice and controls during development and up to postnatal day 21. RESULTS: By embryonic day 15.5, kidneys of nephron progenitor cell-specific VHL knockout mice begin to exhibit reduced maturation of nephron progenitors. Compared with controls, VHL knockout kidneys are smaller and developmentally delayed by postnatal day 1, and have about half the number of glomeruli at postnatal day 21. VHL knockout nephron progenitors also exhibit persistent Six2 and Wt1 expression, as well as decreased mitochondrial respiration and prolonged reliance on glycolysis. CONCLUSIONS: Our findings identify a novel role for VHL in mediating nephron progenitor differentiation through metabolic regulation, and suggest that VHL is required for normal kidney development.


Subjects
Nephrons/cytology , Stem Cells/cytology , Von Hippel-Lindau Tumor Suppressor Protein/physiology , Animals , Cell Differentiation , Gene Expression Regulation , Glycolysis , Homeodomain Proteins/physiology , Mice , Mitochondria/metabolism , Transcription Factors/physiology
12.
BMC Genomics ; 20(1): 511, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31221079

ABSTRACT

BACKGROUND: Non-coding gene regulatory enhancers are essential to transcription in mammalian cells. As a result, a large variety of experimental and computational strategies have been developed to identify cis-regulatory enhancer sequences. Given the differences in the biological signals assayed, some variation in the enhancers identified by different methods is expected; however, the concordance of enhancers identified by different methods has not been comprehensively evaluated. This is critically needed, since in practice, most studies consider enhancers identified by only a single method. Here, we compare enhancer sets from eleven representative strategies in four biological contexts. RESULTS: All sets we evaluated overlap significantly more than expected by chance; however, there is significant dissimilarity in their genomic, evolutionary, and functional characteristics, both at the element and base-pair level, within each context. The disagreement is sufficient to influence interpretation of candidate SNPs from GWAS studies, and to lead to disparate conclusions about enhancer and disease mechanisms. Most regions identified as enhancers are supported by only one method, and we find limited evidence that regions identified by multiple methods are better candidates than those identified by a single method. As a result, we cannot recommend the use of any single enhancer identification strategy in all settings. CONCLUSIONS: Our results highlight the inherent complexity of enhancer biology and identify an important challenge to mapping the genetic architecture of complex disease. Greater appreciation of how the diverse enhancer identification strategies in use today relate to the dynamic activity of gene regulatory regions is needed to enable robust and reproducible results.


Subjects
Enhancer Elements, Genetic , Cell Line , Databases, Genetic , Evolution, Molecular , Gene Expression Regulation , Genomics , Humans , Liver/metabolism , Molecular Sequence Annotation , Myocardium/metabolism
13.
Am J Physiol Renal Physiol ; 316(5): F993-F1005, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30838872

ABSTRACT

We have previously demonstrated that loss of miR-17~92 in nephron progenitors in a mouse model results in renal hypodysplasia and chronic kidney disease. Clinically, decreased congenital nephron endowment because of renal hypodysplasia is associated with an increased risk of hypertension and chronic kidney disease, and this is at least partly dependent on the self-renewal of nephron progenitors. Here, we present evidence for a novel molecular mechanism regulating the self-renewal of nephron progenitors and congenital nephron endowment by the highly conserved miR-17~92 cluster. Whole transcriptome sequencing revealed that nephron progenitors lacking this cluster demonstrated increased Cftr expression. We showed that one member of the cluster, miR-19b, is sufficient to repress Cftr expression in vitro and that perturbation of Cftr activity in nephron progenitors results in impaired proliferation. Together, these data suggest that miR-19b regulates Cftr expression in nephron progenitors, with this interaction playing a role in appropriate nephron progenitor self-renewal during kidney development to generate normal nephron endowment.


Subjects
Cystic Fibrosis Transmembrane Conductance Regulator/metabolism , MicroRNAs/metabolism , Nephrons/metabolism , Stem Cells/metabolism , Animals , Cell Movement , Cell Proliferation , Cell Self Renewal , Cells, Cultured , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Gene Expression Regulation, Developmental , Mice, Inbred C57BL , Mice, Knockout , MicroRNAs/genetics , Nephrons/embryology , Organogenesis , Signal Transduction
14.
Mol Biol Evol ; 35(8): 2034-2045, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29897475

ABSTRACT

Some of the fastest evolving regions of the human genome are conserved noncoding elements with many human-specific DNA substitutions. These human accelerated regions (HARs) are enriched nearby regulatory genes, and several HARs function as developmental enhancers. To investigate if this evolutionary signature is unique to humans, we quantified evidence of accelerated substitutions in conserved genomic elements across multiple lineages and applied this approach simultaneously to the genomes of five apes: human, chimpanzee, gorilla, orangutan, and gibbon. We find roughly similar numbers and genomic distributions of lineage-specific accelerated regions (linARs) in all five apes. In particular, apes share an enrichment of linARs in regulatory DNA nearby genes involved in development, especially transcription factors and other regulators. Many developmental loci harbor clusters of nonoverlapping linARs from multiple apes, suggesting that accelerated evolution in each species affected distinct regulatory elements that control a shared set of developmental pathways. Our statistical tests distinguish between GC-biased and unbiased accelerated substitution rates, allowing us to quantify the roles of different evolutionary forces in creating linARs. We find evidence of GC-biased gene conversion in each ape, but unbiased acceleration consistent with positive selection or loss of constraint is more common in all five lineages. It therefore appears that similar evolutionary processes created independent accelerated regions in the genomes of different apes, and that these lineage-specific changes to conserved noncoding sequences may have differentially altered expression of a core set of developmental genes across ape evolution.


Subjects
Evolution, Molecular , Hominidae/genetics , Algorithms , Animals , Computer Simulation , Gene Conversion , Hominidae/growth & development , Humans , Models, Genetic , Selection, Genetic
15.
Bioinformatics ; 34(13): i79-i88, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29950006

ABSTRACT

Motivation: Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around discovery or characterization of types or sub-types of cells, and therefore, obtaining accurate cell-cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches exist that explicitly address this task. Furthermore, abundance and type of noise present in scRNA-seq datasets suggest that application of generic methods, or of methods developed for bulk RNA-seq data, is likely suboptimal. Results: Here, we present RAFSIL, a random forest based approach to learn cell-cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure, where feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks like dimension reduction, visualization and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach yielding a useful tool that improves the analysis of scRNA-seq data. Availability and implementation: The RAFSIL R package is available at www.kostkalab.net/software.html. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Cluster Analysis
16.
PLoS Genet ; 12(2): e1005788, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26862916

ABSTRACT

Elimination of the proliferating germline extends lifespan in C. elegans. This phenomenon provides a unique platform to understand how complex metazoans retain metabolic homeostasis when challenged with major physiological perturbations. Here, we demonstrate that two conserved transcription regulators essential for the longevity of germline-less adults, DAF-16/FOXO3A and TCER-1/TCERG1, concurrently enhance the expression of multiple genes involved in lipid synthesis and breakdown, and that both gene classes promote longevity. Lipidomic analyses revealed that key lipogenic processes, including de novo fatty acid synthesis, triglyceride production, desaturation and elongation, are augmented upon germline removal. Our data suggest that lipid anabolic and catabolic pathways are coordinately augmented in response to germline loss, and this metabolic shift helps preserve lipid homeostasis. DAF-16 and TCER-1 also perform essential inhibitory functions in germline-ablated animals. TCER-1 inhibits the somatic gene-expression program that facilitates reproduction and represses anti-longevity genes, whereas DAF-16 impedes ribosome biogenesis. Additionally, we discovered that TCER-1 is critical for optimal fertility in normal adults, suggesting that the protein acts as a switch supporting reproductive fitness or longevity depending on the presence or absence of the germline. Collectively, our data offer insights into how organisms adapt to changes in reproductive status, by utilizing the activating and repressive functions of transcription factors and coordinating fat production and degradation.


Subjects
Adaptation, Physiological , Caenorhabditis elegans Proteins/metabolism , Caenorhabditis elegans/physiology , Forkhead Transcription Factors/metabolism , Germ Cells/metabolism , Homeostasis , Lipid Metabolism , Peptide Elongation Factors/metabolism , Animals , Diet , Down-Regulation/genetics , Fatty Acids/metabolism , Fertility/genetics , Gene Expression Regulation, Developmental , Longevity , Mutation/genetics , Protein Biosynthesis/genetics , Receptors, Notch/metabolism , Reproduction , Transcriptome/genetics , Triglycerides/metabolism , Up-Regulation/genetics
18.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subjects
Molecular Evolution , Human Genome/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Genetic Selection/genetics , Sequence Alignment , DNA Sequence Analysis
19.
PLoS Genet ; 9(8): e1003684, 2013.
Article in English | MEDLINE | ID: mdl-23966869

ABSTRACT

GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available.


Subjects
Molecular Evolution , Gene Conversion/genetics , Pan troglodytes/genetics , Phylogeny , Genetic Selection , Animals , Base Sequence , Chromosome Mapping , Genome , Humans , Mammals , Theoretical Models , Genetic Recombination , Sequence Alignment
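
The abstract above describes a model in which gBGC shifts nucleotide substitution rates in favor of G/C alleles. The abstract does not give the parameterization, so the following is only a minimal sketch under an assumption: it uses the standard population-genetic fixation factor x / (1 - exp(-x)), with a hypothetical scaled gBGC strength `B`, to scale weak-to-strong (A/T to G/C) and strong-to-weak substitutions. The function names are illustrative and are not taken from phastBias.

```python
import math

WEAK = {"A", "T"}    # bases disfavored by gBGC
STRONG = {"G", "C"}  # bases favored by gBGC


def fixation_factor(x):
    """Relative fixation rate x / (1 - exp(-x)); tends to 1 as x tends to 0."""
    if abs(x) < 1e-8:
        return 1.0
    return x / (1.0 - math.exp(-x))


def gbgc_rate_multiplier(from_base, to_base, B):
    """Multiplier applied to a neutral substitution rate under gBGC strength B.

    B is an assumed population-scaled gBGC parameter (B = 0 means no bias).
    """
    if from_base in WEAK and to_base in STRONG:
        return fixation_factor(B)    # favored direction, multiplier > 1 when B > 0
    if from_base in STRONG and to_base in WEAK:
        return fixation_factor(-B)   # disfavored direction, multiplier < 1 when B > 0
    return 1.0                       # weak-to-weak and strong-to-strong are unaffected


if __name__ == "__main__":
    for pair in (("A", "G"), ("G", "A"), ("A", "T")):
        print(pair, gbgc_rate_multiplier(pair[0], pair[1], B=3.0))
```

Under this assumed form, the ratio of the favored to the disfavored multiplier equals exp(B), which is one way to see how even a modest bias produces an ongoing fixation preference for G and C alleles.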
20.
Bioinformatics ; 30(17): i408-14, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-25161227

ABSTRACT

MOTIVATION: Methylation of CpG dinucleotides is a prevalent epigenetic modification that is required for proper development in vertebrates. Genome-wide DNA methylation assays have become increasingly common, and this has enabled characterization of DNA methylation in distinct stages across differentiating cellular lineages. Changes in CpG methylation are essential to cellular differentiation; however, current methods for modeling methylation dynamics do not account for the dependency structure between precursor and dependent cell types. RESULTS: We developed a continuous-time Markov chain approach, based on the observation that changes in methylation state over tissue differentiation can be modeled similarly to DNA nucleotide changes over evolutionary time. This model explicitly takes precursor to descendant relationships into account and enables inference of CpG methylation dynamics. To illustrate our method, we analyzed a high-resolution methylation map of the differentiation of mouse stem cells into several blood cell types. Our model can successfully infer unobserved CpG methylation states from observations at the same sites in related cell types (90% correct), and this approach more accurately reconstructs missing data than imputation based on neighboring CpGs (84% correct). Additionally, the single CpG resolution of our methylation dynamics estimates enabled us to show that DNA sequence context of CpG sites is informative about methylation dynamics across tissue differentiation. Finally, we identified genomic regions with clusters of highly dynamic CpGs and present a likely functional example. Our work establishes a framework for inference and modeling that is well suited to DNA methylation data, and our success suggests that other methods for analyzing DNA nucleotide substitutions will also translate to the modeling of epigenetic phenomena. AVAILABILITY AND IMPLEMENTATION: Source code is available at www.kostkalab.net/software.


Subjects
DNA Methylation , Genetic Models , Animals , Base Sequence , Cell Differentiation/genetics , CpG Islands , DNA/chemistry , DNA/metabolism , Genomics , Markov Chains , Mice , Phylogeny
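
The continuous-time Markov chain idea in the abstract above can be illustrated with a two-state chain (unmethylated = 0, methylated = 1). This is a minimal sketch, not the authors' implementation: the rates `a` (gain of methylation) and `b` (loss), the branch lengths, and the assumption that the precursor state follows the stationary distribution are all illustrative choices. It shows how an observation in one cell type informs the unobserved state at the same CpG in a related cell type that shares a precursor.

```python
import math


def transition_matrix(a, b, t):
    """Closed-form P(t) for a two-state chain with rates a (0 -> 1) and b (1 -> 0)."""
    s = a + b
    pi0, pi1 = b / s, a / s          # stationary probabilities of states 0 and 1
    decay = math.exp(-s * t)
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]


def posterior_hidden_state(a, b, t_observed, t_hidden, observed_state):
    """Posterior over a hidden cell type's state given one observed sibling.

    Both cell types descend from a shared precursor whose state is assumed
    to be drawn from the stationary distribution (an illustrative assumption).
    """
    s = a + b
    prior = [b / s, a / s]
    p_obs = transition_matrix(a, b, t_observed)
    p_hid = transition_matrix(a, b, t_hidden)
    # Sum over the unknown precursor state r for each hidden state h.
    joint = [
        sum(prior[r] * p_obs[r][observed_state] * p_hid[r][h] for r in (0, 1))
        for h in (0, 1)
    ]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    post = posterior_hidden_state(a=1.0, b=1.0, t_observed=0.1, t_hidden=0.1,
                                  observed_state=1)
    print("P(hidden unmethylated), P(hidden methylated):", post)
```

With short branches the posterior leans strongly toward the observed sibling's state, and with long branches it relaxes toward the stationary distribution, which is the intuition behind inferring missing methylation states from related cell types.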