Results 1 - 11 of 11
1.
Bioinformatics ; 38(10): 2899-2911, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561169

ABSTRACT

MOTIVATION: Regulatory elements (REs), such as enhancers and promoters, are regulatory sequences that function in a heterogeneous regulatory network to control gene expression by recruiting transcription regulators and carrying genetic variants in a context-specific way. Annotating those REs relies on costly and labor-intensive next-generation sequencing and RNA-guided editing technologies in many cellular contexts. RESULTS: We propose a systematic Gene Ontology Annotation method for Regulatory Elements (RE-GOA) by leveraging word embedding techniques from natural language processing. We first assemble a heterogeneous network by integrating context-specific regulations, protein-protein interactions and gene ontology (GO) terms. Then we perform network embedding and associate regulatory elements with GO terms by assessing their similarity in a low-dimensional vector space. With three applications, we show that RE-GOA outperforms existing methods in annotating TFs' binding sites from ChIP-seq data, in functional enrichment analysis of differentially accessible peaks from ATAC-seq data, and in revealing genetic correlation among phenotypes from their GWAS summary statistics data. AVAILABILITY AND IMPLEMENTATION: The source code and the systematic RE annotation for human and mouse are available at https://github.com/AMSSwanglab/RE-GOA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
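
The abstract says that regulatory elements are associated with GO terms by their similarity in a low-dimensional vector space, but it does not name the similarity measure. The following is a minimal sketch that assumes cosine similarity; the function names, the GO labels, and the 3-dimensional toy vectors are all hypothetical and are not taken from the RE-GOA code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_go_terms(re_vector, go_vectors, top_k=3):
    """Return the top_k (term, similarity) pairs, most similar first."""
    scored = [(term, cosine_similarity(re_vector, vec))
              for term, vec in go_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings with made-up values; real embeddings would come from
# a network-embedding step that is not reproduced here.
go_vectors = {
    "GO:A": [1.0, 0.0, 0.0],
    "GO:B": [0.0, 1.0, 0.0],
    "GO:C": [1.0, 1.0, 0.0],
}
enhancer_vector = [2.0, 0.1, 0.0]
ranking = rank_go_terms(enhancer_vector, go_vectors, top_k=2)
```

With these toy values the enhancer vector points almost along the first axis, so "GO:A" ranks first and "GO:C" second.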


Subjects
Chromatin Immunoprecipitation Sequencing , Regulatory Sequences, Nucleic Acid , Animals , High-Throughput Nucleotide Sequencing/methods , Mice , Molecular Sequence Annotation , Promoter Regions, Genetic
2.
Nucleic Acids Res ; 49(W1): W483-W490, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33999180

ABSTRACT

Chromatin accessibility, as a powerful marker of active DNA regulatory elements, provides valuable information for understanding regulatory mechanisms. The revolution in high-throughput methods has accumulated massive chromatin accessibility profiles in public repositories. Nevertheless, utilization of these data is hampered by cumbersome collection, time-consuming processing, and manual chromatin accessibility (openness) annotation of genomic regions. To fill this gap, we developed OpenAnnotate (http://health.tsinghua.edu.cn/openannotate/) as the first web server for efficiently annotating openness of massive genomic regions across various biosample types, tissues, and biological systems. In addition to the annotation resource from 2729 comprehensive profiles of 614 biosample types of human and mouse, OpenAnnotate provides user-friendly functionalities, ultra-efficient calculation, real-time browsing, intuitive visualization, and elaborate application notebooks. We show its unique advantages compared to existing databases and toolkits by effectively revealing cell type-specificity, identifying regulatory elements and 3D chromatin contacts, deciphering gene functional relationships, inferring functions of transcription factors, and unprecedentedly promoting single-cell data analyses. We anticipate OpenAnnotate will provide a promising avenue for researchers to construct a more holistic perspective to understand regulatory mechanisms.


Subjects
Chromatin/metabolism , Genomics/methods , Molecular Sequence Annotation/methods , Software , Internet , Regulatory Sequences, Nucleic Acid , Single-Cell Analysis , Transcription Factors/metabolism
3.
Bioinformatics ; 36(8): 2474-2485, 2020 04 15.
Article in English | MEDLINE | ID: mdl-31845960

ABSTRACT

MOTIVATION: Single-cell RNA-seq data offer a new resource and resolution for studying cell type identity and its conversion. However, data analyses are challenging because of noise, sparsity and poor annotation at single-cell resolution. Detecting cell-type-indicative markers is a promising way to aid denoising, clustering and cell type annotation. RESULTS: We developed a new method, scTIM, to reveal cell-type-indicative markers. scTIM is based on a multi-objective optimization framework that simultaneously maximizes gene specificity by considering the gene-cell relationship, maximizes each gene's ability to reconstruct the cell-cell relationship, and minimizes gene redundancy by considering the gene-gene relationship. Furthermore, consensus optimization is introduced for a robust solution. Experimental results on three diverse single-cell RNA-seq datasets show scTIM's advantages in identifying cell types (clustering), annotating cell types and reconstructing cell developmental trajectories. Applying scTIM to the large-scale mouse cell atlas data identifies critical markers for 15 tissues as a 'mouse cell marker atlas', which allows us to investigate the identities of different tissues and subtle cell types within a tissue. scTIM will serve as a useful method for single-cell RNA-seq data mining. AVAILABILITY AND IMPLEMENTATION: scTIM is freely available at https://github.com/Frank-Orwell/scTIM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA-Seq , Single-Cell Analysis , Algorithms , Animals , Consensus , Mice , Sequence Analysis, RNA , Software
4.
Curr Biol ; 33(19): 4037-4051.e5, 2023 Oct 09.
Article in English | MEDLINE | ID: mdl-37643619

ABSTRACT

The adaptation of Tibetans to high-altitude environments has been studied extensively. However, the direct assessment of evolutionary adaptation, i.e., the reproductive fitness of Tibetans and its genetic basis, remains elusive. Here, we conduct systematic phenotyping and genome-wide association analysis of 2,252 mother-newborn pairs of indigenous Tibetans, covering 12 reproductive traits and 76 maternal physiological traits. Compared with the lowland immigrants living at high altitudes, indigenous Tibetans show better reproductive outcomes, reflected by their lower abortion rate, higher birth weight, and better fetal development. The results of genome-wide association analyses indicate a polygenic adaptation of reproduction in Tibetans, attributed to the genomic backgrounds of both the mothers and the newborns. Furthermore, the EPAS1-edited mice display higher reproductive fitness under chronic hypoxia, mirroring the situation in Tibetans. Collectively, these results shed new light on the phenotypic pattern and the genetic mechanism of human reproductive fitness in extreme environments.

5.
Phenomics ; 2(6): 389-403, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35990388

ABSTRACT

Human genetic variants can influence the severity of symptoms in individuals infected with SARS-CoV-2. Several genome-wide association studies have identified human genomic risk single nucleotide polymorphisms (SNPs) associated with coronavirus disease 2019 (COVID-19) severity. However, the causal tissues or cell types underlying COVID-19 severity are uncertain. In addition, candidate genes associated with these risk SNPs were investigated based on genomic proximity instead of their functional cellular contexts. Here, we compiled regulatory networks of 77 human contexts, revealed the cellular contexts in which those risk SNPs are enriched, and associated the risk SNPs with transcription factors, regulatory elements, and target genes. Twenty-one human contexts were identified and grouped into two categories: immune cells and epithelial cells. We further aggregated the regulatory networks of immune cells and epithelial cells. These two aggregated regulatory networks were investigated to reveal their association with the regulation of risk SNPs. Two genomic clusters, the chemokine receptors cluster and the oligoadenylate synthetase (OAS) cluster, showed the strongest association with COVID-19 severity, and they had different regulatory programs in immune and epithelial contexts. Our findings were supported by analysis of both SNP array and whole genome sequencing-based genome-wide association study (GWAS) summary statistics. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-022-00066-x.

6.
Elife ; 11, 2022 12 16.
Article in English | MEDLINE | ID: mdl-36525361

ABSTRACT

Systems genetics holds the promise to decipher complex traits by interpreting their associated SNPs through gene regulatory networks derived from comprehensive multi-omics data of cell types, tissues, and organs. Here, we propose SpecVar to integrate paired chromatin accessibility and gene expression data into context-specific regulatory network atlas and regulatory categories, conduct heritability enrichment analysis with genome-wide association studies (GWAS) summary statistics, identify relevant tissues, and estimate relevance correlation to depict common genetic factors acting in the shared regulatory networks between traits. Our method improves power upon existing approaches by associating SNPs with context-specific regulatory elements to assess heritability enrichments and by explicitly prioritizing gene regulations underlying relevant tissues. Ablation studies, independent data validation, and comparison experiments with existing methods on GWAS of six phenotypes show that SpecVar can improve heritability enrichment, accurately detect relevant tissues, and reveal causal regulations. Furthermore, SpecVar correlates the relevance patterns for pairs of phenotypes and better reveals shared SNP-associated regulations of phenotypes than existing methods. Studying GWAS of 206 phenotypes in UK Biobank demonstrates that SpecVar leverages the context-specific regulatory network atlas to prioritize phenotypes' relevant tissues and shared heritability for biological and therapeutic insights. SpecVar provides a powerful way to interpret SNPs via context-specific regulatory networks and is available at https://github.com/AMSSwanglab/SpecVar, copy archived at swh:1:rev:cf27438d3f8245c34c357ec5f077528e6befe829.


Subjects
Gene Regulatory Networks , Genome-Wide Association Study , Phenotype , Gene Expression Regulation , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide
7.
Nat Commun ; 13(1): 7832, 2022 12 20.
Article in English | MEDLINE | ID: mdl-36539420

ABSTRACT

Standard genome-wide association studies (GWASs) rely on analyzing a single trait at a time. However, many human phenotypes are complex and composed of multiple correlated traits. Here we introduce C-GWAS, a method for combining GWAS summary statistics of multiple potentially correlated traits. Extensive computer simulations demonstrated increased statistical power of C-GWAS compared to the minimal p-values of multiple single-trait GWASs (MinGWAS) and the current state-of-the-art method for combining single-trait GWASs (MTAG). Applying C-GWAS to a meta-analysis dataset of 78 single-trait facial GWASs from 10,115 Europeans identified 56 study-wide suggestively significant loci with multi-trait effects on facial morphology, of which 17 are novel loci. Using data from an additional 13,622 European and Asian samples, 46 (82%) loci, including 9 (53%) novel loci, were replicated at nominal significance with consistent allele effects. Functional analyses further strengthen the reliability of our C-GWAS findings. Our study introduces the C-GWAS method and makes it available as a computationally efficient open-source R package for widespread future use. Our work also provides insights into the genetic architecture of human facial appearance.
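
The abstract compares C-GWAS against a baseline that takes the minimal p-value across several single-trait GWASs. A minimal sketch of such a minimum-p baseline is given here for orientation only: it is neither the C-GWAS method nor necessarily the paper's exact MinGWAS procedure. The function names and toy p-values are hypothetical, and the Šidák adjustment is an added assumption that treats traits as independent, which correlated traits would violate.

```python
def min_p_per_snp(p_by_trait):
    """Take the smallest p-value across traits for each SNP.

    p_by_trait maps a trait name to a list of per-SNP p-values;
    every list must use the same SNP order and length.
    """
    columns = list(p_by_trait.values())
    if not columns:
        raise ValueError("need at least one trait")
    n_snps = len(columns[0])
    if any(len(col) != n_snps for col in columns):
        raise ValueError("all traits must cover the same SNPs")
    return [min(col[i] for col in columns) for i in range(n_snps)]


def sidak_adjust(p_min, n_traits):
    """Sidak correction for taking the minimum of n_traits p-values.

    Assumption: traits are independent. This is not stated in the
    abstract and is shown only to make the multiple-testing cost visible.
    """
    return 1.0 - (1.0 - p_min) ** n_traits


# Toy p-values for three SNPs and two traits (made-up numbers).
p_by_trait = {
    "trait_a": [0.5, 0.001, 0.2],
    "trait_b": [0.25, 0.04, 0.9],
}
min_p = min_p_per_snp(p_by_trait)
adjusted = [sidak_adjust(p, len(p_by_trait)) for p in min_p]
```

Taking the minimum without any adjustment inflates false positives as the number of traits grows, which is why a correction of some kind accompanies this kind of baseline.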


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Reproducibility of Results , Phenotype , Computer Simulation
8.
Nat Commun ; 13(1): 3883, 2022 07 06.
Article in English | MEDLINE | ID: mdl-35794099

ABSTRACT

Epigenetic information regulates gene expression and development. However, our understanding of the evolution of epigenetic regulation on brain development in primates is limited. Here, we compared chromatin accessibility landscapes and transcriptomes during fetal prefrontal cortex (PFC) development between rhesus macaques and humans. A total of 304,761 divergent DNase I-hypersensitive sites (DHSs) are identified between rhesus macaques and humans, although many of these sites share conserved DNA sequences. Interestingly, most of the cis-elements linked to orthologous genes with dynamic expression are divergent DHSs. Orthologous genes expressed at earlier stages tend to have conserved cis-elements, whereas orthologous genes specifically expressed at later stages seldom have conserved cis-elements. These genes are enriched in synapse organization, learning and memory. Notably, DHSs in the PFC at early stages are linked to human educational attainment and cognitive performance. Collectively, the comparison of the chromatin epigenetic landscape between rhesus macaques and humans suggests a potential role for regulatory elements in the evolution of differences in cognitive ability between non-human primates and humans.


Subjects
Chromatin , Epigenesis, Genetic , Animals , Chromatin/genetics , Deoxyribonuclease I/metabolism , Humans , Macaca mulatta/genetics , Macaca mulatta/metabolism , Prefrontal Cortex/metabolism , Regulatory Sequences, Nucleic Acid
9.
Commun Biol ; 4(1): 442, 2021 04 06.
Article in English | MEDLINE | ID: mdl-33824393

ABSTRACT

Cranial Neural Crest Cells (CNCC) originate at the cephalic region from forebrain, midbrain and hindbrain, migrate into the developing craniofacial region, and subsequently differentiate into multiple cell types. The entire specification, delamination, migration, and differentiation process is highly regulated, and abnormalities during this craniofacial development cause birth defects. To better understand the molecular networks underlying CNCC, we integrate paired gene expression and chromatin accessibility data and reconstruct the genome-wide human Regulatory network of CNCC (hReg-CNCC). Consensus optimization predicts high-quality regulations and reveals the architecture of upstream, core, and downstream transcription factors that are associated with functions of neural plate border, specification, and migration. hReg-CNCC allows us to annotate genetic variants of human facial GWAS and disease traits with associated cis-regulatory modules, transcription factors, and target genes. For example, we reveal the distal and combinatorial regulation of the core TF ALX1 by multiple SNPs and their associations with facial distances and a rare cranial disease. In addition, hReg-CNCC connects the DNA sequence differences in evolution, such as ultra-conserved elements and human accelerated regions, with gene expression and phenotype. hReg-CNCC provides a valuable resource to interpret genetic variants as early as gastrulation during embryonic development. The network resources are available at https://github.com/AMSSwanglab/hReg-CNCC.


Subjects
Cell Differentiation , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Neural Crest/embryology , Humans
10.
Article in English | MEDLINE | ID: mdl-29994260

ABSTRACT

Feature selection is the process of selecting a subset of landmark features for model construction when there are many features and comparatively few samples. Far-reaching technological developments, such as biological sequencing at the single-cell level, make feature selection more challenging. The difficulty lies in four facts: the measured features are high-dimensional and noisy; dropouts make the data very sparse; many features are either redundant or irrelevant; and samples are not well labeled in the experiments. Here, we propose a new model called ELF (Extract Landmark Features) to address the above challenges. ELF aims to simultaneously maximize topology maintenance to keep the pairwise relationships among samples, minimize feature redundancy to diversify the features, and maximize feature specificity to make every selected feature more representative. This makes ELF a nonlinear combinatorial optimization problem. To solve this difficult problem, we propose a heuristic algorithm based on a greedy strategy. We show ELF's outstanding performance on two single-cell RNA-seq datasets. One is the direct reprogramming of mouse embryonic fibroblasts into induced neurons and the other is hepatoblast differentiation. ELF is able to choose only hundreds of landmark genes to maintain the cells' correlativity. Topology maintenance, redundancy removal, and specificity each play an important role in selecting landmark features and revealing cells' biological functions. In addition, ELF can be generally applied in other scenarios. We demonstrate that ELF can reveal pivotal pixels in the writing region and the human face in two public image datasets. We believe that ELF is a useful tool to obtain more interpretable results by revealing key features while clustering the samples well.
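
The abstract describes a greedy heuristic that trades off feature specificity against feature redundancy, but it does not give the objective in detail. The sketch below is a simplified stand-in, not ELF's actual algorithm: it omits the topology-maintenance term entirely, assumes that precomputed specificity scores are supplied, and measures redundancy as the largest absolute Pearson correlation with any already-selected feature. All names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def greedy_select(features, specificity, k, redundancy_weight=1.0):
    """Pick up to k features, one at a time, by specificity minus redundancy.

    features: name -> list of values across samples
    specificity: name -> precomputed score (assumed given)
    """
    selected = []
    candidates = set(features)
    while candidates and len(selected) < k:
        def gain(name):
            # Redundancy is the strongest correlation with anything chosen so far.
            redundancy = max(
                (abs(pearson(features[name], features[s])) for s in selected),
                default=0.0,
            )
            return specificity[name] - redundancy_weight * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: g2 is a scaled copy of g1, so it is fully redundant with it.
features = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 1, 3, 2],
}
specificity = {"g1": 0.9, "g2": 0.8, "g3": 0.5}
chosen = greedy_select(features, specificity, k=2)
```

In this toy case g1 is picked first for its specificity, and g3 is then preferred over g2 even though g2 has the higher specificity score, because g2 adds no information beyond g1.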


Subjects
Cluster Analysis , Computational Biology/methods , Image Processing, Computer-Assisted/methods , Single-Cell Analysis/methods , Algorithms , Animals , Databases, Factual , Face/anatomy & histology , Humans , Mice , RNA/genetics , Sensitivity and Specificity , Sequence Analysis, RNA , Unsupervised Machine Learning
11.
Nat Commun ; 11(1): 4928, 2020 10 01.
Article in English | MEDLINE | ID: mdl-33004791

ABSTRACT

High-altitude adaptation of Tibetans represents a remarkable case of natural selection during recent human evolution. Previous genome-wide scans found many non-coding variants under selection, suggesting a pressing need to understand the functional role of non-coding regulatory elements (REs). Here, we generate time courses of paired ATAC-seq and RNA-seq data on cultured HUVECs under hypoxic and normoxic conditions. We further develop a variant interpretation methodology (vPECA) to identify active selected REs (ASREs) and associated regulatory network. We discover three causal SNPs of EPAS1, the key adaptive gene for Tibetans. These SNPs decrease the accessibility of ASREs with weakened binding strength of relevant TFs, and cooperatively down-regulate EPAS1 expression. We further construct the downstream network of EPAS1, elucidating its roles in hypoxic response and angiogenesis. Collectively, we provide a systematic approach to interpret phenotype-associated noncoding variants in proper cell types and relevant dynamic conditions, to model their impact on gene regulation.


Subjects
Acclimatization/genetics , Chromatin/metabolism , Ethnicity/genetics , Gene Regulatory Networks , Models, Genetic , Altitude , Altitude Sickness/ethnology , Altitude Sickness/genetics , Altitude Sickness/metabolism , Basic Helix-Loop-Helix Transcription Factors/genetics , Cell Hypoxia/genetics , Cells, Cultured , Chromatin/genetics , Chromatin Immunoprecipitation Sequencing , Disease Resistance/genetics , Female , Gene Expression Regulation , Human Umbilical Vein Endothelial Cells , Humans , Hypoxia/genetics , Hypoxia/metabolism , Oxygen/metabolism , Polymorphism, Single Nucleotide , Pregnancy , Primary Cell Culture , RNA-Seq , Regulatory Elements, Transcriptional/genetics , Selection, Genetic , Tibet/ethnology , Transcription Factors/metabolism , Whole Genome Sequencing