1.
Nucleic Acids Res ; 52(D1): D72-D80, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37904589

ABSTRACT

G-quadruplexes (G4s) are non-canonical four-stranded structures and are emerging as novel genetic regulatory elements. However, a comprehensive genomic annotation of endogenous G4s (eG4s) and a systematic characterization of their regulatory network are still lacking, posing major challenges for eG4 research. Here, we present EndoQuad (https://EndoQuad.chenzxlab.cn/) to address these pressing issues by integrating high-throughput experimental data. First, based on high-quality genome-wide eG4 mapping datasets (human: 1181; mouse: 24; chicken: 2) generated by G4 ChIP-seq/CUT&Tag, we generate a reference set of genome-wide eG4s. Our multi-omics analyses show that most eG4s are identified in one or a few cell types. The eG4s with higher occurrences across samples are more structurally stable, evolutionarily conserved and enriched in promoter regions; they also mark highly expressed genes and associate with complex regulatory programs, indicating a higher confidence level for further experiments. Finally, we integrate millions of functional genomic variants and prioritize eG4s with regulatory functions in disease and cancer contexts. These efforts have culminated in a comprehensive and interactive database of experimentally validated DNA eG4s. As such, EndoQuad enables users to easily access, download and repurpose these data for their own research. EndoQuad will become a one-stop resource for eG4 research and lay the foundation for future functional studies.


Subjects
Databases, Genetic; G-Quadruplexes; Regulatory Sequences, Nucleic Acid; Animals; Humans; Mice; Genome; Genomics
2.
Brief Bioinform ; 24(4), 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37232385

ABSTRACT

The volume of RNA sequencing (RNA-seq) data has increased exponentially, providing numerous new insights into various biological processes. However, due to significant practical challenges, such as data heterogeneity, it is still difficult to ensure the quality of these data when integrated. Although some quality control methods have been developed, sample consistency is rarely considered and these methods are susceptible to artificial factors. Here, we developed MassiveQC, an unsupervised machine learning-based approach, to automatically download and filter large-scale high-throughput data. In addition to the read quality used in other tools, MassiveQC also uses the alignment and expression quality as model features. Meanwhile, it is user-friendly since the cutoff is generated from self-reporting and is applicable to multimodal data. To explore its value, we applied MassiveQC to Drosophila RNA-seq data and generated a comprehensive transcriptome atlas across 28 tissues from embryogenesis to adulthood. We systematically characterized fly gene expression dynamics and found that genes with high expression dynamics were likely to be evolutionarily young and expressed at late developmental stages, exhibiting high nonsynonymous substitution rates and low phenotypic severity, and that they were involved in simple regulatory programs. We also discovered that human and Drosophila had strong positive correlations in gene expression in orthologous organs, revealing the great potential of the Drosophila system for studying human development and disease.


Subjects
Drosophila melanogaster; Transcriptome; Humans; Animals; Drosophila melanogaster/genetics; Drosophila melanogaster/metabolism; Gene Expression Profiling/methods; RNA/genetics; RNA-Seq; Sequence Analysis, RNA; High-Throughput Nucleotide Sequencing/methods; Drosophila
3.
Nucleic Acids Res ; 47(D1): D835-D840, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30380119

ABSTRACT

Many animal species present sex differences. Sex-associated genes (SAGs), which have female-biased or male-biased expression, have major influences on the remarkable sex differences in important traits such as growth, reproduction, disease resistance and behaviors. However, the SAGs resulting in the vast majority of phenotypic sex differences are still unknown. To provide a useful resource for the functional study of SAGs, we manually curated public RNA-seq datasets with paired female and male biological replicates from the same condition and systematically re-analyzed the datasets using standardized methods. We identified 27,793 female-biased SAGs and 64,043 male-biased SAGs from 2,828 samples of 21 species, including human, chimpanzee, macaque, mouse, rat, cow, horse, chicken, zebrafish, seven fly species and five worm species. All these data were cataloged into SAGD, a user-friendly database of SAGs (http://bioinfo.life.hust.edu.cn/SAGD) where users can browse SAGs by gene, species, drug and dataset. In SAGD, the expression, annotation, targeting drugs, homologs, ontology and related RNA-seq datasets of SAGs are provided to help researchers to explore their functions and potential applications in agriculture and human health.


Subjects
Databases, Genetic; Gene Expression Regulation/genetics; Sex Characteristics; Transcriptome/genetics; Animals; Cattle; Diptera/genetics; Female; Horses/genetics; Humans; Male; Mice; Molecular Sequence Annotation; Rats; Reproduction/genetics; Software; Zebrafish/genetics
4.
BMC Bioinformatics ; 21(1): 252, 2020 Jun 18.
Article in English | MEDLINE | ID: mdl-32552728

ABSTRACT

BACKGROUND: Many disease-causing genes have been identified through different methods, but there are not yet uniform biomedical named entity (bio-NE) annotations of the disease phenotypes of these genes. Furthermore, semantic similarity comparison between two bio-NE annotations has become important for data integration or systems genetics analysis. RESULTS: The package pyMeSHSim recognizes bio-NEs using MetaMap, which produces Unified Medical Language System (UMLS) concepts through natural language processing. To map the UMLS concepts to Medical Subject Headings (MeSH), pyMeSHSim embeds an in-house dataset containing the main headings (MHs), supplementary concept records (SCRs), and their relations in MeSH. Based on this dataset, pyMeSHSim implements four information content (IC)-based algorithms and one graph-based algorithm to measure the semantic similarity between two MeSH terms. To evaluate its performance, we used pyMeSHSim to parse OMIM and GWAS phenotypes. pyMeSHSim introduces SCRs and a curation strategy for non-MeSH-synonymous UMLS concepts, which improved its performance in recognizing OMIM phenotypes. In the curation of 461 GWAS phenotypes, pyMeSHSim showed recall > 0.94, precision > 0.56, and F1 > 0.70, demonstrating better performance than the state-of-the-art tools DNorm and TaggerOne in recognizing MeSH terms from short biomedical phrases. The semantic similarity between the MeSH terms recognized by pyMeSHSim and those from previous manual work was calculated with pyMeSHSim and with another semantic analysis tool, meshes. The results indicated that the correlation between the semantic similarities computed by the two tools was as high as 0.89-0.99. CONCLUSIONS: The integrative MeSH tool pyMeSHSim, embedded with the MeSH MHs and SCRs, performs bio-NE recognition, normalization, and comparison in biomedical text mining.
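
As a quick consistency check on the recall, precision and F1 figures quoted above: F1 is the harmonic mean of precision and recall. The sketch below is illustrative only. The function name `f1_score` is our own and is not part of pyMeSHSim, and the inputs are the lower bounds quoted in the abstract rather than the study's exact values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Lower bounds quoted in the abstract (precision > 0.56, recall > 0.94).
f1_lower = f1_score(0.56, 0.94)
print(round(f1_lower, 2))  # prints 0.7
```

The result agrees with the reported F1 > 0.70, so the three figures are mutually consistent.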


Subjects
Medical Subject Headings; Semantics; Unified Medical Language System/standards; Humans
5.
Database (Oxford) ; 2023, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37207350

ABSTRACT

Enhancers, which are key tumorigenic factors with wide applications in the subtyping, diagnosis and treatment of cancer, are attracting increasing attention in cancer research. However, systematic analysis of cancer enhancers is challenging because integrative data resources are lacking, especially those from primary tumor tissues. To provide a comprehensive enhancer profile across cancer types, we developed a cancer enhancer database, CenhANCER, by curating public resources including all the public H3K27ac ChIP-Seq data from 805 primary tissue samples and 671 cell line samples across 41 cancer types. In total, 57 029 408 typical enhancers, 978 411 super-enhancers and 226 726 enriched transcription factors were identified. We annotated the super-enhancers with chromatin accessibility regions, cancer expression quantitative trait loci (eQTLs), Genotype-Tissue Expression (GTEx) eQTLs and genome-wide association study risk single nucleotide polymorphisms (SNPs) for further functional analysis. The identified enhancers were highly consistent with accessible chromatin regions in the corresponding cancer types, and all 10 super-enhancer regions identified in one colorectal cancer study were recapitulated in CenhANCER; both results attest to the high quality of our data. CenhANCER, with high-quality cancer enhancer candidates and transcription factors that are potential therapeutic targets across multiple cancer types, provides a credible resource for single-cancer analysis and for comparative studies of various cancer types. Database URL http://cenhancer.chenzxlab.cn/.


Subjects
Genome-Wide Association Study; Neoplasms; Humans; Enhancer Elements, Genetic/genetics; Transcription Factors/genetics; Transcription Factors/metabolism; Cell Line; Chromatin; Neoplasms/genetics
6.
J Genet Genomics ; 48(12): 1122-1129, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34538772

ABSTRACT

The origination of new genes contributes to the biological diversity of life. New genes may quickly build their network, exert important functions, and generate novel phenotypes. Dating gene age and inferring the origination mechanisms of new genes, such as primate-specific genes, are the basis for functional studies of these genes. However, no comprehensive resource of gene age estimates across species is available. Here, we systematically date the age of 9,102,113 protein-coding genes from 565 species in the Ensembl and Ensembl Genomes databases, including 82 bacteria, 57 protists, 134 fungi, 58 plants, 56 metazoa, and 178 vertebrates, using a protein-family-based pipeline with the Wagner parsimony algorithm. We also collect gene age estimates from other studies and uniformly map them to time ranges in millions of years for comparison across studies. All the data are cataloged into GenOrigin (http://genorigin.chenzxlab.cn/), a user-friendly new database of gene age estimates, where users can browse gene age estimates by species, age, and gene ontology. In GenOrigin, information such as gene age estimates, annotation, gene ontology, orthologs, and paralogs, as well as detailed gene presence/absence views for gene age inference based on the species tree with an evolutionary timescale, is provided to researchers for exploring gene functions.
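
To make the idea of dating a gene from presence/absence on a timed species tree concrete, here is a deliberately simplified sketch. It is not the GenOrigin pipeline and does not implement Wagner parsimony: it places the origin of a gene family on the branch leading to the smallest clade that contains every species carrying the family, treating any absence inside that clade as a secondary loss. All clade names, species names and ages are made-up placeholders, and `date_gene` is a hypothetical helper.

```python
from typing import List, Optional, Set, Tuple

# Toy lineage of a focal species, ordered from youngest to oldest clade.
# Each entry: (clade name, node age in million years, species in the clade).
# Every name and age here is an invented placeholder, not real data.
LINEAGE: List[Tuple[str, float, Set[str]]] = [
    ("focal_only", 0.0, {"focal"}),
    ("clade_1", 10.0, {"focal", "sp_A"}),
    ("clade_2", 90.0, {"focal", "sp_A", "sp_B"}),
    ("clade_3", 300.0, {"focal", "sp_A", "sp_B", "sp_C"}),
]


def date_gene(present_in: Set[str]) -> Tuple[str, float, Optional[float]]:
    """Return (clade, min_age, max_age) for a gene family.

    The family is assigned to the smallest clade containing all carriers.
    Its origin lies on the branch between that clade's node (min_age) and
    the next older node (max_age, or None at the root of the toy tree).
    """
    for i, (name, age, members) in enumerate(LINEAGE):
        if present_in <= members:
            older = LINEAGE[i + 1][1] if i + 1 < len(LINEAGE) else None
            return name, age, older
    raise ValueError("family present in species outside the toy tree")


# Family found in the focal species and sp_B, but missing from sp_A.
print(date_gene({"focal", "sp_B"}))  # prints ('clade_2', 90.0, 300.0)
```

Returning an age range rather than a point estimate mirrors the abstract's description of mapping estimates to time ranges in millions of years. A parsimony method that weighs gains against losses could assign a younger origin in cases where this sketch assumes a loss.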


Subjects
Evolution, Molecular; Vertebrates; Algorithms; Animals; Phylogeny; Software; Vertebrates/genetics