1.
Front Cell Dev Biol ; 11: 1209320, 2023.
Article En | MEDLINE | ID: mdl-38020907

Background: The mechanisms underlying corticogenesis are still being characterized. Methods: We curated the most comprehensive single-cell RNA-seq (scRNA-seq) datasets from mouse and human fetal cortexes for data analysis and confirmed the findings with co-immunostaining experiments. Results: By analyzing the developmental trajectories with scRNA-seq datasets in mice, we identified a specific developmental sub-path contributed by a cell population expressing markers specific to both deep- and upper-layer neurons (DLNs and ULNs), which occurred on E13.5 but was absent in adults. In this cell population, the percentage of cells expressing DLN markers decreased and the percentage expressing ULN markers increased during development, suggesting a direct neuronal transition (namely D-T-U). Genes highly or uniquely expressed in the D-T-U cell population were significantly enriched in PTN/MDK signaling pathways related to cell migration. Both findings were further confirmed by co-immunostaining with DLN-, ULN-, and D-T-U-specific markers across different timepoints. Furthermore, six genes (co-expressed with D-T-U-specific markers in mice) showing a potentially opposite temporal expression between human and mouse during fetal cortical development were associated with neuronal migration and cognitive functions. In adult prefrontal cortexes (PFC), D-T-U-specific genes were expressed in neurons from different layers in humans and mice. Conclusion: Our study characterizes a specific cell population, D-T-U, showing direct DLN-to-ULN neuronal transition and migration during fetal cortical development in mice. It is potentially associated with the differences in cortical development between humans and mice.

2.
Nucleic Acids Res ; 51(21): 11770-11782, 2023 Nov 27.
Article En | MEDLINE | ID: mdl-37870428

Precision medicine depends on high-accuracy individual-level genotype data. However, whole-genome sequencing (WGS) is still not practical for very large studies because of budget constraints. It is therefore particularly important to construct a highly accurate haplotype reference panel for genotype imputation. In this study, we used 10 000 samples with medium-depth WGS to construct a reference panel that we named the CKB reference panel. Imputation of microarray datasets showed that the CKB panel outperformed the compared panels in terms of both the number of well-imputed variants and imputation accuracy. In addition, we have completed the imputation of 100 706 microarrays with the CKB panel, and the imputed data form the largest whole-genome dataset of the Chinese population to date. Furthermore, in the GWAS analysis of a real phenotype, height, the number of tested SNPs tripled and the number of significant SNPs doubled after imputation. Finally, we developed an online server offering a free genotype imputation service based on the CKB reference panel (https://db.cngb.org/imputation/). We believe that the CKB panel is of great value for imputing microarray or low-coverage genotype data of the Chinese population, and potentially of mixed populations. The 100 706 imputed microarray datasets are a large and valuable resource for population genetic studies of complex traits and diseases.


Biological Specimen Banks , Genome , Humans , Haplotypes , Genotype , Genome-Wide Association Study , Polymorphism, Single Nucleotide , China
3.
iScience ; 26(8): 107287, 2023 Aug 18.
Article En | MEDLINE | ID: mdl-37539039

Budd-Chiari syndrome (BCS) is characterized by hepatic venous outflow obstruction, posing life-threatening risks in severe cases. Reported risk factors include inherited and acquired hypercoagulable states or other predisposing factors. However, many patients have no identifiable etiology, and causes of BCS differ between the West and East. This study recruited 500 BCS patients and 696 normal individuals for whole-exome sequencing and developed a polygenic risk scoring (PRS) model using PLINK, LASSOSUM, BLUP, and BayesA methods. Risk factors for venous thromboembolism and vascular malformations were also assessed for BCS risk prediction. Ultimately, we discovered potential BCS risk mutations, such as rs1042331, and the optimal BayesA-generated PRS model presented an AUC >0.9 in the external replication cohort. This model provides particular insights into genetic risk differences between China and the West and suggests shared genetic risks among BCS, venous thromboembolism, and vascular malformations, offering different perspectives on BCS pathogenesis.
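
As an illustration of the scoring and evaluation steps named in the abstract above, the following is a minimal sketch of an additive polygenic risk score and a rank-based AUC. It assumes the common weighted-sum form of a PRS; the weights, dosages, and function names are hypothetical and are not taken from the study, whose weights came from PLINK, LASSOSUM, BLUP, and BayesA models.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive PRS: sum over variants of (risk-allele dosage x per-variant weight)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random control (ties count half)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical per-variant weights and genotype dosages (0, 1 or 2 risk alleles).
WEIGHTS = [0.8, 0.3, -0.2]
CASES = [[2, 1, 0], [1, 2, 0], [2, 0, 1]]
CONTROLS = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]

case_scores = [polygenic_risk_score(g, WEIGHTS) for g in CASES]
control_scores = [polygenic_risk_score(g, WEIGHTS) for g in CONTROLS]
print(f"toy AUC = {auc(case_scores, control_scores):.3f}")
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch; a rank-sum formulation would be preferable for cohorts of realistic size.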

4.
Clin Nutr ; 42(4): 511-518, 2023 04.
Article En | MEDLINE | ID: mdl-36857960

BACKGROUND & AIMS: Body mass index and waist circumference are simple measures of obesity. However, they do not distinguish between visceral and subcutaneous fat, or muscle, potentially leading to biased relationships between individual body composition parameters and adverse health outcomes. The purpose of this study was to develop and validate prediction models for volumetric adipose and muscle. METHODS: Based on cross-sectional data of 18,457, 18,260, and 17,052 White adults from the UK Biobank, we developed sex-specific equations to estimate visceral adipose tissue (VAT), abdominal subcutaneous adipose tissue (ASAT), and total thigh fat-free muscle (FFM) volumes, respectively. Volumetric magnetic resonance imaging served as the reference. We used the least absolute shrinkage and selection operator and the extreme gradient boosting methods separately to fit three sequential models, the inputs of which included demographics and anthropometrics and, in some, bioelectrical impedance analysis parameters. We applied comprehensive metrics to assess model performance in the temporal validation set. RESULTS: The equations that included more predictors generally performed better. Accuracy of the equations was moderate for VAT (percentage of estimates that differed <30% from the measured values, 70 to 78 in males, 64 to 69 in females) and good for ASAT (85 to 91 in males, 90 to 95 in females) and FFM (99 to 100 in both sexes). All the equations appeared precise (interquartile range of the difference, 0.89 to 1.76 L for VAT, 1.16 to 1.61 L for ASAT, 0.81 to 1.39 L for FFM). Bias of all the equations was negligible (-0.17 to 0.05 L for VAT, -0.10 to 0.12 L for ASAT, -0.07 to 0.09 L for FFM). The equations achieved superior cardiometabolic correlations compared with body mass index and waist circumference. CONCLUSIONS: The developed equations to estimate VAT, ASAT, and FFM volumes achieved moderate to good performance. They may be cost-effective tools to revisit the implications of diverse body components.


Biological Specimen Banks , Body Composition , Adult , Male , Female , Humans , Cross-Sectional Studies , Obesity/diagnosis , Body Mass Index , Subcutaneous Fat, Abdominal , United Kingdom
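
The accuracy, precision, and bias measures quoted in the abstract above can be sketched as follows. This is an illustration only: the abstract defines accuracy as the percentage of estimates within 30% of the measured value and precision as the interquartile range of the difference, but it does not say whether bias is a mean or a median difference, so the median used here is an assumption. The function names and the toy volumes are hypothetical.

```python
import statistics
from typing import List, Sequence


def differences(estimated: Sequence[float], measured: Sequence[float]) -> List[float]:
    """Per-subject difference, estimated minus measured (same unit, e.g. litres)."""
    if len(estimated) != len(measured):
        raise ValueError("estimated and measured must have the same length")
    return [e - m for e, m in zip(estimated, measured)]


def p30(estimated: Sequence[float], measured: Sequence[float]) -> float:
    """Accuracy: percentage of estimates differing by less than 30% from the measured value."""
    diffs = differences(estimated, measured)
    within = sum(1 for d, m in zip(diffs, measured) if abs(d) < 0.30 * abs(m))
    return 100.0 * within / len(diffs)


def precision_iqr(estimated: Sequence[float], measured: Sequence[float]) -> float:
    """Precision: interquartile range of the differences."""
    q1, _, q3 = statistics.quantiles(differences(estimated, measured), n=4)
    return q3 - q1


def bias(estimated: Sequence[float], measured: Sequence[float]) -> float:
    """Bias: median difference (assumed definition; the abstract does not specify)."""
    return statistics.median(differences(estimated, measured))


# Toy volumes in litres, not data from the study.
est = [1.0, 2.0, 3.0, 4.0]
ref = [1.0, 2.5, 2.0, 4.2]
print(p30(est, ref), precision_iqr(est, ref), bias(est, ref))
```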
5.
Nat Commun ; 14(1): 1093, 2023 02 25.
Article En | MEDLINE | ID: mdl-36841846

Protein-protein interactions (PPIs) are a fundamental means of function and signaling in biological systems. The massive growth in demand and cost associated with experimental PPI studies calls for computational tools for automated prediction and understanding of PPIs. Despite recent progress, in silico methods remain inadequate in modeling the natural PPI hierarchy. Here we present a double-viewed hierarchical graph learning model, HIGH-PPI, to predict PPIs and extrapolate the molecular details involved. In this model, we create a hierarchical graph, in which a node in the PPI network (top outside-of-protein view) is a protein graph (bottom inside-of-protein view). In the bottom view, a group of chemically relevant descriptors, instead of the protein sequences, are used to better capture the structure-function relationship of the protein. HIGH-PPI examines both the outside-of-protein and inside-of-protein views of the human interactome to establish a robust machine understanding of PPIs. This model demonstrates high accuracy and robustness in predicting PPIs. Moreover, HIGH-PPI can interpret the modes of action of PPIs by precisely identifying important binding and catalytic sites. Overall, HIGH-PPI (https://github.com/zqgao22/HIGH-PPI) is a domain-knowledge-driven and interpretable framework for PPI prediction studies.


Deep Learning , Protein Interaction Mapping , Humans , Protein Interaction Mapping/methods , Proteins/metabolism , Amino Acid Sequence , Protein Interaction Maps
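
The two-level graph described in the abstract above, where each node of the PPI network is itself a protein graph, can be pictured with a small data-structure sketch. This is not the HIGH-PPI implementation: the class names, the choice of undirected edges, and the toy descriptor vectors are all hypothetical, and the learning model that operates on the structure is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProteinGraph:
    """Bottom (inside-of-protein) view: nodes carry descriptor vectors."""
    name: str
    node_features: list[list[float]]
    edges: set[tuple[int, int]] = field(default_factory=set)

    def add_edge(self, i: int, j: int) -> None:
        n = len(self.node_features)
        if not (0 <= i < n and 0 <= j < n):
            raise IndexError("node index out of range")
        self.edges.add((min(i, j), max(i, j)))  # store undirected edges once


@dataclass
class InteractomeGraph:
    """Top (outside-of-protein) view: each node is a whole ProteinGraph."""
    proteins: dict[str, ProteinGraph] = field(default_factory=dict)
    interactions: set[frozenset[str]] = field(default_factory=set)

    def add_protein(self, protein: ProteinGraph) -> None:
        self.proteins[protein.name] = protein

    def add_interaction(self, a: str, b: str) -> None:
        if a not in self.proteins or b not in self.proteins:
            raise KeyError("both proteins must be added before linking them")
        self.interactions.add(frozenset((a, b)))

    def interacts(self, a: str, b: str) -> bool:
        return frozenset((a, b)) in self.interactions


# Toy example with made-up two-dimensional descriptors.
p1 = ProteinGraph("P1", [[0.1, 0.9], [0.4, 0.2], [0.7, 0.5]])
p1.add_edge(0, 1)
p1.add_edge(2, 1)
p2 = ProteinGraph("P2", [[0.3, 0.3], [0.8, 0.1]])
p2.add_edge(0, 1)

interactome = InteractomeGraph()
interactome.add_protein(p1)
interactome.add_protein(p2)
interactome.add_interaction("P1", "P2")
```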
6.
Nucleic Acids Res ; 51(D1): D890-D895, 2023 01 06.
Article En | MEDLINE | ID: mdl-35871305

A high-quality genome variation database derived from a large-scale population is one of the most important infrastructures for genomics, clinical and translational medicine research. Here, we developed the Chinese Millionome Database (CMDB), a database that contains 9.04 million single nucleotide variants (SNV) with allele frequency information derived from low-coverage (0.06×-0.1×) whole-genome sequencing (WGS) data of 141 431 unrelated healthy Chinese individuals. These individuals were recruited from 31 out of the 34 administrative divisions in China, covering Han and 36 other ethnic minorities. CMDB, housing the WGS data of a multi-ethnic Chinese population featuring wide geographical distribution, has become the most representative and comprehensive Chinese population genome database to date. Researchers can quickly search for variant, gene or genomic regions to obtain the variant information, including mutation basic information, allele frequency, genic annotation and overview of frequencies in global populations. Furthermore, the CMDB also provides information on the association of the variants with a range of phenotypes, including height, BMI, maternal age and twin pregnancy. Based on these data, researchers can conduct meta-analysis of related phenotypes. CMDB is freely available at https://db.cngb.org/cmdb/.


Databases, Genetic , East Asian People , Humans , Gene Frequency , Mutation , China/ethnology , East Asian People/genetics , Genetic Variation , Genetics, Population
7.
Nucleic Acids Res ; 51(D1): D1168-D1178, 2023 01 06.
Article En | MEDLINE | ID: mdl-36350663

Characterization of the specific expression and chromatin profiles of genes enables understanding of how they contribute to tissue/organ development and of the mechanisms leading to diseases. Although the number of single-cell sequencing studies is increasing dramatically, data mining and reanalysis remain challenging. Herein, we systematically curated the up-to-date and most comprehensive datasets of sequencing data originating from 2760 bulk samples and over 5.1 million single cells from multiple developmental periods from humans and multiple model organisms. With unified and systematic analysis, we profiled the gene expression and chromatin accessibility among 481 cell types, 79 tissue types and 92 timepoints, and pinpointed cells with the co-expression of target genes. We also enabled the detection of gene(s) with a temporal and cell-type-specific expression profile that is similar to or distinct from that of a target gene. Additionally, we illustrated the potential upstream and downstream gene-gene regulation interactions, particularly under the same biological process(es) or KEGG pathway(s). Thus, TEDD (Temporal Expression during Development Database), a value-added database with a user-friendly interface, not only enables researchers to identify cell-type/tissue-type-specific and temporal gene expression and chromatin profiles but also facilitates the association of genes with undefined biological functions in development and diseases. The database URL is https://TEDD.obg.cuhk.edu.hk/.


Databases, Genetic , Gene Expression , Humans , Chromatin/genetics , Gene Expression Regulation , User-Computer Interface , Animals , Embryonic Development , Organ Specificity
8.
Front Genet ; 13: 900548, 2022.
Article En | MEDLINE | ID: mdl-36110214

Purpose: We aimed to characterize the USH2A genotypic spectrum in a Chinese cohort and provide a detailed genetic profile for Chinese patients with USH2A-IRD. Methods: We designed a retrospective study in which a total of 1,334 patients diagnosed with IRD were included as a study cohort, namely 1,278 RP and 56 USH patients, with patients with other types of IEDs and healthy family members as a control cohort. The genotype-phenotype correlation of all participants with USH2A variants was evaluated. Results: Etiological mutations in USH2A, the most common cause of RP and USH, were found in 16.34% (n = 218) genetically solved IRD patients, with prevalences of 14.87% (190/1,278) and 50% (28/56) in RP and USH patients, respectively. After bioinformatics and QC processing, 768 distinct USH2A variants were detected in all participants, including 136 disease-causing mutations present in 665 alleles, distributed in 5.81% of all participants. Of these 136 mutations, 43 were novel, nine were founder mutations, and two were hot spot mutations with allele count ≥10. Furthermore, 38.5% (84/218) of genetically solved USH2A-IRD patients carried at least one of the c.2802T>G and c.8559-2A>G mutations, and 36.9% and 69.6% of the alleles in the RP and USH groups were truncating, respectively. Conclusion: USH2A-related East Asian-specific founder and hot spot mutations were the major causes of disease in Chinese RP and USH patients. Our study systematically delineated the genotype spectrum of USH2A-IRD, enabled accurate genetic diagnosis, and provided East Asian and other ethnicities with baseline data of Chinese origin, which would better serve genetic counseling and therapeutic target selection.

9.
Brief Bioinform ; 23(6)2022 11 19.
Article En | MEDLINE | ID: mdl-36124777

A transcriptional regulatory network (TRN) is a collection of transcription regulators with their associated downstream genes, which is highly condition-specific. Understanding how cell states can be programmed through small molecules/drugs or conditions by modulating the whole gene expression system gives us the potential to amend abnormal cells and cure diseases. Condition Orientated Regulatory Networks (CORN, https://qinlab.sysu.edu.cn/home) is a library of condition (small molecule/drug treatments and gene knockdowns)-based transcriptional regulatory sub-networks (TRSNs) that comes with an online TRSN matching tool. It allows users to browse condition-associated TRSNs or match those TRSNs by inputting transcriptomic changes of interest. CORN uses data on transcriptomic changes after specific conditional treatments in cells together with in vivo transcription factor (TF) binding data in cells. By combining TF binding information with calculations of significant expression alterations of TFs and genes after the conditional treatments, TRNs under the effect of different conditions were constructed. In short, CORN associates 1805 different types of specific conditions (small molecule/drug treatments and gene knockdowns) with 9553 TRSNs in 25 human cell lines, involving 204 TFs. By linking and curating specific conditions to responsive TRNs, the scientific community can now perceive how TRNs are altered and controlled by conditions alone in an organized manner for the first time. This study demonstrated with examples that CORN can aid the understanding of molecular pathology, pharmacology and drug repositioning, and screened drugs with high potential for cancer and coronavirus disease 2019 (COVID-19) treatments.


COVID-19 , Gene Regulatory Networks , Humans , COVID-19/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Transcriptome
10.
Mol Genet Genomic Med ; 10(9): e2021, 2022 09.
Article En | MEDLINE | ID: mdl-35876299

PURPOSE: To expand the mutation spectrum of patients with familial exudative vitreoretinopathy (FEVR) disease. PARTICIPANTS: 74 probands (53 families and 21 sporadic probands) with FEVR disease and their available family members (n = 188) were recruited for sequencing. METHODS: Panel-based targeted screening was performed on all subjects. Before Sanger sequencing, variants of the LRP5, NDP, FZD4, TSPAN12, ZNF408, KIF11, RCBTB1, JAG1, and CTNNA1 genes were verified by a series of bioinformatics tools and genotype-phenotype co-segregation analysis. RESULTS: 40.54% (30/74) of the probands were found to carry at least one etiological mutation in the nine FEVR-causative genes. The etiological mutation detection rate was 37.74% (20/53) in family-attainable probands and 47.62% (10/21) in sporadic cases. The diagnosis rate of patients in the early-onset subgroup (≤5 years old, 45.4%) was higher than that of the childhood- or adolescence-onset subgroup (6-16 years old, 42.1%) and the late-onset subgroup (≥17 years old, 39.4%). A total of 36 etiological mutations were identified in this study, comprising 26 novel mutations and 10 reported mutations. LRP5 was the most frequently mutated gene, accounting for 41.67% (15/36) of the 36 mutations, followed by FZD4 (10/36, 27.78%), TSPAN12 (5/36, 13.89%), NDP (4/36, 11.11%), KIF11 (1/36, 2.78%), and RCBTB1 (1/36, 2.78%). Among these mutations, 63.89% (23/36) were missense mutations, 25.00% (9/36) were frameshift mutations, 5.56% (2/36) were splicing mutations, and 5.56% (2/36) were nonsense mutations. Moreover, the clinical pathogenicity of these variants was defined according to the American College of Medical Genetics and Genomics (ACMG) guidelines: 41.67% (15/36) were likely pathogenic variants, 27.78% (10/36) were pathogenic variants, and 30.55% (11/36) were variants of uncertain significance. No etiological mutations were discovered in the ZNF408, JAG1, and CTNNA1 genes in this FEVR cohort.
CONCLUSIONS: We systematically screened nine FEVR disease-associated genes in a cohort of 74 Chinese probands with FEVR disease. With a detection rate of 40.54%, 36 etiological mutations in six genes were identified in 30 probands, including 26 novel mutations and 10 reported mutations. The most frequently mutated gene was LRP5, followed by FZD4, TSPAN12, NDP, KIF11, and RCBTB1. In total, one de novo mutation was confirmed. Our study clarifies the spectrum of variants associated with FEVR disease.


Low Density Lipoprotein Receptor-Related Protein-5 , Retinal Diseases , Codon, Nonsense , DNA Mutational Analysis , DNA-Binding Proteins/genetics , Familial Exudative Vitreoretinopathies/genetics , Frizzled Receptors/genetics , Guanine Nucleotide Exchange Factors/genetics , Humans , Low Density Lipoprotein Receptor-Related Protein-5/genetics , Mutation , Pedigree , Retinal Diseases/genetics , Tetraspanins/genetics , Transcription Factors
11.
Nucleic Acids Res ; 50(D1): D934-D942, 2022 01 07.
Article En | MEDLINE | ID: mdl-34634807

Viral infectious diseases are a devastating and continuing threat to human and animal health. Receptor binding is the key step for viral entry into host cells. Therefore, recognizing viral receptors is fundamental for understanding the potential tissue tropism or host range of these pathogens. The rapid advancement of single-cell RNA sequencing (scRNA-seq) technology has paved the way for studying the expression of viral receptors in different tissues of animal species at single-cell resolution, resulting in huge scRNA-seq datasets. However, effectively integrating or sharing these datasets among the research community is challenging, especially for laboratory scientists. In this study, we manually curated up-to-date datasets generated in animal scRNA-seq studies, analyzed them using a unified processing pipeline, and comprehensively annotated 107 viral receptors in 142 viruses and obtained accurate expression signatures in 2 100 962 cells from 47 animal species. Thus, the VThunter database provides a user-friendly interface for the research community to explore the expression signatures of viral receptors. VThunter offers an informative and convenient resource for scientists to better understand the interactions between viral receptors and animal viruses and to assess viral pathogenesis and transmission in species. Database URL: https://db.cngb.org/VThunter/.


Databases, Factual , Genome, Viral , Host-Pathogen Interactions/genetics , Receptors, Virus/genetics , Software , Virus Diseases/genetics , Viruses/genetics , Animals , Binding Sites , Datasets as Topic , Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Humans , Internet , Molecular Sequence Annotation , Protein Binding , Receptors, Virus/classification , Receptors, Virus/metabolism , Signal Transduction , Single-Cell Analysis , Virus Diseases/metabolism , Virus Diseases/transmission , Virus Diseases/virology , Viruses/classification , Viruses/metabolism , Viruses/pathogenicity
12.
BMC Med Genomics ; 14(1): 260, 2021 11 04.
Article En | MEDLINE | ID: mdl-34736471

BACKGROUND: Birth defects pose a major challenge to infant health. Thus far, however, the causes of most birth defects remain cryptic. Over the past few decades, considerable effort has been expended on disclosing the underlying mechanisms related to birth defects, yielding myriad treatises and data. To meet the increasing requirements for data resources, we developed a freely accessible birth defect multi-omics database (BDdb, http://t21omics.cngb.org ) consisting of multi-omics data and potential disease biomarkers. RESULTS: In total, omics datasets from 136 Gene Expression Omnibus (GEO) Series records, including 5245 samples, as well as 869 biomarkers of 22 birth defects in six different species, were integrated into the BDdb. The database provides a user-friendly interface for searching, browsing, and downloading data of interest. The BDdb also enables users to explore the correlations among different sequencing methods, such as chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq) from different studies, to obtain the information on gene expression patterns from diverse aspects. CONCLUSION: To the best of our knowledge, the BDdb is the first comprehensive database associated with birth defects, which should benefit the diagnosis and prevention of birth defects.


Congenital Abnormalities/genetics , Databases, Genetic , Genomics , Metabolomics , Proteomics , Humans
13.
Front Microbiol ; 12: 687259, 2021.
Article En | MEDLINE | ID: mdl-34408729

Helicobacter pylori exhibits specific geographic distributions that are related to clinical outcomes. Despite the high infection rate of H. pylori throughout the world, the genetic epidemiology surveillance of H. pylori still needs to be improved. This study used the single nucleotide polymorphism (SNP) profiling approach based on whole genome sequencing (WGS) to facilitate genomic population analyses of H. pylori and encourage the dissemination of microbial genotyping strategies worldwide. A total of 1,211 public H. pylori genomes were downloaded and used to construct the typing tool, named HpTT (H. pylori Typing Tool). Combined with the metadata, we developed two levels of genomic typing: a continent scale and a country scale nested within the continent scale. Results showed that Asia was the largest isolate source in our dataset, while isolates from Europe and Oceania were comparatively more widespread. More specifically, Switzerland and Australia were the main sources of widespread isolates in their corresponding continents. To integrate all the typing information and enable researchers to compare their dataset against the existing global database easily and rapidly, a user-friendly website (https://db.cngb.org/HPTT/) was developed with both genomic typing tools and visualization tools. To further confirm the validity of the website, ten newly assembled genomes were downloaded and tested, and each was placed on the expected branch. In summary, the H. pylori typing tool (HpTT) is a novel genomic epidemiological tool that can achieve high-resolution genomic typing and visualization simultaneously, providing insights into the genetic population structure, evolution analysis, and epidemiological surveillance of H. pylori.

14.
Front Genet ; 12: 708981, 2021.
Article En | MEDLINE | ID: mdl-34447413

It is well recognized that batch effects in single-cell RNA sequencing (scRNA-seq) data remain a major challenge when integrating different datasets. Here, we proposed deepMNN, a novel deep learning-based method to correct batch effects in scRNA-seq data. We first searched for mutual nearest neighbor (MNN) pairs across different batches in a principal component analysis (PCA) subspace. Subsequently, a batch correction network was constructed by stacking two residual blocks and further applied for the removal of batch effects. The loss function of deepMNN was defined as the sum of a batch loss and a weighted regularization loss. The batch loss was used to compute the distance between cells in MNN pairs in the PCA subspace, while the regularization loss was used to keep the output of the network similar to the input. The experimental results showed that deepMNN can successfully remove batch effects across datasets with identical cell types, datasets with non-identical cell types, datasets with multiple batches, and large-scale datasets as well. We compared the performance of deepMNN with state-of-the-art batch correction methods, including the widely used methods of Harmony, Scanorama, and Seurat V4 as well as the recently developed deep learning-based methods of MMD-ResNet and scGen. The results demonstrated that deepMNN achieved a better or comparable performance in terms of both qualitative analysis using uniform manifold approximation and projection (UMAP) plots and quantitative metrics such as batch and cell entropies, ARI F1 score, and ASW F1 score under various scenarios. Additionally, deepMNN allowed for integrating scRNA-seq datasets with multiple batches in one step. Furthermore, deepMNN ran much faster than the other methods for large-scale datasets. These characteristics make deepMNN a potential new choice for large-scale single-cell gene expression data analysis.
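
The MNN search and the two-part loss described in the abstract above can be sketched in a few lines. This is a simplified illustration, not the deepMNN code: it assumes Euclidean distance, sums rather than averages the terms, uses a hypothetical `reg_weight`, and leaves out the residual network itself (the "output" points would come from that network).

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest(query: Point, pool: Sequence[Point], k: int) -> List[int]:
    """Indices of the k points in pool closest to query (Euclidean distance)."""
    order = sorted(range(len(pool)), key=lambda j: math.dist(query, pool[j]))
    return order[:k]


def mutual_nearest_pairs(batch_a: Sequence[Point], batch_b: Sequence[Point],
                         k: int = 1) -> List[Tuple[int, int]]:
    """Pairs (i, j) where b[j] is among a[i]'s k nearest and a[i] is among b[j]'s."""
    a_to_b = [set(nearest(p, batch_b, k)) for p in batch_a]
    b_to_a = [set(nearest(q, batch_a, k)) for q in batch_b]
    return [(i, j) for i in range(len(batch_a))
            for j in sorted(a_to_b[i]) if i in b_to_a[j]]


def total_loss(out_a: List[Point], out_b: List[Point],
               in_a: List[Point], in_b: List[Point],
               pairs: Sequence[Tuple[int, int]], reg_weight: float = 0.1) -> float:
    """Batch loss (distance within MNN pairs) plus weighted regularization loss."""
    batch_loss = sum(math.dist(out_a[i], out_b[j]) for i, j in pairs)
    reg_loss = sum(math.dist(o, x) for o, x in zip(out_a + out_b, in_a + in_b))
    return batch_loss + reg_weight * reg_loss


# Toy cells already projected into a 2-D "PCA" space (made-up coordinates).
batch_a = [(0.0, 0.0), (5.0, 5.0)]
batch_b = [(0.5, 0.0), (5.0, 5.5), (9.0, 9.0)]
pairs = mutual_nearest_pairs(batch_a, batch_b)
# With an identity "network" the regularization term is zero.
loss = total_loss(batch_a, batch_b, batch_a, batch_b, pairs)
```

In this toy case the third cell of the second batch has no mutual partner, so it contributes only to the regularization term.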

15.
J Genet Genomics ; 48(5): 411-425, 2021 05 20.
Article En | MEDLINE | ID: mdl-34144929

The genetically engineered pig is regarded as an optimal source of organs for transplantation into humans and an excellent model for human disease research, given its comparable physiology to human beings. A myriad of single-cell RNA sequencing (scRNA-seq) data on humans has been reported, but such data on pigs are scarce. Here, we apply scRNA-seq technology to study the cellular heterogeneity of 3-month-old pig lungs, generating a single-cell atlas of 13,580 cells covering 16 major cell types. Based on these data, we systematically characterize the similarities and differences in the cellular cross-talk and expression patterns of respiratory virus receptors in each cell type of pig lungs compared with human lungs. Furthermore, we analyze pig lung xenotransplantation barriers and report the cell-type expression patterns of 10 genes associated with pig-to-human immunobiological incompatibility and coagulation dysregulation. We also investigate the conserved transcription factors (TFs) and their candidate target genes and construct five conserved TF regulatory networks in the main cell types shared by pig and human lungs. Finally, we present a comprehensive and openly accessible online platform, ScdbLung. Our scRNA-seq atlas of the domestic pig lung and the ScdbLung database can guide pig lung research and clinical applicability.


Gene Expression Profiling , Lung/metabolism , Single-Cell Analysis/methods , Sus scrofa/genetics , Transcriptome , Animals , Biomarkers , Computational Biology/methods , Conserved Sequence , Databases, Genetic , Disease Susceptibility/immunology , Evolution, Molecular , Genetic Predisposition to Disease , Host-Pathogen Interactions/genetics , Humans , Molecular Sequence Annotation , RNA-Seq , Swine , Web Browser
16.
Sci Bull (Beijing) ; 66(14): 1448-1461, 2021 Jul 30.
Article En | MEDLINE | ID: mdl-36654371

The brain of the domestic pig (Sus scrofa domesticus) has drawn considerable attention due to its high similarities to that of humans. However, the cellular compositions of the pig brain (PB) remain elusive. Here we investigated the single-nucleus transcriptomic profiles of five regions of the PB (frontal lobe, parietal lobe, temporal lobe, occipital lobe, and hypothalamus) and identified 21 cell subpopulations. The cross-species comparison of mouse and pig hypothalamus revealed the shared and specific gene expression patterns at the single-cell resolution. Furthermore, we identified cell types and molecular pathways closely associated with neurological disorders, bridging the gap between gene mutations and pathogenesis. We reported, to our knowledge, the first single-cell atlas of domestic pig cerebral cortex and hypothalamus combined with a comprehensive analysis across species, providing extensive resources for future research regarding neural science, evolutionary developmental biology, and regenerative medicine.

17.
BMC Genomics ; 21(1): 32, 2020 Jan 09.
Article En | MEDLINE | ID: mdl-31918660

BACKGROUND: Decapods are an order of crustaceans which includes shrimps, crabs, lobsters and crayfish. They occur worldwide and are of great scientific interest as well as being of ecological and economic importance in fisheries and aquaculture. However, our knowledge of their biology mainly comes from the group which is most closely related to crustaceans - insects. Here we produce a de novo transcriptome database, crustacean annotated transcriptome (CAT) database, spanning multiple tissues and the life stages of seven crustaceans. DESCRIPTION: A total of 71 transcriptome assemblies from six decapod species and a stomatopod species, including the coral shrimp Stenopus hispidus, the cherry shrimp Neocaridina davidi, the redclaw crayfish Cherax quadricarinatus, the spiny lobster Panulirus ornatus, the red king crab Paralithodes camtschaticus, the coconut crab Birgus latro, and the zebra mantis shrimp Lysiosquillina maculata, were generated. Differential gene expression analyses within species were generated as a reference and included in a graphical user interface database at http://cat.sls.cuhk.edu.hk/. Users can carry out gene name searches and also access gene sequences based on a sequence query using the BLAST search function. CONCLUSIONS: The data generated and deposited in this database offers a valuable resource for the further study of these crustaceans, as well as being of use in aquaculture development.


Decapoda/genetics , Transcriptome/genetics , Animals , Databases, Genetic
18.
Sci Rep ; 9(1): 9514, 2019 07 02.
Article En | MEDLINE | ID: mdl-31267025

Microalgal Chlorella has been demonstrated to process wastewater from the piggery industry efficiently, yet optimization of such a bio-treatment through genetic engineering is currently challenging, largely due to the limited data and knowledge in genomics. In this study, we first investigated the differential growth rates among three wastewater-processing Chlorella strains, Chlorella sorokiniana BD09, Chlorella sorokiniana BD08 and Chlorella sp. Dachan, and the previously published Chlorella sorokiniana UTEX 1602, which showed that BD09 maintains the best tolerance in synthetic wastewater. We then performed genome sequencing and analysis, resulting in a high-quality assembly for each genome with scaffold N50 > 2 Mb and genomic completeness ≥91%, as well as genome annotation with 9,668, 10,240, and 9,821 high-confidence gene models predicted for BD09, BD08, and Dachan, respectively. Comparative genomic analysis revealed that metabolic pathways involved in nitrogen and phosphorus assimilation were enriched in the faster-growing strains. We found that gene structural variation and genomic rearrangement might contribute to differential capabilities in wastewater tolerance among the strains, as indicated by gene copy number variation, domain reshuffling of the orthologs involved, as well as a ~1 Mb chromosomal inversion we observed in BD08 and Dachan. In addition, we speculated that an associated bacterium, Microbacterium chocolatum, which was identified within Dachan, plays a possible role in synergizing nutrient removal. Our three newly sequenced Chlorella genomes provide a foundation for understanding the molecular basis of abiotic stress tolerance in wastewater treatment, which is essential for future genetic engineering and strain improvement.


Chlorella/genetics , Genome, Plant , Wastewater/chemistry , Algal Proteins/genetics , Algal Proteins/metabolism , Chlorella/classification , Chlorella/drug effects , Chlorella/growth & development , Comparative Genomic Hybridization , DNA Copy Number Variations , DNA, Algal/chemistry , DNA, Algal/genetics , DNA, Algal/metabolism , Nitrogen/metabolism , Phosphorus/metabolism , Phylogeny , Sequence Analysis, DNA , Wastewater/toxicity
19.
Nucleic Acids Res ; 47(D1): D322-D329, 2019 01 08.
Article En | MEDLINE | ID: mdl-30476229

The Eukaryotic nucleic acid binding protein database (ENPD, http://qinlab.sls.cuhk.edu.hk/ENPD/) is a library of nucleic acid binding proteins (NBPs) and their functional information. NBPs such as DNA binding proteins (DBPs), RNA binding proteins (RBPs), and DNA and RNA binding proteins (DRBPs) are involved in every stage of gene regulation through their interactions with DNA and RNA. Due to the importance of NBPs, the database was constructed based on manual curation and a newly developed pipeline utilizing both sequenced transcriptomes and genomes. In total, the database records 2.8 million NBPs and their binding motifs from 662 NBP families and 2423 species, constituting the largest NBP database. ENPD covers evolutionarily important lineages that have never been included in previous NBP databases, and lineage-specific NBP family expansions were also found. ENPD also focuses on the involvement of DBPs, RBPs and DRBPs in non-coding RNA (ncRNA) mediated gene regulation. Both the predicted and the experimentally validated targets of NBPs have been recorded and manually curated in ENPD, linking the interactions between ncRNAs, DNA regulatory elements and NBPs in gene regulation. This database provides key resources for the scientific community, laying a solid foundation for future gene regulatory studies from both functional and evolutionary perspectives.


DNA-Binding Proteins/genetics , Databases, Genetic , Eukaryota/genetics , Gene Expression Regulation , RNA-Binding Proteins/genetics , Amino Acid Motifs/genetics , Animals , Binding Sites/genetics , Data Curation , Humans , Protein Binding , Proteome , RNA, Untranslated/metabolism , Transcriptome
20.
BMC Bioinformatics ; 19(Suppl 19): 516, 2018 Dec 31.
Article En | MEDLINE | ID: mdl-30598069

BACKGROUND: Finding peptides with high binding affinity to Class I major histocompatibility complex (MHC-I) attracts intensive research and is a crucial part of developing better vaccines for precision medicine. Traditional methods for designing such peptides are costly. Advances in computational approaches have dramatically reduced the cost of new drug discovery. Compared with the flourishing field of computational drug discovery, immunology lacks tools focused on in silico design of peptides with high binding affinity. The ever-expanding amount of MHC-peptide binding data has enabled a tremendous influx of deep learning techniques for modeling MHC-peptide binding. To leverage these data, it is of great significance to find MHC-peptide binding specificities. Binding motifs, which generally refer to combinations of certain amino acids at certain sites that contribute strongly to binding affinity, are one of the key determinants of MHC-peptide binding. RESULT: In this work, we propose the Motif Activation Mapping (MAM) network for MHC-I and peptide binding to extract motifs from peptides. Then, we substitute amino acids randomly according to the motifs to generate peptides with high affinity. We demonstrated that the MAM network could extract motifs that are features of peptides with high binding affinity, as well as generate peptides with high affinities; that is, 0.859 for HLA-A*0201, 0.75 for HLA-A*0206, 0.92 for HLA-B*2702, 0.9 for HLA-A*6802 and 0.839 for Mamu-A1*001:01. Besides, its binding prediction result reaches the state of the art. The experiment also reveals that the network is appropriate for most MHC-I with transfer learning. CONCLUSIONS: We design the MAM network to extract motifs from MHC-peptide binding through prediction, and the extracted motifs are shown to generate peptides with high binding affinity successfully. The new peptides preserve the motifs but vary in sequence.
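
The generation step described above (random substitution that preserves the motif) can be sketched as follows. This is one possible reading of the abstract, not the authors' implementation: the function name, the choice of which positions count as the motif, and the example peptide are all assumptions made for illustration. In the MAM approach the motif positions would come from the network itself.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitute_outside_motif(peptide, motif_positions, n_substitutions=1, rng=None):
    """Randomly replace residues at non-motif positions (0-based indices).

    Residues at `motif_positions` are left untouched, so the returned
    peptide preserves the motif but varies elsewhere in its sequence.
    """
    rng = rng or random.Random()
    free = [i for i in range(len(peptide)) if i not in motif_positions]
    if n_substitutions > len(free):
        raise ValueError("not enough non-motif positions to substitute")
    residues = list(peptide)
    for i in rng.sample(free, n_substitutions):
        # Always pick a residue different from the current one.
        choices = [aa for aa in AMINO_ACIDS if aa != residues[i]]
        residues[i] = rng.choice(choices)
    return "".join(residues)


# Hypothetical 9-mer; positions 2 and 9 (indices 1 and 8) are treated
# as the motif here purely for illustration.
peptide = "LLFGYPVYV"
motif = {1, 8}
variant = substitute_outside_motif(peptide, motif, n_substitutions=2, rng=random.Random(0))
print(peptide, "->", variant)
```

In a full pipeline, each variant would then be scored by a binding predictor and only the high-affinity candidates kept; that scoring step is outside this sketch.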


Algorithms , Amino Acid Motifs , Computer Simulation , Histocompatibility Antigens Class I/metabolism , Oligopeptides/metabolism , Peptide Fragments/metabolism , Sequence Analysis, Protein/methods , Alleles , Histocompatibility Antigens Class I/immunology , Humans , Oligopeptides/immunology , Peptide Fragments/immunology , Protein Binding
...