Results 1 - 20 of 54
1.
Genes (Basel) ; 14(11), 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-38002941

ABSTRACT

Phelan-McDermid syndrome (PMS) is a rare genetic neurodevelopmental disorder caused by 22q13 region deletions or SHANK3 gene variants. Deletions vary in size and can affect other genes in addition to SHANK3. PMS is characterized by autism spectrum disorder (ASD), intellectual disability (ID), developmental delays, seizures, speech delay, hypotonia, and minor dysmorphic features. It is challenging to determine individual gene contributions due to variability in deletion sizes and clinical features. We implemented a genomic data mining approach for identifying and prioritizing the candidate genes in the 22q13 region for five phenotypes: ASD, ID, seizures, language impairment, and hypotonia. Weighted gene co-expression networks were constructed using the BrainSpan transcriptome dataset of a human brain. Bioinformatic analyses of the co-expression modules allowed us to select specific candidate genes, including EP300, TCF20, RBX1, XPNPEP3, PMM1, SCO2, BRD1, and SHANK3, for the common neurological phenotypes of PMS. The findings help understand the disease mechanisms and may provide novel therapeutic targets for the precise treatment of PMS.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Intellectual Disability , Language Development Disorders , Humans , Autism Spectrum Disorder/genetics , Muscle Hypotonia/genetics , Intellectual Disability/genetics , Nerve Tissue Proteins/genetics , Brain , Language Development Disorders/genetics , Seizures , Transcription Factors
2.
Genes (Basel) ; 14(2), 2023 02 03.
Article in English | MEDLINE | ID: mdl-36833327

ABSTRACT

Calcium channels are an integral component in maintaining cellular function. Alterations may lead to channelopathies, primarily manifested in the central nervous system. This study describes the clinical and genetic features of a unique 12-year-old boy harboring two congenital calcium channelopathies, involving the CACNA1A and CACNA1F genes, and provides an unadulterated view of the natural history of sporadic hemiplegic migraine type 1 (SHM1) due to the patient's inability to tolerate any preventative medication. The patient presents with episodes of vomiting, hemiplegia, cerebral edema, seizure, fever, transient blindness, and encephalopathy. He is nonverbal, nonambulatory, and forced to have a very limited diet due to abnormal immune responses. The SHM1 manifestations apparent in the subject are consistent with the phenotype described in the 48 patients identified as part of a systematic literature review. The ocular symptoms of CACNA1F align with the family history of the subject. The presence of multiple pathogenic variants makes it difficult to identify a clear phenotype-genotype correlation in the present case. Nevertheless, the detailed case description and natural history, along with the comprehensive review of the literature, contribute to the understanding of this complex disorder and point to the need for comprehensive clinical assessments of SHM1.


Subject(s)
Channelopathies , Migraine with Aura , Male , Humans , Calcium , Channelopathies/genetics , Migraine with Aura/complications , Migraine with Aura/genetics , Central Nervous System , Calcium Channels , Calcium Channels, L-Type
3.
Genes (Basel) ; 13(8), 2022 08 20.
Article in English | MEDLINE | ID: mdl-36011399

ABSTRACT

In the nervous system, synapses are special and pervasive structures between axonal and dendritic terminals, which facilitate electrical and chemical communications among neurons. Extensive studies have been conducted in mice and rats to explore the RNA pool at synapses and investigate RNA transport, local protein synthesis, and synaptic plasticity. However, owing to the experimental difficulties of studying human synaptic transcriptomes, the full pool of human synaptic RNAs remains largely unclear. We developed a new machine learning method, called PredSynRNA, to predict the synaptic localization of human RNAs. Training instances of dendritically localized RNAs were compiled from previous rodent studies, overcoming the shortage of empirical instances of human synaptic RNAs. Using RNA sequence and gene expression data as features, various models with different learning algorithms were constructed and evaluated. Strikingly, the models using the developmental brain gene expression features achieved superior performance for predicting synaptically localized RNAs. We examined the relevant expression features learned by PredSynRNA and used an independent test dataset to further validate the model performance. PredSynRNA models were then applied to the prediction and prioritization of candidate RNAs localized to human synapses, providing valuable targets for experimental investigations into neuronal mechanisms and brain disorders.


Subject(s)
Neurons , Synapses , Animals , Brain/metabolism , Humans , Mice , Neurons/metabolism , Protein Biosynthesis , RNA/genetics , RNA/metabolism , Rats , Synapses/genetics
4.
Bioinformatics ; 37(3): 396-403, 2021 04 20.
Article in English | MEDLINE | ID: mdl-32790840

ABSTRACT

MOTIVATION: Essential genes are required for the reproductive success at either cellular or organismal level. The identification of essential genes is important for understanding the core biological processes and identifying effective therapeutic drug targets. However, experimental identification of essential genes is costly, time consuming and labor intensive. Although several machine learning models have been developed to predict essential genes, these models are not readily applicable to lncRNAs. Moreover, the currently available models cannot be used to predict essential genes in a specific cancer type. RESULTS: In this study, we have developed a new machine learning approach, XGEP (eXpression-based Gene Essentiality Prediction), to predict essential genes and candidate lncRNAs in cancer cells. The novelty of XGEP lies in the utilization of relevant features derived from the TCGA transcriptome dataset through collaborative embedding. When evaluated on the pan-cancer dataset, XGEP was able to accurately predict human essential genes and achieve significantly higher performance than previous models. Notably, several candidate lncRNAs selected by XGEP are reported to promote cell proliferation and inhibit cell apoptosis. Moreover, XGEP also demonstrated superior performance on cancer-type-specific datasets to identify essential genes. The comprehensive lists of candidate essential genes in specific cancer types may be used to guide experimental characterization and facilitate the discovery of drug targets for cancer therapy. AVAILABILITY AND IMPLEMENTATION: The source code and datasets used in this study are freely available at https://github.com/BioDataLearning/XGEP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neoplasms , RNA, Long Noncoding , Genes, Essential , Humans , Machine Learning , Neoplasms/genetics , RNA, Long Noncoding/genetics , Software , Transcriptome
5.
J Comput Biol ; 28(2): 133-145, 2021 02.
Article in English | MEDLINE | ID: mdl-33232622

ABSTRACT

The three-dimensional (3D) organization of the human genome is of crucial importance for gene regulation, and the CCCTC-binding factor (CTCF) plays an important role in chromatin interactions. However, it is still unclear what sequence patterns in addition to CTCF motif pairs determine chromatin loop formation. To discover the underlying sequence patterns, we have developed a deep learning model, called DeepCTCFLoop, to predict whether a chromatin loop can be formed between a pair of convergent or tandem CTCF motifs using only the DNA sequences of the motifs and their flanking regions. Our results suggest that DeepCTCFLoop can accurately distinguish the CTCF motif pairs forming chromatin loops from the ones not forming loops. It significantly outperforms CTCF-MP, a machine learning model based on word2vec and boosted trees, when using DNA sequences only. Furthermore, we show that DNA motifs binding to several transcription factors, including ZNF384, ZNF263, ASCL1, SP1, and ZEB1, may constitute the complex sequence patterns for CTCF-mediated chromatin loop formation. DeepCTCFLoop has also been applied to disease-associated sequence variants to identify candidates that may disrupt chromatin loop formation. Therefore, our results provide useful information for understanding the mechanism of 3D genome organization and may also help annotate and prioritize the noncoding sequence variants associated with human diseases.


Subject(s)
CCCTC-Binding Factor/metabolism , Chromatin/genetics , Computational Biology/methods , DNA/chemistry , DNA/metabolism , Binding Sites , CCCTC-Binding Factor/chemistry , Cell Line , Chromatin/metabolism , Deep Learning , Genetic Predisposition to Disease , HeLa Cells , Humans , K562 Cells , Nucleotide Motifs , Sequence Analysis, DNA , Transcription Factors/chemistry , Transcription Factors/metabolism
6.
BMC Bioinformatics ; 21(1): 505, 2020 Nov 07.
Article in English | MEDLINE | ID: mdl-33160303

ABSTRACT

BACKGROUND: Autism spectrum disorders (ASD) refer to a range of neurodevelopmental conditions, which are genetically complex and heterogeneous with most of the genetic risk factors also found in the unaffected general population. Although all the currently known ASD risk genes code for proteins, long non-coding RNAs (lncRNAs) as essential regulators of gene expression have been implicated in ASD. Some lncRNAs show altered expression levels in autistic brains, but their roles in ASD pathogenesis are still unclear. RESULTS: In this study, we have developed a new machine learning approach to predict candidate lncRNAs associated with ASD. Particularly, the knowledge learnt from protein-coding ASD risk genes was transferred to the prediction and prioritization of ASD-associated lncRNAs. Both developmental brain gene expression data and transcript sequence were found to contain relevant information for ASD risk gene prediction. During the pre-training phase of model construction, an autoencoder network was implemented for a representation learning of the gene expression data, and a random-forest-based feature selection was applied to the transcript-sequence-derived k-mers. Our models, including logistic regression, support vector machine and random forest, showed robust performance based on tenfold cross-validations as well as candidate prioritization with hypothetical loci. We then utilized the models to predict and prioritize a list of candidate lncRNAs, including some reported to be cis-regulators of known ASD risk genes, for further investigation. CONCLUSIONS: Our results suggest that ASD risk genes can be accurately predicted using developmental brain gene expression data and transcript sequence features, and the models may provide useful information for functional characterization of the candidate lncRNAs associated with ASD.
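The abstract above mentions k-mers derived from transcript sequences as model features. A minimal sketch of how such features might be computed is shown below; the function name `kmer_frequencies`, the choice of normalized frequencies, and the toy sequence are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 3) -> dict[str, float]:
    """Return normalized k-mer frequencies over the A/C/G/T alphabet.

    Hypothetical helper: windows containing other symbols (e.g. N) are ignored.
    """
    seq = sequence.upper().replace("U", "T")
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fix the feature order so every transcript yields a vector of length 4**k.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in all_kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Toy sequence, not taken from the study.
features = kmer_frequencies("ATGCGATGCA", k=2)
```

A fixed-length vector like this could then be passed to a feature-selection step such as the random-forest-based selection the abstract describes.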


Subject(s)
Autism Spectrum Disorder/genetics , Machine Learning , RNA, Long Noncoding/metabolism , Autism Spectrum Disorder/pathology , Brain/growth & development , Brain/metabolism , Humans , RNA, Long Noncoding/genetics , Risk , Transcriptome
7.
NAR Genom Bioinform ; 2(1): lqaa007, 2020 Mar.
Article in English | MEDLINE | ID: mdl-33575554

ABSTRACT

N6-adenosine methylation (m6A) is the most abundant internal RNA modification in eukaryotes, and affects RNA metabolism and non-coding RNA function. Previous studies suggest that m6A modifications in mammals occur on the consensus sequence DRACH (D = A/G/U, R = A/G, H = A/C/U). However, only about 10% of such adenosines can be m6A-methylated, and the underlying sequence determinants are still unclear. Notably, the regulation of m6A modifications can be cell-type-specific. In this study, we have developed a deep learning model, called TDm6A, to predict RNA m6A modifications in human cells. For cell types with limited availability of m6A data, transfer learning may be used to enhance TDm6A model performance. We show that TDm6A can learn common and cell-type-specific motifs, some of which are associated with RNA-binding proteins previously reported to be m6A readers or anti-readers. In addition, we have used TDm6A to predict m6A sites on human long non-coding RNAs (lncRNAs) for selection of candidates with high levels of m6A modifications. The results provide new insights into m6A modifications on human protein-coding and non-coding transcripts.
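The DRACH consensus defined above (D = A/G/U, R = A/G, H = A/C/U) maps directly onto a regular expression. The sketch below only locates candidate sites; it is not the TDm6A model, and the function name and example sequence are assumptions made for illustration.

```python
import re

# D = A/G/U, R = A/G, then A, C, and H = A/C/U.
# The lookahead allows overlapping motifs to be reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def find_drach_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based index of the central adenosine, motif) for each DRACH match."""
    rna = rna.upper().replace("T", "U")
    return [(m.start() + 2, m.group(1)) for m in DRACH.finditer(rna)]


# Toy sequence, not taken from the study.
sites = find_drach_sites("CCGGACUAAACAGG")
```

As the abstract notes, only about 10% of such adenosines are methylated, so a motif match identifies a candidate site rather than a predicted modification.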

8.
NAR Genom Bioinform ; 2(2): lqaa031, 2020 Jun.
Article in English | MEDLINE | ID: mdl-33575587

ABSTRACT

CCCTC-binding factor (CTCF) is a key regulator of 3D genome organization and gene expression. Recent studies suggest that RNA transcripts, mostly long non-coding RNAs (lncRNAs), can serve as locus-specific factors to bind and recruit CTCF to the chromatin. However, it remains unclear whether specific sequence patterns are shared by the CTCF-binding RNA sites, and no RNA motif has been reported so far for CTCF binding. In this study, we have developed DeepLncCTCF, a new deep learning model based on a convolutional neural network and a bidirectional long short-term memory network, to discover the RNA recognition patterns of CTCF and identify candidate lncRNAs binding to CTCF. When evaluated on two different datasets, human U2OS dataset and mouse ESC dataset, DeepLncCTCF was shown to be able to accurately predict CTCF-binding RNA sites from nucleotide sequence. By examining the sequence features learned by DeepLncCTCF, we discovered a novel RNA motif with the consensus sequence, AGAUNGGA, for potential CTCF binding in humans. Furthermore, the applicability of DeepLncCTCF was demonstrated by identifying nearly 5000 candidate lncRNAs that might bind to CTCF in the nucleus. Our results provide useful information for understanding the molecular mechanisms of CTCF function in 3D genome organization.

9.
J Zhejiang Univ Sci B ; 20(6): 476-487, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31090273

ABSTRACT

Life may have begun in an RNA world, which is supported by increasing evidence of the vital role that RNAs perform in biological systems. In the human genome, most genes actually do not encode proteins; they are noncoding RNA genes. The largest class of noncoding genes is known as long noncoding RNAs (lncRNAs), which are transcripts greater in length than 200 nucleotides, but with no protein-coding capacity. While some lncRNAs have been demonstrated to be key regulators of gene expression and 3D genome organization, most lncRNAs are still uncharacterized. We thus propose several data mining and machine learning approaches for the functional annotation of human lncRNAs by leveraging the vast amount of data from genetic and genomic studies. Recent results from our studies and those of other groups indicate that genomic data mining can give insights into lncRNA functions and provide valuable information for experimental studies of candidate lncRNAs associated with human disease.


Subject(s)
Data Mining , Genomics , RNA, Long Noncoding/physiology , Autism Spectrum Disorder/genetics , Humans , Machine Learning , RNA, Long Noncoding/analysis , Support Vector Machine
10.
Bioinformatics ; 35(24): 5235-5242, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31077303

ABSTRACT

MOTIVATION: Circular RNAs (circRNAs) are a new class of endogenous RNAs in animals and plants. During pre-RNA splicing, the 5' and 3' termini of exon(s) can be covalently ligated to form circRNAs through back-splicing (head-to-tail splicing). CircRNAs can be conserved across species, show tissue- and developmental stage-specific expression patterns, and may be associated with human disease. However, the mechanism of circRNA formation is still unclear, although some sequence features have been shown to affect back-splicing. RESULTS: In this study, by applying state-of-the-art machine learning techniques, we have developed the first deep learning model, DeepCirCode, to predict back-splicing for human circRNA formation. DeepCirCode utilizes a convolutional neural network (CNN) with nucleotide sequence as the input, and shows superior performance over conventional machine learning algorithms such as support vector machine and random forest. Relevant features learnt by DeepCirCode are represented as sequence motifs, some of which match known human motifs involved in RNA splicing, transcription or translation. Analysis of these motifs shows that their distribution in RNA sequences can be important for back-splicing. Moreover, some of the human motifs appear to be conserved in mouse and fruit fly. The findings provide new insight into the back-splicing code for circRNA formation. AVAILABILITY AND IMPLEMENTATION: All the datasets and source code for model construction are available at https://github.com/BioDataLearning/DeepCirCode. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
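A CNN that takes nucleotide sequence as input typically needs the sequence converted to a numeric matrix first. One common choice is one-hot encoding, sketched below. Whether DeepCirCode uses exactly this encoding is an assumption; the function name and the all-zero handling of ambiguous bases are illustrative only.

```python
def one_hot_encode(sequence: str) -> list[list[int]]:
    """Encode a nucleotide sequence as rows of [A, C, G, T] indicator values.

    Hypothetical helper: unknown bases such as N become an all-zero row.
    """
    index = {base: i for i, base in enumerate("ACGT")}
    encoded = []
    for base in sequence.upper().replace("U", "T"):
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


# Toy input: one row per nucleotide, one column per base.
matrix = one_hot_encode("ACGN")
```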


Subject(s)
Deep Learning , Animals , Mice , RNA Splicing , RNA, Circular , Sequence Analysis, RNA
11.
Elife ; 8, 2019 01 10.
Article in English | MEDLINE | ID: mdl-30628890

ABSTRACT

Long noncoding RNAs (lncRNAs) have been shown to act as important regulators of cell biology, including cell fate decisions, but are often ignored in human genetics. Combining differential lncRNA expression during neuronal lineage induction with copy number variation morbidity maps of a cohort of children with autism spectrum disorder/intellectual disability versus healthy controls revealed focal genomic mutations affecting several lncRNA candidate loci. Here we find that a t(5:12) chromosomal translocation in a family manifesting neurodevelopmental symptoms specifically disrupts lnc-NR2F1. We further show that lnc-NR2F1 is an evolutionarily conserved lncRNA that functionally enhances induced neuronal cell maturation and directly occupies and regulates transcription of neuronal genes, including autism-associated genes. Thus, integrating human genetics and functional testing in neuronal lineage induction is a promising approach for discovering candidate lncRNAs involved in neurodevelopmental diseases.


Subject(s)
Autism Spectrum Disorder/genetics , Cell Differentiation/genetics , Mutation , Neurodevelopmental Disorders/genetics , Neurons/metabolism , RNA, Long Noncoding/genetics , Autism Spectrum Disorder/pathology , Child , Chromosomes, Human, Pair 12/genetics , Chromosomes, Human, Pair 5/genetics , DNA Copy Number Variations , Female , Gene Expression Profiling/methods , Humans , Male , Neurodevelopmental Disorders/pathology , Neurogenesis/genetics , Neurons/cytology , Pedigree , Translocation, Genetic/genetics
12.
BMC Syst Biol ; 12(Suppl 7): 91, 2018 12 14.
Article in English | MEDLINE | ID: mdl-30547845

ABSTRACT

BACKGROUND: Autism Spectrum Disorder (ASD) is the umbrella term for a group of neurodevelopmental disorders convergent on behavioral phenotypes. While many genes have been implicated in the disorder, the predominant focus of previous research has been on protein-coding genes. This leaves a vast number of long non-coding RNAs (lncRNAs) not characterized for their role in the disorder, although lncRNAs have been shown to play important roles in development and are highly represented in the brain. Studies have also shown lncRNAs to be differentially expressed in ASD-affected brains. However, there has yet to be an enrichment analysis of the shared ontologies and pathways of known ASD genes and lncRNAs in normal brain development. RESULTS: In this study, we performed co-expression network analysis on the developing brain transcriptome to identify potential lncRNAs associated with ASD and possible annotations for the functional role of lncRNAs in brain development. We found co-enrichment of lncRNA genes and ASD risk genes in two distinct groups of modules showing elevated prenatal and postnatal expression patterns, respectively. Further enrichment analysis of the module groups indicated that the early expression modules were comprised mainly of transcriptional regulators while the later expression modules were associated with synapse formation. Finally, lncRNAs were prioritized for their connectivity with the known ASD risk genes through analysis of an adjacency matrix. Collectively, the results imply early developmental repression of synaptic genes through lncRNAs and ASD transcriptional regulators. CONCLUSION: Here we demonstrate the utility of mining the publicly available brain gene expression data to further functionally annotate the role of lncRNAs in ASD. Our analysis indicates that lncRNAs potentially have a key role in ASD due to their convergence on shared pathways, and we identify lncRNAs of interest that may lead to further avenues of study.


Subject(s)
Autistic Disorder/genetics , Brain/growth & development , Brain/metabolism , Gene Expression Regulation , Genetic Predisposition to Disease/genetics , RNA, Long Noncoding/genetics , Gene Regulatory Networks , Humans , Synapses/genetics , Transcription, Genetic
13.
Sci Rep ; 8(1): 16385, 2018 11 06.
Article in English | MEDLINE | ID: mdl-30401954

ABSTRACT

Long non-coding RNAs are involved in biological processes throughout the cell including the nucleus, chromatin and cytosol. However, most lncRNAs remain unannotated and functional annotation of lncRNAs is difficult due to their low conservation and their tissue and developmentally specific expression. LncRNA subcellular localization is highly informative regarding its biological function, although it is difficult to discover because few prediction methods currently exist. While protein subcellular localization prediction is a well-established research field, lncRNA localization prediction is a novel research problem. We developed DeepLncRNA, a deep learning algorithm which predicts lncRNA subcellular localization directly from lncRNA transcript sequences. We analyzed 93 strand-specific RNA-seq samples of nuclear and cytosolic fractions from multiple cell types to identify differentially localized lncRNAs. We then extracted sequence-based features from the lncRNAs to construct our DeepLncRNA model, which achieved an accuracy of 72.4%, sensitivity of 83%, specificity of 62.4% and area under the receiver operating characteristic curve of 0.787. Our results suggest that primary sequence motifs are a major driving force in the subcellular localization of lncRNAs.
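The accuracy, sensitivity and specificity reported above are all derived from a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not the study's confusion matrix, and the area under the ROC curve is omitted because it requires per-sample scores.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    return {
        # Fraction of true positives that were recovered.
        "sensitivity": tp / (tp + fn),
        # Fraction of true negatives that were recovered.
        "specificity": tn / (tn + fp),
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=40, fp=20, tn=30, fn=10)
```

When the two classes are equal in size, accuracy is the mean of sensitivity and specificity, which is why a model can combine high sensitivity with more modest overall accuracy.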


Subject(s)
Computational Biology/methods , Deep Learning , Intracellular Space/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Sequence Analysis, RNA , Biological Transport , Cell Line , Humans , Neural Networks, Computer
14.
Biomed Pharmacother ; 103: 111-117, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29635123

ABSTRACT

MICA and MICB are stress-induced molecules recognized by NKG2D, one of the major activating receptors of natural killer (NK) cells. Upon binding to NKG2D, the NKG2D-mediated cytolytic response of immune effector cells is activated against virally infected cells and tumor cells expressing MICA. In early oncogenic development, membrane-bound MICA serves as a key signal to recruit anti-tumor immune effectors. Nevertheless, both MICA polymorphic features and its dysregulated expression in evolving tumors have resulted in tumor evasion in various cancer types. Therefore, in order to reconstitute tumor immunosurveillance, it is of great significance that we understand MICA genetics, polymorphisms, mechanisms of MICA-associated tumor escape and molecular/cellular modulation of MICA. In this review, the MICA-associated co-expression networks involving microRNAs (miRNAs) and novel candidate long non-coding RNAs (lncRNAs) are also discussed. Given the current importance of the study of the MICA gene, this review focuses on the role of MICA in different cancer types and on strategies for manipulating MICA regulation against tumor proliferation.


Subject(s)
Histocompatibility Antigens Class I/genetics , Neoplasms/genetics , Neoplasms/therapy , Polymorphism, Genetic , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
15.
Plant Physiol ; 176(4): 3062-3080, 2018 04.
Article in English | MEDLINE | ID: mdl-29463771

ABSTRACT

Protein kinases play fundamental roles in plant development and environmental stress responses. Here, we identified the STRESS INDUCED FACTOR (SIF) gene family, which encodes four leucine-rich repeat receptor-like protein kinases in Arabidopsis (Arabidopsis thaliana). The four genes, SIF1 to SIF4, are clustered in the genome and highly conserved, but they have temporally and spatially distinct expression patterns. We employed Arabidopsis SIF knockout mutants and overexpression transgenics to examine SIF involvement during plant pathogen defense. SIF genes are rapidly induced by biotic or abiotic stresses, and SIF proteins localize to the plasma membrane. Simultaneous knockout of SIF1 and SIF2 led to improved plant salt tolerance, whereas SIF2 overexpression enhanced PAMP-triggered immunity and prompted basal plant defenses, significantly improving pathogen resistance. Furthermore, SIF2 overexpression plants exhibited up-regulated expression of the defense-related genes WRKY53 and flg22-INDUCED RECEPTOR-LIKE KINASE1 as well as enhanced MPK3/MPK6 phosphorylation upon pathogen and elicitor treatments. The expression of the calcium signaling-related gene PHOSPHATE-INDUCED1 also was enhanced in the SIF2-overexpressing lines upon pathogen inoculation but repressed in the sif2 mutants. Bimolecular fluorescence complementation demonstrates that the BRI1-ASSOCIATED RECEPTOR KINASE1 protein is a coreceptor of the SIF2 kinase in the signal transduction pathway during pathogen invasion. These findings characterize a stress-responsive protein kinase family and illustrate how SIF2 modulates signal transduction for effective plant pathogenic defense.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Gene Expression Regulation, Plant , Plant Diseases/genetics , Protein Kinases/genetics , Amino Acid Sequence , Arabidopsis/metabolism , Arabidopsis/microbiology , Arabidopsis Proteins/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Disease Resistance/genetics , Phylogeny , Plant Diseases/microbiology , Plants, Genetically Modified , Protein Kinases/classification , Protein Kinases/metabolism , Pseudomonas syringae/physiology , Sequence Homology, Amino Acid , Signal Transduction/genetics , Stress, Physiological
16.
DNA Repair (Amst) ; 57: 107-115, 2017 09.
Article in English | MEDLINE | ID: mdl-28719838

ABSTRACT

A recent phylogenetic study on the UDG superfamily identified a new clade of family 3 enzymes (SMUG1-like), which shares lower homology with canonical SMUG1 enzymes. The enzymatic properties of the newly found putative DNA glycosylase are unknown. To test the potential UDG activity and evaluate phylogenetic classification, we isolated one SMUG1-like glycosylase representative from Listeria innocua (Lin). A biochemical screening of DNA glycosylase activity in vitro indicates that Lin SMUG1-like glycosylase is a single-strand selective uracil DNA glycosylase. The UDG activity on DNA bubble structures provides a clue to its physiological significance in vivo. Mutagenesis and molecular modeling analyses reveal that Lin SMUG1-like glycosylase has functional motifs similar to those of SMUG1 enzymes; however, it contains a distinct catalytic doublet S67-S68 in motif 1 that is not found in any other family in the UDG superfamily. Experimental investigation shows that the S67M-S68N double mutant is catalytically more active than either the S67M or the S68N single mutant. Coupled with mutual information analysis, the results indicate a high degree of correlation in the evolution of SMUG1-like enzymes. This study underscores the functional and catalytic diversity in the evolution of enzymes in the UDG superfamily.


Subject(s)
DNA Repair , DNA, Single-Stranded/metabolism , Listeria/enzymology , Uracil-DNA Glycosidase/metabolism , Amino Acid Sequence , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Catalytic Domain , DNA, Bacterial/metabolism , Listeria/genetics , Models, Molecular , Mutagenesis, Site-Directed , Phylogeny , Sequence Alignment , Sequence Homology, Amino Acid , Uracil-DNA Glycosidase/chemistry , Uracil-DNA Glycosidase/genetics
17.
PLoS One ; 12(5): e0178532, 2017.
Article in English | MEDLINE | ID: mdl-28562671

ABSTRACT

Genetic studies have identified many risk loci for autism spectrum disorder (ASD) although causal factors in the majority of cases are still unknown. Currently, known ASD risk genes are all protein-coding genes; however, the vast majority of transcripts in humans are non-coding RNAs (ncRNAs) which do not encode proteins. Recently, long non-coding RNAs (lncRNAs) were shown to be highly expressed in the human brain and crucial for normal brain development. We have constructed a computational pipeline for the integration of various genomic datasets to identify lncRNAs associated with ASD. This pipeline utilizes differential gene expression patterns in affected tissues in conjunction with gene co-expression networks in tissue-matched non-affected samples. We analyzed RNA-seq data from the cortical brain tissues from ASD cases and controls to identify lncRNAs differentially expressed in ASD. We derived a gene co-expression network from an independent human brain developmental transcriptome and detected a convergence of the differentially expressed lncRNAs and known ASD risk genes into specific co-expression modules. Co-expression network analysis facilitates the discovery of associations between previously uncharacterized lncRNAs with known ASD risk genes, affected molecular pathways and at-risk developmental time points. In addition, we show that some of these lncRNAs have a high degree of overlap with major CNVs detected in ASD genetic studies. By utilizing this integrative approach comprised of differential expression analysis in affected tissues and connectivity metrics from a developmental co-expression network, we have prioritized a set of candidate ASD-associated lncRNAs. The identification of lncRNAs as novel ASD susceptibility genes could help explain the genetic pathogenesis of ASD.


Subject(s)
Autistic Disorder/genetics , RNA, Long Noncoding/genetics , DNA Copy Number Variations , Gene Expression Profiling , Humans , Sequence Analysis, RNA
18.
Genes Genomics ; 39(5): 521-532, 2017.
Article in English | MEDLINE | ID: mdl-28458780

ABSTRACT

Olfaction is essential for fish to detect odorant elements in the environment and plays a critical role in navigating, locating food and detecting predators. Olfactory function is produced by the olfactory transduction pathway and is activated by olfactory receptors (ORs) through the binding of odorant elements. Recently, four types of olfactory receptors have been identified in vertebrate olfactory epithelium, including main odorant receptors (MORs), vomeronasal type receptors (VRs), trace-amine associated receptors (TAARs) and formyl peptide receptors (FPRs). It has been hypothesized that migratory fish, which have the ability to perform spawning migration, use olfactory cues to return to natal rivers. Therefore, obtaining OR genes from migratory fish will provide a resource for the study of molecular mechanisms that underlie fish spawning migration behaviors. Previous studies of OR genes have mainly focused on genomic data; however, little information has been gained at the transcript level. In this study, we identified the OR genes of an economically important commercial fish, Coilia nasus, by searching olfactory epithelium transcriptomes. A total of 142 candidate MOR, 52 V2R/OlfC, 32 TAAR and two FPR putative genes were identified. In addition, through genomic analysis we identified several MOR genes containing introns, which is unusual for vertebrate MOR genes. The transcriptome-scale mining strategy proved to be fruitful in identifying large sets of OR genes from species whose genome information is unavailable. Our findings lay the foundation for further research into the possible molecular mechanisms underlying the spawning migration behavior in C. nasus.

19.
Sci Rep ; 7: 45978, 2017 04 11.
Article in English | MEDLINE | ID: mdl-28397787

ABSTRACT

Enzymes in the uracil DNA glycosylase (UDG) superfamily are essential for the removal of uracil. Family 4 UDGa is a robust uracil DNA glycosylase that only acts on double-stranded and single-stranded uracil-containing DNA. Based on mutational, kinetic and modeling analyses, a catalytic mechanism involving leaving group stabilization by H155 in motif 2 and water coordination by N89 in motif 3 is proposed. Mutual Information analysis identifies a complex correlated mutation network including a strong correlation in the EG doublet in motif 1 of family 4 UDGa and in the QD doublet in motif 1 of family 1 UNG. Conversion of the EG doublet in family 4 Thermus thermophilus UDGa to the QD doublet increases the catalytic efficiency by over one hundred-fold and seventeen-fold over the E41Q and G42D single mutations, respectively, rectifying the strong correlation in the doublet. Molecular dynamics simulations suggest that the correlated mutations in the doublet in motif 1 position the catalytic H155 in motif 2 to stabilize the leaving uracilate anion. The integrated approach has important implications in studying enzyme evolution and protein structure and function.
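
To illustrate the kind of Mutual Information analysis mentioned in the abstract above, the sketch below computes plain Shannon mutual information between two columns of a multiple sequence alignment. It is an assumption-laden illustration rather than the authors' method: the abstract does not specify the MI variant or any corrections applied, the function name `column_mutual_information` is hypothetical, and the four toy sequences are invented so that an E/G versus Q/D doublet co-varies perfectly.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Shannon mutual information (in bits) between alignment columns i and j.

    alignment -- list of equal-length aligned sequences (strings)
    """
    pairs = [(seq[i], seq[j]) for seq in alignment]
    n = len(pairs)
    joint = Counter(pairs)                  # counts of residue pairs (a, b)
    left = Counter(a for a, _ in pairs)     # marginal counts for column i
    right = Counter(b for _, b in pairs)    # marginal counts for column j
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Toy alignment (invented): columns 1 and 2 co-vary as EG / QD,
# column 3 varies independently, column 0 is fully conserved.
toy_alignment = ["AEGK", "AEGR", "AQDK", "AQDR"]

print(column_mutual_information(toy_alignment, 1, 2))  # co-varying doublet
print(column_mutual_information(toy_alignment, 1, 3))  # independent columns
```

Real alignments usually call for gap handling, sequence weighting and a background correction before raw MI values are interpreted; none of that is attempted here.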


Subject(s)
Biocatalysis , Biological Evolution , Multigene Family , Mutation/genetics , Thermus thermophilus/enzymology , Uracil-DNA Glycosidase/genetics , Amino Acid Sequence , Amino Acid Substitution , Binding Sites , Kinetics , Models, Molecular , Mutant Proteins/chemistry , Mutant Proteins/metabolism , Sequence Alignment , Substrate Specificity , Uracil/metabolism , Uracil-DNA Glycosidase/chemistry
20.
Mar Biol ; 163: 126, 2016.
Article in English | MEDLINE | ID: mdl-27340293

ABSTRACT

A number of studies have suggested that olfaction plays an important role in fish migration. Fish use several distinct families of olfactory receptors to detect environmental odorants, including MORs (main olfactory receptors), V1Rs (vomeronasal type-1 receptors), V2Rs (vomeronasal type-2 receptors), TAARs (trace amine-associated receptors), and FPRs (formyl peptide receptors). The V1Rs have been reported to detect pheromones, and a pheromone hypothesis for the spawning migration of anadromous fish has been proposed. Examining whether Coilia nasus relies on V1R-mediated olfaction for spawning migration is important for understanding the molecular basis of spawning migration behavior. Here, we explored the V1R gene family in anadromous C. nasus. Six V1R genes previously reported in other teleost fish were successfully identified. Interestingly, we detected the largest V1R repertoire in teleost fish from C. nasus and identified a species-specific expansion event of the V1R3 gene, which has previously been detected as a single-copy gene in other teleost fish. The V1R loci were found to be populated with repetitive sequences, especially in the expanded V1R3 genes. Additionally, the divergence of V1R3 genetic structures in different populations of C. nasus indicates copy number variation (CNV) in the V1R3 gene among individuals of C. nasus. Most of the putative C. nasus V1R genes were expressed primarily in the olfactory epithelium, consistent with the role of the gene products as functional olfactory receptors. Significant differences in the expression levels of V1R genes were detected between anadromous and non-anadromous C. nasus. This study represents a first step in the elucidation of the olfactory communication system of C. nasus at the molecular level. Our results indicate that some V1R genes may be involved in the spawning migration of C. nasus, and the study provides new insights into the spawning migration and genome evolution of C. nasus.
