Results 1 - 20 of 54
1.
Bioinformatics ; 37(3): 396-403, 2021 04 20.
Article in English | MEDLINE | ID: mdl-32790840

ABSTRACT

MOTIVATION: Essential genes are required for the reproductive success at either cellular or organismal level. The identification of essential genes is important for understanding the core biological processes and identifying effective therapeutic drug targets. However, experimental identification of essential genes is costly, time consuming and labor intensive. Although several machine learning models have been developed to predict essential genes, these models are not readily applicable to lncRNAs. Moreover, the currently available models cannot be used to predict essential genes in a specific cancer type. RESULTS: In this study, we have developed a new machine learning approach, XGEP (eXpression-based Gene Essentiality Prediction), to predict essential genes and candidate lncRNAs in cancer cells. The novelty of XGEP lies in the utilization of relevant features derived from the TCGA transcriptome dataset through collaborative embedding. When evaluated on the pan-cancer dataset, XGEP was able to accurately predict human essential genes and achieve significantly higher performance than previous models. Notably, several candidate lncRNAs selected by XGEP are reported to promote cell proliferation and inhibit cell apoptosis. Moreover, XGEP also demonstrated superior performance on cancer-type-specific datasets to identify essential genes. The comprehensive lists of candidate essential genes in specific cancer types may be used to guide experimental characterization and facilitate the discovery of drug targets for cancer therapy. AVAILABILITY AND IMPLEMENTATION: The source code and datasets used in this study are freely available at https://github.com/BioDataLearning/XGEP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neoplasms , Long Noncoding RNA , Essential Genes , Humans , Machine Learning , Neoplasms/genetics , Long Noncoding RNA/genetics , Software , Transcriptome
2.
BMC Bioinformatics ; 21(1): 505, 2020 Nov 07.
Article in English | MEDLINE | ID: mdl-33160303

ABSTRACT

BACKGROUND: Autism spectrum disorders (ASD) refer to a range of neurodevelopmental conditions, which are genetically complex and heterogeneous with most of the genetic risk factors also found in the unaffected general population. Although all the currently known ASD risk genes code for proteins, long non-coding RNAs (lncRNAs) as essential regulators of gene expression have been implicated in ASD. Some lncRNAs show altered expression levels in autistic brains, but their roles in ASD pathogenesis are still unclear. RESULTS: In this study, we have developed a new machine learning approach to predict candidate lncRNAs associated with ASD. Particularly, the knowledge learnt from protein-coding ASD risk genes was transferred to the prediction and prioritization of ASD-associated lncRNAs. Both developmental brain gene expression data and transcript sequence were found to contain relevant information for ASD risk gene prediction. During the pre-training phase of model construction, an autoencoder network was implemented for a representation learning of the gene expression data, and a random-forest-based feature selection was applied to the transcript-sequence-derived k-mers. Our models, including logistic regression, support vector machine and random forest, showed robust performance based on tenfold cross-validations as well as candidate prioritization with hypothetical loci. We then utilized the models to predict and prioritize a list of candidate lncRNAs, including some reported to be cis-regulators of known ASD risk genes, for further investigation. CONCLUSIONS: Our results suggest that ASD risk genes can be accurately predicted using developmental brain gene expression data and transcript sequence features, and the models may provide useful information for functional characterization of the candidate lncRNAs associated with ASD.


Subject(s)
Autism Spectrum Disorder/genetics , Machine Learning , Long Noncoding RNA/metabolism , Autism Spectrum Disorder/pathology , Brain/growth & development , Brain/metabolism , Humans , Long Noncoding RNA/genetics , Risk , Transcriptome
3.
Bioinformatics ; 35(24): 5235-5242, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31077303

ABSTRACT

MOTIVATION: Circular RNAs (circRNAs) are a new class of endogenous RNAs in animals and plants. During pre-RNA splicing, the 5' and 3' termini of exon(s) can be covalently ligated to form circRNAs through back-splicing (head-to-tail splicing). CircRNAs can be conserved across species, show tissue- and developmental stage-specific expression patterns, and may be associated with human disease. However, the mechanism of circRNA formation is still unclear although some sequence features have been shown to affect back-splicing. RESULTS: In this study, by applying state-of-the-art machine learning techniques, we have developed the first deep learning model, DeepCirCode, to predict back-splicing for human circRNA formation. DeepCirCode utilizes a convolutional neural network (CNN) with nucleotide sequence as the input, and shows superior performance over conventional machine learning algorithms such as support vector machine and random forest. Relevant features learnt by DeepCirCode are represented as sequence motifs, some of which match known human motifs involved in RNA splicing, transcription or translation. Analysis of these motifs shows that their distribution in RNA sequences can be important for back-splicing. Moreover, some of the human motifs appear to be conserved in mouse and fruit fly. The findings provide new insight into the back-splicing code for circRNA formation. AVAILABILITY AND IMPLEMENTATION: All the datasets and source code for model construction are available at https://github.com/BioDataLearning/DeepCirCode. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
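
A CNN that takes nucleotide sequence as input normally needs the sequence converted to a numeric matrix first. The abstract does not say how DeepCirCode encodes its input, so the following is only a minimal sketch of one common choice, one-hot encoding. The function name `one_hot`, the column order A, C, G, T and the all-zero row for unknown bases are assumptions made for illustration.

```python
# Illustrative one-hot encoding of a nucleotide sequence (not the published code).
# Column order A, C, G, T is an assumption; U is treated as T.
ALPHABET = "ACGT"


def one_hot(seq):
    """Return one 4-element row per nucleotide; unknown bases (e.g. N) stay all zero."""
    rows = []
    for base in seq.upper().replace("U", "T"):
        row = [0, 0, 0, 0]
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


# Hypothetical short sequence, chosen only to show the shape of the output.
encoded = one_hot("GUAAGN")
```

Each sequence becomes a length-by-4 matrix, which is the kind of input a one-dimensional convolution layer can scan for motif-like patterns.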


Subject(s)
Deep Learning , Animals , Mice , RNA Splicing , Circular RNA , RNA Sequence Analysis
4.
Plant Physiol ; 176(4): 3062-3080, 2018 04.
Article in English | MEDLINE | ID: mdl-29463771

ABSTRACT

Protein kinases play fundamental roles in plant development and environmental stress responses. Here, we identified the STRESS INDUCED FACTOR (SIF) gene family, which encodes four leucine-rich repeat receptor-like protein kinases in Arabidopsis (Arabidopsis thaliana). The four genes, SIF1 to SIF4, are clustered in the genome and highly conserved, but they have temporally and spatially distinct expression patterns. We employed Arabidopsis SIF knockout mutants and overexpression transgenics to examine SIF involvement during plant pathogen defense. SIF genes are rapidly induced by biotic or abiotic stresses, and SIF proteins localize to the plasma membrane. Simultaneous knockout of SIF1 and SIF2 led to improved plant salt tolerance, whereas SIF2 overexpression enhanced PAMP-triggered immunity and prompted basal plant defenses, significantly improving pathogen resistance. Furthermore, SIF2 overexpression plants exhibited up-regulated expression of the defense-related genes WRKY53 and flg22-INDUCED RECEPTOR-LIKE KINASE1 as well as enhanced MPK3/MPK6 phosphorylation upon pathogen and elicitor treatments. The expression of the calcium signaling-related gene PHOSPHATE-INDUCED1 also was enhanced in the SIF2-overexpressing lines upon pathogen inoculation but repressed in the sif2 mutants. Bimolecular fluorescence complementation demonstrates that the BRI1-ASSOCIATED RECEPTOR KINASE1 protein is a coreceptor of the SIF2 kinase in the signal transduction pathway during pathogen invasion. These findings characterize a stress-responsive protein kinase family and illustrate how SIF2 modulates signal transduction for effective plant pathogenic defense.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Plant Gene Expression Regulation , Plant Diseases/genetics , Protein Kinases/genetics , Amino Acid Sequence , Arabidopsis/metabolism , Arabidopsis/microbiology , Arabidopsis Proteins/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Disease Resistance/genetics , Phylogeny , Plant Diseases/microbiology , Genetically Modified Plants , Protein Kinases/classification , Protein Kinases/metabolism , Pseudomonas syringae/physiology , Amino Acid Sequence Homology , Signal Transduction/genetics , Physiological Stress
5.
J Integr Plant Biol ; 58(5): 492-502, 2016 May.
Article in English | MEDLINE | ID: mdl-26172270

ABSTRACT

Domain of unknown function 1644 (DUF1644) is a highly conserved amino acid sequence motif present only in plants. Analysis of expression data of the family of DUF1644-containing genes indicated that they may regulate responses to abiotic stress in rice. Here we present our discovery of the role of OsSIDP366, a member of the DUF1644 gene family, in response to drought and salinity stresses in rice. Transgenic rice plants overexpressing OsSIDP366 showed enhanced drought and salinity tolerance and reduced water loss compared with the control, whereas plants in which OsSIDP366 expression was downregulated by RNA interference (RNAi) were more sensitive to salinity and drought treatments. The sensitivity to abscisic acid (ABA) treatment was not changed in OsSIDP366-overexpressing plants, and OsSIDP366 expression was not affected in ABA-deficient mutants. Subcellular localization analysis revealed that OsSIDP366 is present in cytoplasmic foci that colocalize with protein markers for both processing bodies (PBs) and stress granules (SGs) in rice protoplasts. Digital gene expression (DGE) profile analysis indicated that stress-related genes such as SNAC1, OsHAK5 and PRs were upregulated in OsSIDP366-overexpressing plants. These results suggest that OsSIDP366 may function as a regulator of the PBs/SGs and positively regulate salt and drought resistance in rice.


Subject(s)
Droughts , Plant Genes , Oryza/genetics , Oryza/physiology , Plant Proteins/genetics , Sodium Chloride/pharmacology , Physiological Stress/genetics , Abscisic Acid/pharmacology , Physiological Adaptation/drug effects , Physiological Adaptation/genetics , Base Sequence , Gene Expression Profiling , Plant Gene Expression Regulation/drug effects , Organ Specificity/drug effects , Organ Specificity/genetics , Oryza/drug effects , Osmotic Pressure , Phylogeny , Plant Growth Regulators/pharmacology , Plant Proteins/metabolism , Genetically Modified Plants , Protein Transport/drug effects , RNA Interference , Physiological Stress/drug effects , Subcellular Fractions/metabolism , Genetic Transcription/drug effects
6.
BMC Bioinformatics ; 15 Suppl 17: I1, 2014.
Article in English | MEDLINE | ID: mdl-25559210

ABSTRACT

Advances in high-throughput technologies have rapidly produced more and more data, from DNAs and RNAs to proteins, especially large volumes of genome-scale data. However, connecting genomic information to cellular functions and biological behaviours relies on the development of effective approaches at a higher systems level. In particular, advances in RNA-Seq technology have helped the study of the transcriptome, the RNA expressed from the genome, while systems biology provides a more comprehensive picture of how genes and proteins actively interact to produce cellular behaviours and physiological phenotypes. As biological interactions mediate many biological processes that are essential for cellular function or disease development, it is important to systematically identify genomic information including genetic mutations from GWAS (genome-wide association study), differentially expressed genes, bidirectional promoters, intrinsically disordered proteins (IDP) and protein interactions to gain deep insights into the underlying mechanisms of gene regulation and networks. Furthermore, bidirectional promoters can co-regulate many biological pathways, and their roles can be studied systematically to identify co-regulated genes at the interactive network level. Combining information from different but related studies can ultimately help reveal the landscape of molecular mechanisms underlying complex diseases such as cancer.


Subject(s)
Computational Biology/methods , Human Genome , Neoplasms/genetics , Neoplasms/pathology , Transcriptome , Translational Biomedical Research , Genomics , Humans , Phenotype
7.
BMC Bioinformatics ; 15 Suppl 17: S2, 2014.
Article in English | MEDLINE | ID: mdl-25559354

ABSTRACT

BACKGROUND: Kidney Renal Clear Cell Carcinoma (KIRC) is one of the fatal genitourinary diseases and accounts for most malignant kidney tumours. KIRC has been shown to be resistant to radiotherapy and chemotherapy. As with many types of cancer, there is no curative treatment for metastatic KIRC. Using advanced sequencing technologies, The Cancer Genome Atlas (TCGA) project of NIH/NCI-NHGRI has produced large-scale sequencing data, which provide unprecedented opportunities to reveal new molecular mechanisms of cancer. We combined differentially expressed genes, pathways and network analyses to gain new insights into the underlying molecular mechanisms of the disease development. RESULTS: Following the experimental design for obtaining significant genes and pathways, we performed a comprehensive analysis of sequencing data from 537 KIRC patients provided by TCGA. Differentially expressed genes were obtained from the RNA-Seq data. Pathway and network analyses were performed. We identified 186 differentially expressed genes with significant p-values and large fold changes (P < 0.01, |log(FC)| > 5). The study not only confirmed a number of differentially expressed genes reported in the literature, but also provided new findings. We performed hierarchical clustering analysis utilizing the genome-wide gene expression and the differentially expressed genes that were identified in this study. We revealed distinct groups of differentially expressed genes that can aid the identification of subtypes of the cancer. The hierarchical clustering analysis based on gene expression profile and differentially expressed genes suggested four subtypes of the cancer. We found distinct enriched Gene Ontology (GO) terms associated with these groups of genes. Based on these findings, we built a support vector machine based supervised-learning classifier to predict unknown samples, and the classifier achieved high accuracy and robust classification results.
In addition, we identified a number of pathways (P < 0.04) that were significantly influenced by the disease. We found that some of the identified pathways have been implicated in cancers in the literature, while others have not been reported in the cancer before. The network analysis led to the identification of significantly disrupted pathways and associated genes involved in the disease development. Furthermore, this study can provide a viable alternative in identifying effective drug targets. CONCLUSIONS: Our study identified a set of differentially expressed genes and pathways in kidney renal clear cell carcinoma, and represents a comprehensive computational approach to analyzing large-scale next-generation sequencing data. The pathway and network analyses suggested that information from distinctly expressed genes can be utilized in the identification of aberrant upstream regulators. Identification of distinctly expressed genes and altered pathways is important in effective biomarker identification for early cancer diagnosis and treatment planning. Combining differentially expressed genes with pathway and network analyses using intelligent computational approaches provides an unprecedented opportunity to identify upstream disease causal genes and effective drug targets.
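
The selection rule quoted in the abstract (P < 0.01, |log(FC)| > 5) can be written as a simple filter. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the abstract does not give the logarithm base or the statistical test, and the gene names and values are hypothetical.

```python
# Illustrative filter for differentially expressed genes (not the authors' code).
# Thresholds follow the abstract; the base of the log fold change is not stated there.
def select_degs(results, p_cutoff=0.01, lfc_cutoff=5.0):
    """Keep genes with p-value below p_cutoff and |log fold change| above lfc_cutoff."""
    return [
        gene
        for gene, (p_value, log_fc) in results.items()
        if p_value < p_cutoff and abs(log_fc) > lfc_cutoff
    ]


# Hypothetical per-gene statistics: gene -> (p-value, log fold change).
example = {
    "GENE_A": (0.001, 6.2),   # passes both thresholds
    "GENE_B": (0.001, 2.0),   # fold change too small
    "GENE_C": (0.20, -7.5),   # not significant
    "GENE_D": (0.005, -5.4),  # passes (strongly down-regulated)
}
selected = select_degs(example)
```

Taking the absolute value of the log fold change keeps both strongly up-regulated and strongly down-regulated genes.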


Subject(s)
Tumor Biomarkers/genetics , Renal Cell Carcinoma/genetics , Gene Expression Profiling/methods , Gene Regulatory Networks , Kidney Neoplasms/genetics , Kidney/metabolism , Signal Transduction , Renal Cell Carcinoma/pathology , Case-Control Studies , Cluster Analysis , Neoplastic Gene Expression Regulation , Humans , Kidney Neoplasms/pathology , Support Vector Machine
8.
BMC Genomics ; 15 Suppl 11: S4, 2014.
Article in English | MEDLINE | ID: mdl-25559331

ABSTRACT

BACKGROUND: The importance of mutations in disease phenotype has been studied, with information available in databases such as OMIM. However, it remains a research challenge to cluster amino acid residues based on an underlying interaction, such as co-evolution, in order to understand how mutations in these related sites can lead to different disease phenotypes. RESULTS: This paper presents an integrative approach to identify groups of co-evolving residues, known as protein sectors. By studying a protein family using multiple sequence alignments and statistical coupling analysis, we attempted to determine whether these groups of residues could be related to disease phenotypes. After the protein sectors were identified, disease-associated residues within these groups of amino acids were mapped to a structure representing the protein family. In this study, we used the proposed pipeline to analyze two test cases of spermine synthase and Rab GDP dissociation inhibitor. CONCLUSIONS: The results suggest that there is a possible link between certain groups of co-evolving residues and different disease phenotypes. The pipeline described in this work could also be used to study other protein families associated with human diseases.


Subject(s)
Disease/genetics , Mutation , Proteins/genetics , Amino Acids/genetics , Cluster Analysis , Molecular Evolution , Guanine Nucleotide Dissociation Inhibitors/genetics , Humans , X-Linked Intellectual Disability/genetics , Phenotype , Protein Sequence Analysis , Spermine Synthase/genetics
9.
Plant Cell Rep ; 33(2): 323-36, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24247850

ABSTRACT

The Domain of Unknown Function 966 (DUF966) gene family, found in the protein family database, consists of seven genes in rice. The proteins encoded by these genes contain one or two highly conserved DUF966 domains. The available data from public microarray databases implied that these genes might play crucial roles in plant response to abiotic stresses. In this study, a member of the DUF966 gene family, DUF966-stress repressive gene 2 in Oryza sativa (OsDSR2, Loc_Os01g62200), was cloned and its role in the rice response to salt and simulated drought stresses was functionally characterized. OsDSR2 was expressed mainly in nodes of stems and leaf blades of rice. Expression profile analysis under stress showed that OsDSR2 had different transcriptional responses to salt, drought, cold, heat and oxidative (H2O2) stresses, as well as abscisic acid (ABA), methyl jasmonate, salicylic acid, gibberellic acid and auxin treatments. Transient expression demonstrated that OsDSR2 was localized in the membrane and nucleus. Overexpression of OsDSR2 could increase salt and simulated drought (polyethyleneglycol)-stress sensitivities in rice by downregulating the expression of ABA- and stress-responsive genes including OsNCED4, SNAC1, OsbZIP23, P5CS, Oslea3 and rab16C. Furthermore, OsDSR2-overexpressing plants showed reduced ABA sensitivity during the post-germination stage. These results suggested that OsDSR2 negatively regulated rice response to salt and simulated drought stresses as well as ABA signaling, which provided some useful data for understanding the functional roles of DUF966 family genes in abiotic stress responses in plants.


Subject(s)
Plant Gene Expression Regulation , Oryza/genetics , Plant Proteins/genetics , Signal Transduction , Physiological Stress , Abscisic Acid/metabolism , Computational Biology , Droughts , Flowers/cytology , Flowers/drug effects , Flowers/genetics , Flowers/physiology , Gene Expression , Gene Expression Profiling , Reporter Genes , Germination , Organ Specificity , Oryza/cytology , Oryza/drug effects , Oryza/physiology , Phylogeny , Plant Growth Regulators/metabolism , Plant Proteins/metabolism , Genetically Modified Plants , Tertiary Protein Structure , Seedlings/cytology , Seedlings/drug effects , Seedlings/genetics , Seedlings/physiology , Sodium Chloride/pharmacology
10.
Genes (Basel) ; 14(11)2023 Oct 26.
Article in English | MEDLINE | ID: mdl-38002941

ABSTRACT

Phelan-McDermid syndrome (PMS) is a rare genetic neurodevelopmental disorder caused by 22q13 region deletions or SHANK3 gene variants. Deletions vary in size and can affect other genes in addition to SHANK3. PMS is characterized by autism spectrum disorder (ASD), intellectual disability (ID), developmental delays, seizures, speech delay, hypotonia, and minor dysmorphic features. It is challenging to determine individual gene contributions due to variability in deletion sizes and clinical features. We implemented a genomic data mining approach for identifying and prioritizing the candidate genes in the 22q13 region for five phenotypes: ASD, ID, seizures, language impairment, and hypotonia. Weighted gene co-expression networks were constructed using the BrainSpan transcriptome dataset of a human brain. Bioinformatic analyses of the co-expression modules allowed us to select specific candidate genes, including EP300, TCF20, RBX1, XPNPEP3, PMM1, SCO2, BRD1, and SHANK3, for the common neurological phenotypes of PMS. The findings help understand the disease mechanisms and may provide novel therapeutic targets for the precise treatment of PMS.
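
In a weighted gene co-expression network, the connection strength between two genes is usually a correlation raised to a power, so that weak correlations are pushed towards zero. The abstract does not report the correlation measure or the power used, so the sketch below assumes Pearson correlation, an unsigned network and a power `beta=6`. The gene names and expression values are hypothetical.

```python
# Illustrative unsigned weighted co-expression adjacency: a_ij = |cor(x_i, x_j)| ** beta.
# Pearson correlation and beta=6 are assumptions, not values taken from the abstract.
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def adjacency(expr, beta=6):
    """Return {(gene_i, gene_j): weight} for every ordered pair of distinct genes."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical expression profiles across four samples.
profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with G1
    "G3": [1.0, 3.0, 2.0, 4.0],  # correlation 0.8 with G1
}
weights = adjacency(profiles)
```

Because the network is unsigned, the anti-correlated pair G1/G2 receives the same weight as a perfectly correlated pair, while the 0.8 correlation between G1 and G3 shrinks to about 0.26.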


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Intellectual Disability , Language Development Disorders , Humans , Autism Spectrum Disorder/genetics , Muscle Hypotonia/genetics , Intellectual Disability/genetics , Nerve Tissue Proteins/genetics , Brain , Language Development Disorders/genetics , Seizures , Transcription Factors
11.
Genes (Basel) ; 14(2)2023 02 03.
Article in English | MEDLINE | ID: mdl-36833327

ABSTRACT

Calcium channels are an integral component in maintaining cellular function. Alterations may lead to channelopathies, primarily manifested in the central nervous system. This study describes the clinical and genetic features of a unique 12-year-old boy harboring two congenital calcium channelopathies, involving the CACNA1A and CACNA1F genes, and provides an unadulterated view of the natural history of sporadic hemiplegic migraine type 1 (SHM1) due to the patient's inability to tolerate any preventative medication. The patient presents with episodes of vomiting, hemiplegia, cerebral edema, seizure, fever, transient blindness, and encephalopathy. He is nonverbal, nonambulatory, and forced to have a very limited diet due to abnormal immune responses. The SHM1 manifestations apparent in the subject are consistent with the phenotype described in the 48 patients identified as part of a systematic literature review. The ocular symptoms of CACNA1F align with the family history of the subject. The presence of multiple pathogenic variants makes it difficult to identify a clear phenotype-genotype correlation in the present case. Moreover, the detailed case description and natural history along with the comprehensive review of the literature contribute to the understanding of this complex disorder and point to the need for comprehensive clinical assessments of SHM1.


Subject(s)
Channelopathies , Migraine with Aura , Male , Humans , Calcium , Channelopathies/genetics , Migraine with Aura/complications , Migraine with Aura/genetics , Central Nervous System , Calcium Channels , L-Type Calcium Channels
12.
Amino Acids ; 43(1): 447-55, 2012 Jul.
Article in English | MEDLINE | ID: mdl-21986959

ABSTRACT

Protein sumoylation is a post-translational modification that plays an important role in a wide range of cellular processes. Small ubiquitin-related modifier (SUMO) can be covalently and reversibly conjugated to the sumoylation sites of target proteins, many of which are implicated in various human genetic disorders. The accurate prediction of protein sumoylation sites may help biomedical researchers to design their experiments and understand the molecular mechanism of protein sumoylation. In this study, a new machine learning approach has been developed for predicting sumoylation sites from protein sequence information. Random forests (RFs) and support vector machines (SVMs) were trained with the data collected from the literature. Domain-specific knowledge in terms of relevant biological features was used for input vector encoding. It was shown that RF classifier performance was affected by the sequence context of sumoylation sites, and 20 residues with the core motif ΨKXE in the middle appeared to provide enough context information for sumoylation site prediction. The RF classifiers were also found to outperform SVM models for predicting protein sumoylation sites from sequence features. The results suggest that the machine learning approach gives rise to more accurate prediction of protein sumoylation sites than the other existing methods. The accurate classifiers have been used to develop a new web server, called seeSUMO (http://bioinfo.ggc.org/seesumo/), for sequence-based prediction of protein sumoylation sites.
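
The abstract describes 20-residue windows with the core motif ΨKXE in the middle. A candidate-site scan of that kind might look like the sketch below, which is an illustration and not the seeSUMO implementation. The residue set used for Ψ, the padding character and the placement of eight flanking residues on each side of the four-residue motif are all assumptions.

```python
# Illustrative scan for the sumoylation core motif ΨKXE (not the seeSUMO code).
import re

# Assumed residue set for Ψ (hydrophobic); the abstract does not define it.
HYDROPHOBIC = "AVLIMFP"
# A lookahead is used so that overlapping motif occurrences are all reported.
MOTIF = re.compile(r"(?=([%s]K[A-Z]E))" % HYDROPHOBIC)


def motif_windows(protein, flank=8, pad="-"):
    """Yield (1-based lysine position, window) for every ΨKXE occurrence.

    Each window holds flank + 4 + flank residues (20 by default) with the
    motif in the middle; sequence ends are padded with the pad character.
    """
    sequence = protein.upper()
    padded = pad * flank + sequence + pad * flank
    for match in MOTIF.finditer(sequence):
        start = match.start()  # 0-based index of Ψ in the unpadded sequence
        yield start + 2, padded[start:start + 2 * flank + 4]


# Hypothetical sequence with a single LKDE motif.
sites = list(motif_windows("GGGGGGGGGGLKDEGGGGGGGGGG"))
```

Each window could then be turned into a feature vector for a classifier such as the random forests mentioned in the abstract.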


Subject(s)
Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Sumoylation , Algorithms , Protein Databases , Protein Sequence Analysis , Small Ubiquitin-Related Modifier Proteins/metabolism
13.
Genes (Basel) ; 13(8)2022 08 20.
Article in English | MEDLINE | ID: mdl-36011399

ABSTRACT

In the nervous system, synapses are special and pervasive structures between axonal and dendritic terminals, which facilitate electrical and chemical communications among neurons. Extensive studies have been conducted in mice and rats to explore the RNA pool at synapses and investigate RNA transport, local protein synthesis, and synaptic plasticity. However, owing to the experimental difficulties of studying human synaptic transcriptomes, the full pool of human synaptic RNAs remains largely unclear. We developed a new machine learning method, called PredSynRNA, to predict the synaptic localization of human RNAs. Training instances of dendritically localized RNAs were compiled from previous rodent studies, overcoming the shortage of empirical instances of human synaptic RNAs. Using RNA sequence and gene expression data as features, various models with different learning algorithms were constructed and evaluated. Strikingly, the models using the developmental brain gene expression features achieved superior performance for predicting synaptically localized RNAs. We examined the relevant expression features learned by PredSynRNA and used an independent test dataset to further validate the model performance. PredSynRNA models were then applied to the prediction and prioritization of candidate RNAs localized to human synapses, providing valuable targets for experimental investigations into neuronal mechanisms and brain disorders.


Subject(s)
Neurons , Synapses , Animals , Brain/metabolism , Humans , Mice , Neurons/metabolism , Protein Biosynthesis , RNA/genetics , RNA/metabolism , Rats , Synapses/genetics
14.
BMC Genomics ; 12 Suppl 5: I1, 2011 Dec 23.
Article in English | MEDLINE | ID: mdl-22369358

ABSTRACT

This is an editorial report of the supplement to BMC Genomics that includes 15 papers selected from the BIOCOMP'10 - The 2010 International Conference on Bioinformatics & Computational Biology as well as other sources with a focus on genomics studies. BIOCOMP'10 was held on July 12-15 in Las Vegas, Nevada. The congress covered a large variety of research areas, and genomics was one of the major focuses because of the fast development in this field. We set out to launch a supplement to BMC Genomics with manuscripts selected from this congress and invited submissions. With a rigorous peer review process, we selected 15 manuscripts that showed work in cutting-edge genomics fields and proposed innovative methodology. We hope this supplement presents the current computational and statistical challenges faced in genomics studies, and shows the enormous promises and opportunities in the genomic future.


Subject(s)
Gene Regulatory Networks , Genomics , Computational Biology , Peer Review of Research , Precision Medicine
15.
J Comput Biol ; 28(2): 133-145, 2021 02.
Article in English | MEDLINE | ID: mdl-33232622

ABSTRACT

The three-dimensional (3D) organization of the human genome is of crucial importance for gene regulation, and the CCCTC-binding factor (CTCF) plays an important role in chromatin interactions. However, it is still unclear what sequence patterns in addition to CTCF motif pairs determine chromatin loop formation. To discover the underlying sequence patterns, we have developed a deep learning model, called DeepCTCFLoop, to predict whether a chromatin loop can be formed between a pair of convergent or tandem CTCF motifs using only the DNA sequences of the motifs and their flanking regions. Our results suggest that DeepCTCFLoop can accurately distinguish the CTCF motif pairs forming chromatin loops from the ones not forming loops. It significantly outperforms CTCF-MP, a machine learning model based on word2vec and boosted trees, when using DNA sequences only. Furthermore, we show that DNA motifs binding to several transcription factors, including ZNF384, ZNF263, ASCL1, SP1, and ZEB1, may constitute the complex sequence patterns for CTCF-mediated chromatin loop formation. DeepCTCFLoop has also been applied to disease-associated sequence variants to identify candidates that may disrupt chromatin loop formation. Therefore, our results provide useful information for understanding the mechanism of 3D genome organization and may also help annotate and prioritize the noncoding sequence variants associated with human diseases.
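
The terms convergent and tandem refer to the relative strand orientation of the two CTCF motifs in a pair. The helper below is a hypothetical illustration of that standard classification, with the two motifs ordered by genomic position. It is not part of DeepCTCFLoop, and the function name is an assumption.

```python
# Illustrative classification of a CTCF motif pair by strand orientation.
def pair_orientation(upstream_strand, downstream_strand):
    """Classify a motif pair, given the strands of the upstream and downstream motifs."""
    pair = (upstream_strand, downstream_strand)
    if pair == ("+", "-"):
        return "convergent"  # motifs point towards each other
    if pair == ("-", "+"):
        return "divergent"   # motifs point away from each other
    if pair in (("+", "+"), ("-", "-")):
        return "tandem"      # motifs point in the same direction
    raise ValueError("strands must be '+' or '-'")


orientation = pair_orientation("+", "-")
```

Only the convergent and tandem classes are named in the abstract as model inputs; the divergent case is included here for completeness.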


Subject(s)
CCCTC-Binding Factor/metabolism , Chromatin/genetics , Computational Biology/methods , DNA/chemistry , DNA/metabolism , Binding Sites , CCCTC-Binding Factor/chemistry , Cell Line , Chromatin/metabolism , Deep Learning , Genetic Predisposition to Disease , HeLa Cells , Humans , K562 Cells , Nucleotide Motifs , DNA Sequence Analysis , Transcription Factors/chemistry , Transcription Factors/metabolism
16.
Hum Mutat ; 31(9): 1043-9, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20556796

ABSTRACT

The Snyder-Robinson syndrome is caused by missense mutations in the spermine synthase gene that encodes a protein (SMS) of 529 amino acids. Here we investigate, in silico, the molecular effect of three missense mutations, c.267G>A (p.G56S), c.496T>G (p.V132G), and c.550T>C (p.I150T) in SMS that were clinically identified to cause the disease. Single-point energy calculations, molecular dynamics simulations, and pKa calculations revealed the effects of these mutations on SMS's stability, flexibility, and interactions. It was predicted that the catalytic residue, Asp276, should be protonated prior to binding the substrates. The pKa calculations indicated that the p.I150T mutation causes pKa changes with respect to the wild-type SMS, which involve titratable residues interacting with the S-methyl-5'-thioadenosine (MTA) substrate. The p.I150T missense mutation was also found to decrease the stability of the C-terminal domain and to induce structural changes in the vicinity of the MTA binding site. The other two missense mutations, p.G56S and p.V132G, are away from the active site and do not perturb its wild-type properties, but affect the stability of both the monomers and the dimer. Specifically, the p.G56S mutation is predicted to greatly reduce the affinity of monomers to form a dimer, and therefore should have a dramatic effect on SMS function because dimerization is essential for SMS activity.


Subject(s)
Computational Biology/methods , Mutation, Missense/genetics , Adenosine/analogs & derivatives , Adenosine/metabolism , Binding Sites , Humans , Internet , X-Linked Intellectual Disability/enzymology , X-Linked Intellectual Disability/genetics , Models, Molecular , Protein Multimerization , Spermine Synthase/chemistry , Spermine Synthase/genetics , Thermodynamics , Thionucleosides/metabolism
17.
BMC Genomics ; 11 Suppl 3: S2, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21143784

ABSTRACT

BACKGROUND: Short interfering RNAs (siRNAs) can be used to knock down gene expression in functional genomics. For a target gene of interest, many siRNA molecules may be designed, but their efficiency of expression inhibition often varies. RESULTS: To facilitate gene functional studies, we have developed a new machine learning method to predict siRNA potency based on random forests and support vector machines. Since there were many potential sequence features, random forests were used to select the most relevant features affecting gene expression inhibition. Support vector machine classifiers were then constructed using the selected sequence features for predicting siRNA potency. Interestingly, gene expression inhibition is significantly affected by the nucleotide dimer and trimer compositions of the siRNA sequence. CONCLUSIONS: The findings in this study should help design potent siRNAs for functional genomics, and might also provide further insights into the molecular mechanism of RNA interference.


Subject(s)
Algorithms , Artificial Intelligence , RNA, Small Interfering/chemistry , Gene Knockdown Techniques , RNA Interference , RNA, Small Interfering/classification , RNA, Small Interfering/metabolism
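
The abstract above reports that nucleotide dimer and trimer compositions of the siRNA sequence affect inhibition. As a minimal sketch of what such a feature could look like, the code below computes the relative frequency of every k-mer in a sequence. The function name, the sliding-window counting, and the normalization by window count are assumptions for illustration; the abstract does not specify how the authors computed their features.

```python
from itertools import product

# Illustrative sketch: an assumed definition of k-mer composition,
# not the feature extraction used in the study.


def kmer_composition(seq, k):
    """Return the relative frequency of each RNA k-mer (alphabet A, C, G, U).

    DNA-style input is accepted: T is converted to U. Windows containing
    other symbols are ignored when counting but still included in the total.
    """
    seq = seq.upper().replace("T", "U")
    counts = {"".join(p): 0 for p in product("ACGU", repeat=k)}
    total = len(seq) - k + 1
    if total <= 0:
        return {kmer: 0.0 for kmer in counts}
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    toy = "AUGAUG"  # toy sequence, not a real siRNA
    dimers = kmer_composition(toy, 2)    # 16 features
    trimers = kmer_composition(toy, 3)   # 64 features
    print(dimers["AU"], trimers["AUG"])
```

Vectors like these (16 dimer and 64 trimer values) could then be passed to a feature-selection step and a classifier, in the spirit of the random forest and support vector machine pipeline the abstract describes.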
18.
BMC Genomics ; 11 Suppl 2: S15, 2010 Nov 02.
Article in English | MEDLINE | ID: mdl-21047382

ABSTRACT

BACKGROUND: Microarray gene expression data are accumulating in public databases. The expression profiles contain valuable information for understanding human gene expression patterns. However, the effective use of public microarray data requires integrating the expression profiles from heterogeneous sources. RESULTS: In this study, we have compiled a compendium of microarray expression profiles of various human tissue samples. The microarray raw data generated in different research laboratories have been obtained and combined into a single dataset after data normalization and transformation. To demonstrate the usefulness of the integrated microarray data for studying human gene expression patterns, we have analyzed the dataset to identify potential tissue-selective genes. A new method has been proposed for genome-wide identification of tissue-selective gene targets using both microarray intensity values and detection calls. The candidate genes for brain, liver and testis-selective expression have been examined, and the results suggest that our approach can select some interesting gene targets for further experimental studies. CONCLUSION: A computational approach has been developed in this study for combining microarray expression profiles from heterogeneous sources. The integrated microarray data can be used to investigate tissue-selective expression patterns of human genes.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Brain/metabolism , Databases, Genetic , Humans , Liver/metabolism , Male , Testis/metabolism
19.
BMC Genomics ; 11 Suppl 2: S5, 2010 Nov 02.
Article in English | MEDLINE | ID: mdl-21047386

ABSTRACT

BACKGROUND: Protein destabilization is a common mechanism by which amino acid substitutions cause human diseases. Although several machine learning methods have been reported for predicting protein stability changes upon amino acid substitutions, the previous studies did not utilize relevant sequence features representing biological knowledge for classifier construction. RESULTS: In this study, a new machine learning method has been developed for sequence feature-based prediction of protein stability changes upon amino acid substitutions. Support vector machines were trained with data from experimental studies on the free energy change of protein stability upon mutations. To construct accurate classifiers, twenty sequence features were examined for input vector encoding. Classifier performance varied significantly depending on which sequence features were used. The most accurate classifier in this study was constructed using a combination of six sequence features. This classifier achieved an overall accuracy of 84.59% with 70.29% sensitivity and 90.98% specificity. CONCLUSIONS: Relevant sequence features can be used to accurately predict protein stability changes upon amino acid substitutions. Predictive results at this level of accuracy may provide useful information to distinguish between deleterious and tolerated alterations in disease candidate genes. To make the classifier accessible to the genetics research community, we have developed a new web server, called MuStab (http://bioinfo.ggc.org/mustab/).


Subject(s)
Amino Acid Substitution , Artificial Intelligence , Protein Stability , Sequence Analysis, Protein/methods , Computational Biology/methods , Sensitivity and Specificity
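
The abstract above reports overall accuracy, sensitivity, and specificity for a binary classifier. The sketch below shows the standard definitions of these three measures in terms of confusion-matrix counts. The function name is hypothetical and the counts in the usage example are made-up round numbers chosen for easy checking; they are not the study's data, and the abstract does not state which class was treated as positive.

```python
# Illustrative sketch of the standard metric definitions.
# The example counts are invented and do not come from the MuStab study.


def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: positives predicted correctly / incorrectly
    tn/fp: negatives predicted correctly / incorrectly
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # fraction of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # fraction of positives recovered
        "specificity": tn / (tn + fp),      # fraction of negatives recovered
    }


if __name__ == "__main__":
    metrics = classification_metrics(tp=70, fn=30, tn=90, fp=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.2%}")
```

Because accuracy is a weighted average of sensitivity and specificity, a classifier evaluated on a set dominated by negatives can report high accuracy alongside noticeably lower sensitivity, which is the pattern seen in the reported figures.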
20.
Plant Mol Biol ; 72(1-2): 205-13, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19876747

ABSTRACT

The parasitic plant Cuscuta australis (dodder) invades a variety of species by entwining the stem and leaves of a host and developing haustoria. The twining response prior to haustoria formation is regarded as the first sign for dodders to parasitize host plants, and thus has been the focus of studies on the host-parasite interaction. However, the molecular mechanism is still poorly understood. In the present work, we have investigated the different effects of blue and white light on the twining response, and identified a set of proteins that were differentially expressed in dodder seedlings using a proteomic approach. Approximately 1,800 protein spots were detected on each 2-D gel, and 47 spots with increased or decreased protein levels were selected and analyzed with MALDI-TOF-MS. Peptide mass fingerprints (PMFs) obtained for these spots were used for protein identification through cross-species database searches. The results suggest that the blue light-induced twining response in dodder seedlings may be mediated by proteins involved in light signal transduction, cell wall degradation, cell structure, and metabolism.


Subject(s)
Cuscuta/metabolism , Cuscuta/radiation effects , Gene Expression Regulation, Plant/radiation effects , Light , Proteome/analysis , Electrophoresis, Gel, Two-Dimensional , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization