ABSTRACT
Riboswitches are cis-acting genetic regulatory elements within a specific mRNA that can regulate both transcription and translation by interacting with their corresponding metabolites. Recently, an increasing number of riboswitches have been identified in different species and investigated for their regulatory roles. Both sequence context and structural conformation are important characteristics of riboswitches. None of the previously developed tools, such as covariance models (CMs), Riboswitch finder, and RibEx, provides a web server for efficiently searching for homologous instances of known riboswitches or considers two crucial characteristics of each riboswitch: the structural conformations and the sequence contexts of functional regions. Therefore, we developed a systematic method for identifying 12 kinds of riboswitches. The method is implemented and provided as a web server, RiboSW, to efficiently and conveniently identify riboswitches within messenger RNA sequences. The predictive accuracy of the proposed method is comparable to that of previous tools. The efficiency of the method was improved so that predictions complete in a reasonable computational time, which makes it practical to offer biologists an accurate and convenient web server for analyzing a given mRNA sequence. RiboSW is now available on the web at http://RiboSW.mbc.nctu.edu.tw/.
Subject(s)
Open Reading Frames/genetics; RNA, Bacterial/genetics; RNA, Messenger/chemistry; Regulatory Sequences, Ribonucleic Acid; Software; Base Sequence; Conserved Sequence; Gene Expression Regulation, Bacterial; Molecular Sequence Data; Nucleic Acid Conformation; RNA, Messenger/metabolism; Sequence Analysis, DNA
ABSTRACT
This paper presents a web service named MAGIIC-PRO, which aims to discover functional signatures of a query protein by sequential pattern mining. Automatic discovery of patterns from unaligned biological sequences is an important problem in molecular biology. MAGIIC-PRO differs from several previously established methods that perform similar tasks in two major ways. The first notable feature of MAGIIC-PRO is its efficiency in delivering long patterns. By incorporating a new type of gap constraint and state-of-the-art data mining techniques, MAGIIC-PRO usually identifies satisfactory patterns within an acceptable response time. This efficiency enables users to quickly discover functional signatures whose residues come from more than one region of the protein sequence or are conserved in only a few members of a protein family. The second notable feature of MAGIIC-PRO is its effort in refining the mining results. Considering large flexible gaps improves the completeness of the derived functional signatures. Users are guided directly to the patterns that contain as many simultaneously conserved blocks as possible. In this paper, we show by experiments that MAGIIC-PRO is efficient and effective in identifying ligand-binding sites and hot regions in protein-protein interactions directly from sequences. The web service is available at http://biominer.bime.ntu.edu.tw/magiicpro, with a mirror site at http://biominer.cse.yzu.edu.tw/magiicpro.
Subject(s)
Sequence Analysis, Protein/methods; Software; Binding Sites; Computer Graphics; Internet; Ligands; Models, Molecular; Multiprotein Complexes/chemistry; Protein Conformation; User-Computer Interface
ABSTRACT
BACKGROUND: Identification of protein interacting sites is an important task in computational molecular biology. As more and more protein sequences are deposited without available structural information, it is highly desirable to predict protein binding regions from sequence alone. This paper presents a pattern mining approach to tackle this problem. It is observed that a functional region of a protein structure usually consists of several peptide segments linked by large wildcard regions. Thus, the proposed mining technology considers large irregular gaps when growing patterns, in order to find residues that are simultaneously conserved but widely separated in the sequence. A derived pattern is called a cluster-like pattern because the discovered conserved residues are always grouped into several blocks, each of which corresponds to a locally conserved region of the protein sequence. RESULTS: The experiments conducted in this work demonstrate that the derived long patterns automatically discover the important residues that form one or several hot regions of protein-protein interactions. The methodology is evaluated by conducting experiments on the web server MAGIIC-PRO with a well-known benchmark containing 220 protein chains from 72 distinct complexes. Among the 218 proteins tested, 900 sequential blocks were discovered, 4.25 blocks per protein chain on average. About 92% of the derived blocks are observed to be clustered in space with at least one of the other blocks, and about 66% of the blocks are found to be near the interface of protein-protein interactions. In summary, for about 83% of the tested proteins, at least two interacting blocks can be discovered by this approach. CONCLUSION: This work aims to demonstrate that the important residues associated with the interface of protein-protein interactions may be automatically discovered by sequential pattern mining. The detected regions are highly conserved and are therefore considered computational hot regions. This information would be useful for characterizing protein sequences, predicting protein function, finding potential partners, and facilitating protein docking for drug discovery.
Subject(s)
Protein Interaction Mapping/methods; Bacterial Proteins/chemistry; Carboxypeptidases A/chemistry; GTP-Binding Protein alpha Subunits, Gi-Go/chemistry; Heat-Shock Proteins/chemistry; Models, Biological; rac GTP-Binding Proteins/chemistry; ras GTPase-Activating Proteins/chemistry; RAC2 GTP-Binding Protein
ABSTRACT
Asthma is one of the most common chronic diseases in children. It is caused by complex interactions between various genetic factors and environmental allergens. This study integrates an adaptive neuro-fuzzy inference system (ANFIS) with classification analysis methods to forecast the association of asthma susceptibility genes with three serum IgE groups. The ANFIS model was trained and tested with data sets obtained from 425 asthmatic subjects and 483 non-asthmatic subjects from the Taiwanese population. We assessed 13 single-nucleotide polymorphisms (SNPs) in seven well-known asthma susceptibility genes. First, the proposed ANFIS model learned to reduce the input features from the 13 SNPs. Second, classification was used to assign the serum IgE groups from the simulated SNP results. The performance of the ANFIS model and the classification accuracies indicate that integrating ANFIS with classification analysis has potential for association discovery.
Subject(s)
Asthma/classification; Asthma/genetics; Fuzzy Logic; Genetic Testing/methods; Immunoglobulin E/genetics; Asthma/diagnosis; Child; Child, Preschool; Diagnosis, Differential; Female; Genetic Predisposition to Disease; Humans; Male; Models, Biological; Polymorphism, Single Nucleotide; Respiratory Function Tests; Taiwan
ABSTRACT
BACKGROUND: Automatic extraction of motifs from biological sequences is an important research problem in the study of molecular biology. For proteins, it is desirable to discover sequence motifs containing a large number of wildcard symbols, as the residues associated with functional sites are usually widely separated in sequences. Discovering such patterns is time-consuming because abundant combinations exist when long gaps (a gap consists of one or more successive wildcards) are considered. Mining algorithms often employ constraints to narrow down the search space in order to increase efficiency. However, improper constraint models might degrade the sensitivity and specificity of the motifs discovered by computational methods. We previously proposed a new constraint model to handle large wildcard regions for discovering functional motifs of proteins. The patterns that satisfy the proposed constraint model are called W-patterns. A W-pattern is a structured motif that groups motif symbols into pattern blocks interleaved with large irregular gaps. Considering large gaps reflects the fact that functional residues are not always from a single region of protein sequences, and restricting motif symbols to clusters corresponds to the observation that short motifs are frequently present within protein families. To efficiently discover W-patterns for large-scale sequence annotation and function prediction, this paper first formally introduces the problem to be solved and then proposes an algorithm named WildSpan (sequential pattern mining across large wildcard regions) that incorporates several pruning strategies to greatly reduce the mining cost. RESULTS: WildSpan is shown to efficiently find W-patterns containing conserved residues that are far apart in sequences. We conducted experiments with two mining strategies, protein-based and family-based mining, to evaluate the usefulness of W-patterns and the performance of WildSpan. The protein-based mining mode of WildSpan is developed for discovering functional regions of a single protein by referring to a set of related sequences (e.g. its homologues). The discovered W-patterns are used to characterize the protein sequence, and the results are compared with the conserved positions identified by multiple sequence alignment (MSA). The family-based mining mode of WildSpan is developed for extracting sequence signatures for a group of related proteins (e.g. a protein family) for protein function classification. In this situation, the discovered W-patterns are compared with PROSITE patterns as well as the patterns generated by three existing methods that perform a similar task. Finally, analysis of the execution time of WildSpan reveals that the proposed pruning strategy is effective in improving the scalability of the proposed algorithm. CONCLUSIONS: The mining experiments conducted in this study reveal that WildSpan is efficient and effective in discovering functional signatures of proteins directly from sequences. The proposed pruning strategy is effective in improving the scalability of WildSpan. It is demonstrated in this study that the W-patterns discovered by WildSpan provide useful information for characterizing protein sequences. The WildSpan executable and open-source code are available on the web (http://biominer.csie.cyu.edu.tw/wildspan).
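As a minimal sketch of what a W-pattern looks like in practice, the Python snippet below encodes a pattern as blocks of conserved residues separated by bounded wildcard gaps and compiles it to a regular expression. The block contents, the gap bounds, the lowercase `x` wildcard convention, and the helper names (`w_pattern_to_regex`, `find_w_pattern`) are illustrative assumptions. This is not the WildSpan algorithm, which mines such patterns from a set of sequences; the sketch only matches one given pattern against one sequence.

```python
import re


def w_pattern_to_regex(blocks, gaps):
    """Compile pattern blocks and inter-block gap ranges into a regex.

    blocks: list of strings; a lowercase 'x' inside a block is one wildcard.
    gaps:   list of (min_len, max_len) tuples, one per pair of adjacent blocks.
    (Both conventions are assumptions made for this illustration.)
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one gap range between adjacent blocks")
    parts = []
    for i, block in enumerate(blocks):
        # Inside a block, 'x' matches any residue; other symbols are literal.
        body = "".join("." if c == "x" else re.escape(c) for c in block)
        parts.append("(" + body + ")")
        if i < len(gaps):
            lo, hi = gaps[i]
            # A large irregular gap: any residues, bounded in length.
            parts.append(".{%d,%d}" % (lo, hi))
    return re.compile("".join(parts))


def find_w_pattern(sequence, blocks, gaps):
    """Return the start offset of each block in the first match, or None."""
    match = w_pattern_to_regex(blocks, gaps).search(sequence)
    if match is None:
        return None
    return [match.start(i + 1) for i in range(len(blocks))]


# Toy sequence (made up): block "HxC" at offset 2, block "WS" at offset 14.
print(find_w_pattern("MAHGCKLLLLLLLLWSPQ", ["HxC", "WS"], [(5, 20)]))
```

Widening or narrowing the gap range changes which occurrences qualify, which illustrates why the choice of constraint model affects both search cost and sensitivity.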
ABSTRACT
The accurate identification of potential poly(A) sites has contributed to many studies of alternative polyadenylation. The aim of this study was to develop a machine-learning methodology that helps discriminate real polyadenylation signals from randomly occurring signals in genomic sequences. Since previous studies have revealed that RNA secondary structure has a significant impact in certain genes, the authors tried to computationally pinpoint common structural patterns around poly(A) sites and to investigate how RNA secondary structure may influence polyadenylation. This involved an initial study of the impact of RNA structure, in which motif search tools indicated that hairpin structures might be important. Thus, it was proposed that, in addition to the sequence pattern around poly(A) sites, there exists a widespread structural pattern that is also employed during human mRNA polyadenylation. In this study, the authors present a computational model that uses support vector machines to predict human poly(A) sites. The results show that this predictive model performs comparably to the current prediction tool. In addition, common structural patterns associated with polyadenylation were identified using several motif-finding programs, which provides new insight into the role that RNA secondary structure plays in polyadenylation.
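To make the idea of a hairpin-like structural pattern concrete, the Python sketch below scans an RNA string for perfect stem-loop candidates, that is, a short stem followed within a bounded loop length by its reverse complement. It is a deliberately naive illustration: the function name `find_hairpins`, the default stem and loop sizes, and the restriction to Watson-Crick pairs (no G-U wobble, no mismatches or bulges) are all assumptions, and it is not the motif-finding software or the support vector machine model used in the study.

```python
# Watson-Crick pairs only; G-U wobble pairs are ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string over the alphabet A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_hairpins(rna, stem_len=4, min_loop=3, max_loop=8):
    """Return (stem_start, loop_len) for each perfect stem-loop candidate.

    A candidate is a stem of stem_len bases whose reverse complement
    occurs again after a loop of min_loop..max_loop unpaired bases.
    """
    hits = []
    n = len(rna)
    for i in range(n - 2 * stem_len - min_loop + 1):
        target = reverse_complement(rna[i:i + stem_len])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # start of the would-be pairing strand
            if j + stem_len > n:
                break
            if rna[j:j + stem_len] == target:
                hits.append((i, loop))
    return hits


# Toy sequence (made up): stem GGCC at offset 2, loop UUUU, stem GGCC.
print(find_hairpins("AAGGCCUUUUGGCCAA"))
```

Counts or positions of such candidates in a window around a putative site could serve as structural features alongside sequence features, though which features the authors actually used is not stated in the abstract.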
Subject(s)
Polyadenylation/genetics; RNA, Messenger/genetics; Computational Biology/methods; Humans; Inverted Repeat Sequences/genetics; Models, Genetic; Nucleic Acid Conformation; Support Vector Machine
ABSTRACT
Hard of hearing students usually face more difficulties at school than other students. A classroom environment with wireless technology was implemented to explore whether wireless technology could enhance mathematics learning and teaching activities for a hearing teacher and her 7 hard of hearing students in a Taiwanese junior high school. Experiments showed that the highly interactive communication through the wireless network increased student participation in learning activities. Students responded to the teacher more often and showed fewer distraction behaviors. Fewer mistakes were made in in-class course work because Tablet PCs provided students with scaffolds. Students stated that the environment with wireless technology was desirable and that they hoped to continue using it to learn mathematics.
Subject(s)
Education of Hearing Disabled; Educational Technology/instrumentation; Mathematics; Teaching/methods; Educational Status; Humans; Microcomputers; Software; User-Computer Interface
ABSTRACT
Recent investigations into the stability of proteins have identified various structural factors, but few have considered sequence factors such as protein motifs. These motifs represent highly conserved regions and describe critical regions that may exist only in proteins that remain functional at high temperatures. This investigation presents a method for identifying and comparing corresponding mesophilic and thermophilic sequence motifs between protein families. Discriminative motifs that are conserved only in the mesophilic or thermophilic subfamily are identified. Analysis of the results shows that, although the subfamilies of most protein families share similar motifs, some discriminative motifs are present in particular thermophilic/mesophilic subfamilies. The thermophilic discriminative motifs are conserved only in thermophilic organisms, revealing that physicochemical principles support thermostability.
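The notion of a discriminative motif can be sketched in a few lines of Python: a candidate is conserved across one subfamily and absent, or nearly so, from the other. In the snippet below, exact k-mers stand in for motifs, and the function names, thresholds, and toy sequences are assumptions made for illustration. The abstract does not describe how the study derived its motifs, so this shows the comparison step only.

```python
def kmer_support(sequences, k):
    """Map each k-mer to the fraction of sequences that contain it."""
    counts = {}
    for seq in sequences:
        # Use a set so a k-mer is counted at most once per sequence.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            counts[kmer] = counts.get(kmer, 0) + 1
    return {kmer: c / len(sequences) for kmer, c in counts.items()}


def discriminative_motifs(group_a, group_b, k=3, min_support=1.0, max_other=0.0):
    """k-mers conserved in group_a (support >= min_support) and rare in
    group_b (support <= max_other), returned in sorted order."""
    support_a = kmer_support(group_a, k)
    support_b = kmer_support(group_b, k)
    return sorted(
        kmer
        for kmer, support in support_a.items()
        if support >= min_support and support_b.get(kmer, 0.0) <= max_other
    )


# Toy subfamilies (made-up sequences, not real proteins).
thermophilic = ["MKEEL", "AKEEV"]
mesophilic = ["MKQQL", "AKQQV"]
print(discriminative_motifs(thermophilic, mesophilic))
print(discriminative_motifs(mesophilic, thermophilic))
```

Relaxing `min_support` below 1.0 or raising `max_other` above 0.0 trades strictness for coverage, which matters when subfamilies are large or noisy.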
Subject(s)
Prokaryotic Cells/chemistry; Proteins/chemistry; Temperature; Amino Acid Motifs; Amino Acid Sequence; Ions/chemistry; Models, Molecular; Molecular Sequence Data; Protein Structure, Tertiary; Sequence Alignment
ABSTRACT
ProSplicer is a database of putative alternative splicing information derived from the alignment of proteins, mRNA sequences and expressed sequence tags (ESTs) against human genomic DNA sequences. Proteins, mRNA and ESTs provide valuable evidence that can reveal splice variants of genes. The alternative splicing information in the database can help users investigate the alternative splicing and tissue-specific expression of genes.