Results 1 - 11 of 11
1.
Proteomics ; 9(16): 4000-16, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19701905

ABSTRACT

In this study, iTRAQ was used to produce a highly confident catalogue of 542 proteins identified in porcine muscle (false positive <5%). To our knowledge, this is the largest reported set of skeletal muscle proteins in livestock. Comparison with the human muscle proteome demonstrated a low level of false positives, with 83% of the proteins common to both proteomes. In addition, for the first time, we assess variations in the muscle proteome caused by sexually dimorphic gene expression and diet dephytinization. Preliminary analysis identified 19 skeletal muscle proteins differentially expressed between male and female pigs (≥1.2-fold, p<0.05), but only one of them, GDP-dissociation inhibitor 1, was significant (p<0.05) after false discovery rate correction. Diet dephytinization affected expression of 20 proteins (p<0.05). This study contributes to an evaluation of the suitability of the pig as a model to study human gender-related differences in gene expression. Transgenic pigs used in this study might also serve as a useful model to understand changes in human physiology resulting from diet dephytinization.
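
The gap between 19 nominally significant proteins and a single protein surviving false discovery rate correction can be illustrated with a short sketch. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg step-up procedure below is an assumption, and the function name and the p-values in the usage note are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the hypothesis is
    rejected while controlling the false discovery rate at `alpha`.

    Assumption: the Benjamini-Hochberg step-up procedure; the abstract
    only says "false discovery rate correction".
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected
```

With the made-up p-values `[0.0001, 0.03, 0.04, 0.045, 0.6]`, four tests pass the uncorrected p<0.05 cut-off, but only the first remains significant after the correction, which mirrors the pattern reported above.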


Subject(s)
Diet , Muscle, Skeletal/metabolism , Phytic Acid/metabolism , Proteome/analysis , 6-Phytase/genetics , 6-Phytase/metabolism , Animals , Animals, Genetically Modified , Electrophoresis, Gel, Two-Dimensional , Female , Humans , Male , Mass Spectrometry , Sex Factors , Swine
2.
Theor Biol Med Model ; 4: 47, 2007 Dec 06.
Article in English | MEDLINE | ID: mdl-18062814

ABSTRACT

BACKGROUND: Abel and Trevors have delineated three aspects of sequence complexity, Random Sequence Complexity (RSC), Ordered Sequence Complexity (OSC) and Functional Sequence Complexity (FSC) observed in biosequences such as proteins. In this paper, we provide a method to measure functional sequence complexity. METHODS AND RESULTS: We have extended Shannon uncertainty by incorporating the data variable with a functionality variable. The resulting measured unit, which we call Functional bit (Fit), is calculated from the sequence data jointly with the defined functionality variable. To demonstrate the relevance to functional bioinformatics, a method to measure functional sequence complexity was developed and applied to 35 protein families. Considerations were made in determining how the measure can be used to correlate functionality when relating to the whole molecule and sub-molecule. In the experiment, we show that when the proposed measure is applied to the aligned protein sequences of ubiquitin, 6 of the 7 highest value sites correlate with the binding domain. CONCLUSION: For future extensions, measures of functional bioinformatics may provide a means to evaluate potential evolving pathways from effects such as mutations, as well as analyzing the internal structural and functional relationships within the 3-D structure of proteins.
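
One plausible reading of the per-site measure is sketched below: the functional bits at an aligned site are taken as the drop in Shannon uncertainty from a ground state to the observed functional state. The uniform 20-letter ground state, the absence of gap handling, and all function names are assumptions made for illustration; the abstract does not give the exact formula.

```python
import math
from collections import Counter


def site_uncertainty(column):
    """Shannon uncertainty (in bits) of the residues observed at one aligned site."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def functional_bits(aligned_sequences, alphabet_size=20):
    """Per-site value: ground-state uncertainty minus observed uncertainty.

    Assumptions: all sequences have equal length, the ground state is
    uniform over `alphabet_size` residues, and gaps are not treated
    specially.
    """
    ground = math.log2(alphabet_size)
    length = len(aligned_sequences[0])
    per_site = []
    for j in range(length):
        column = [seq[j] for seq in aligned_sequences]
        per_site.append(ground - site_uncertainty(column))
    return per_site
```

Under these assumptions a fully conserved site scores log2(20) bits, the maximum, while a site with four equally frequent residues scores 2 bits less. Ranking sites by this value is how "highest value sites" could be picked out in an alignment such as that of ubiquitin.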


Subject(s)
Computational Biology/methods , Proteins/physiology , Sequence Analysis, Protein , Animals , Humans , Multigene Family , Protein Structure, Tertiary/genetics , Proteins/genetics , Software , Uncertainty
3.
Brief Funct Genomics ; 15(1): 38-46, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26072035

ABSTRACT

Long noncoding RNAs (lncRNAs), generally longer than 200 nucleotides and with poor protein coding potential, are usually considered collectively as a heterogeneous class of RNAs. Recently, an increasing number of studies have shown that lncRNAs are involved in various critical biological processes and a number of complex human diseases. Not only are the primary sequences of many lncRNAs directly related to a specific functional role, but strong evidence also suggests that their secondary structures are even more closely related to their known functions. As functional molecules, lncRNAs have become more and more relevant to many researchers. Here, we review recent, state-of-the-art advances at the three levels of lncRNA research (the primary sequence, the secondary structure and the function annotation), as well as computational methods for lncRNA data analysis.


Subject(s)
Molecular Sequence Annotation , RNA, Long Noncoding/chemistry , RNA, Long Noncoding/genetics , Humans
4.
J Bioinform Comput Biol ; 3(2): 281-301, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15852506

ABSTRACT

The combined interpretation of gene expression data and gene sequences is important for the investigation of the intricate relationships of gene expression at the transcription level. The expression data produced by microarray hybridization experiments can lead to the identification of clusters of co-expressed genes that are likely co-regulated by the same regulatory mechanisms. By analyzing the promoter regions of co-expressed genes, the common regulatory patterns characterized by transcription factor binding sites can be revealed. Many clustering algorithms have been used to uncover inherent clusters in gene expression data. In this paper, based on experiments using simulated and real data, we show that the performance of these algorithms could be further improved. For the clustering of expression data typically characterized by a lot of noise, we propose to use a two-phase clustering algorithm consisting of an initial clustering phase and a second re-clustering phase. The proposed algorithm has several desirable features: (i) it utilizes both local and global information by computing both a "local" pairwise distance between two gene expression profiles in Phase 1 and a "global" probabilistic measure of interestingness of cluster patterns in Phase 2, (ii) it distinguishes between relevant and irrelevant expression values when performing re-clustering, and (iii) it makes explicit the patterns discovered in each cluster for possible interpretations. Experimental results show that the proposed algorithm can be an effective algorithm for discovering clusters in the presence of very noisy data. The patterns that are discovered in each cluster are found to be meaningful and statistically significant, and cannot otherwise be easily discovered. Based on these discovered patterns, genes co-expressed under the same experimental conditions and range of expression levels have been identified and evaluated. 
When identifying regulatory patterns at the promoter regions of the co-expressed genes, we also discovered well-known transcription factor binding sites in them. These binding sites can provide explanations for the co-expressed patterns.


Subject(s)
Algorithms , Artificial Intelligence , Chromosome Mapping/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Cluster Analysis , Transcription Factors/genetics
5.
Appl Bioinformatics ; 4(2): 85-92, 2005.
Article in English | MEDLINE | ID: mdl-16128610

ABSTRACT

BACKGROUND: In order to understand the intricacy of biomolecules more comprehensively, significant patterns extracted from related data collected from diverse sources must be integrated. These data sources may be local or distributed, possibly with different representation schemes. Often, related data from different sources correspond only with respect to some of their values. METHODS: In biological sequence analysis, a goal is to identify new, previously unknown, relevant patterns, to obtain additional insights into the biomolecule. This is known as a pattern discovery task, rather than a pattern matching task. In this research, we present a method to tackle this problem typically found in molecular sequence analysis when the alignment of the sequences is represented as a relation. In this article, we propose an information measure to select attribute values that reflect multiple patterns of significant interdependence information. Based on these selected values, the patterns are evaluated with data values from other sources. RESULTS: In the experiments, a cancer-suppressor gene known as TP53 (encoding tumour protein p53) is analysed with the mutation records of patients. The experiments identify previously unknown points in the molecule that have patterns negatively associated with the occurrence of cancer. CONCLUSION: Since the evaluated interdependence pattern is a global property of the molecule, we conjecture that the identified points might also be a reflection of the molecule's cancer-suppressor characteristics. The experiments also confirm the usefulness of the proposed method.


Subject(s)
Biomarkers, Tumor/chemistry , Models, Molecular , Neoplasm Proteins/chemistry , Neoplasms/metabolism , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Tumor Suppressor Protein p53/chemistry , Amino Acid Sequence , Biomarkers, Tumor/analysis , Computer Simulation , Database Management Systems , Databases, Protein , Humans , Information Storage and Retrieval/methods , Molecular Sequence Data , Neoplasm Proteins/analysis , Pattern Recognition, Automated/methods , Protein Structure, Tertiary , Statistics as Topic , Tumor Suppressor Protein p53/analysis
6.
PLoS One ; 8(7): e70412, 2013.
Article in English | MEDLINE | ID: mdl-23894653

ABSTRACT

The identification of deregulated modules (such as those induced by oncogenes) is a crucial step in exploring the pathogenic process of complex diseases. Most of the existing methods focus on the deregulation of genes rather than of the links along the paths among them. In this study, we emphasize the detection of deregulated links, and develop a novel and effective regulatory path-based approach to finding deregulated modules. Observing that a regulatory pathway between two genes might involve multiple paths rather than a single path, we identify the condition-specific core regulatory path (CCRP) to detect the significant deregulation of regulatory links. Using time-series gene expression, we define the regulatory strength within each gene pair based on statistical dependence analysis. The CCRPs in regulatory networks can then be identified using the shortest path algorithm. Finally, we derive the deregulated modules by integrating the differential edges (as deregulated links) of the CCRPs between the case and the control group. To demonstrate the effectiveness of our approach, we apply the method to expression data associated with different states of Human Epidermal Growth Factor Receptor 2 (HER2). The experimental results show that the genes as well as the links in the deregulated modules are significantly enriched in multiple KEGG pathways and GO biological processes, most of which are known from previous studies to be affected by this oncogene. Additionally, we find that the regulatory mechanism associated with the crucial gene SNAI1 is significantly deregulated as a result of the activation of HER2. Hence, our method provides not only a strategy for detecting the deregulated links in regulatory networks, but also a way to identify the deregulated modules concerned, thus contributing to the target selection of edgetic drugs.
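
The shortest-path step can be sketched with Dijkstra's algorithm. The abstract does not state how regulatory strength is turned into an edge cost; the sketch assumes strengths in (0, 1] and uses -log(strength) as the cost, so that the shortest path is the one with the largest product of strengths. The function name and the input format are hypothetical.

```python
import heapq
import math


def core_regulatory_path(strengths, source, target):
    """Return the strongest path from `source` to `target`, or None.

    strengths: dict mapping a directed gene pair (u, v) to a regulatory
    strength in (0, 1]. Assumption: edge cost = -log(strength).
    """
    # Build an adjacency list with non-negative edge costs.
    graph = {}
    for (u, v), s in strengths.items():
        graph.setdefault(u, []).append((v, -math.log(s)))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None  # target is unreachable from source

    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

For the toy input `{("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5}`, the two-step route through B (combined strength 0.81) is preferred over the direct link (0.5). Running the same search on case and control networks and comparing the edges of the resulting paths would give candidate differential edges.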


Subject(s)
Gene Regulatory Networks/physiology , Algorithms , Gene Regulatory Networks/genetics , Humans , Models, Theoretical , Receptor, ErbB-2/genetics , Receptor, ErbB-2/metabolism , Snail Family Transcription Factors , Transcription Factors/genetics , Transcription Factors/metabolism
8.
Comput Biol Chem ; 35(5): 298-307, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-22000801

ABSTRACT

Since cellular functionality is typically envisioned as having a hierarchical structure, we propose in this paper a framework to identify modules (or clusters) within protein-protein interaction (PPI) networks. Based on the within-module and between-module edges of subgraphs and the degree distribution, we present a formal module definition for PPI networks. Using the new module definition, an effective quantitative measure is introduced for evaluating partitions of PPI networks. Because of the hierarchical nature of functional modules, a hierarchical agglomerative clustering algorithm is developed based on the new measure in order to solve the problem of complex detection within PPI networks. We use gold standard sets of protein complexes to validate the biological significance of predicted complexes. A comprehensive comparison is performed between our method and four other representative methods. The results show that our algorithm finds more protein complexes with high biological significance, a significant improvement over these methods. Furthermore, the complexes predicted by our method, whether dense or sparse, match well with known biological characteristics.
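
The abstract names within-module and between-module edges as ingredients of the module definition but does not give the definition or the partition measure itself. The sketch below only counts those two quantities for a given partition; the function name and input format are hypothetical, and it is not the authors' measure.

```python
def edge_counts(edges, modules):
    """Count within-module and between-module edges for a partition.

    edges: iterable of undirected protein pairs (u, v).
    modules: dict mapping each protein to its module label.
    Returns (within, between), where `within` maps a module label to
    the number of edges inside it and `between` is the number of
    edges that cross module boundaries.
    """
    within = {}
    between = 0
    for u, v in edges:
        if modules[u] == modules[v]:
            within[modules[u]] = within.get(modules[u], 0) + 1
        else:
            between += 1
    return within, between
```

An agglomerative procedure would start with every protein in its own module and repeatedly merge the pair of modules that most improves the chosen measure, recomputing counts like these after each merge.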


Subject(s)
Protein Interaction Mapping/methods , Protein Interaction Maps/physiology , Saccharomyces cerevisiae Proteins , Algorithms , Cluster Analysis , Computational Biology , Databases, Protein , Models, Biological , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , Protein Interaction Mapping/statistics & numerical data , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism
9.
J Bioinform Comput Biol ; 8(5): 789-807, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20981888

ABSTRACT

Comparative genomics is concerned with the study of genome structure and function of different species. It can provide useful information for the derivation of evolutionary and functional relationships between genomes. Previous work on genome comparison focuses mainly on comparing entire genomes for visualization without further analysis. As many interesting patterns may exist between genomes and may lead to the discovery of functional gene segments (groups of genes), we propose an algorithm called the Multi-Level Genome Comparison Algorithm (MGC) that can be used to facilitate the analysis of genomes at multiple levels during the comparison process to discover sequential and regional consistency in gene segments. Different genomes may have common sub-sequences that differ from each other due to mutations, lateral gene transfers, gene rearrangements, etc., and these sub-sequences are usually not easily identified. Not all genes have a perfect one-to-one matching with each other. It is quite possible for ambiguous one-to-many or many-to-many relationships to exist between them. To perform the tasks effectively, MGC takes such ambiguity into consideration during genome comparison by representing genomes as a graph and then making use of a graph mining algorithm called the Multi-Level Attributed Graph Mining Algorithm (MAGMA) to build a hierarchical multi-level graph structure to facilitate genome comparison. To determine the effectiveness of these proposed algorithms, experiments were performed using intra- and inter-species microbial genomes. The results show that the proposed algorithms are able to discover multi-level matching patterns that show the similarities and dissimilarities among different genomes, in addition to confirming the specific role of the genes in the genomes.


Subject(s)
Algorithms , Data Mining/methods , Genomics/statistics & numerical data , Animals , Chlamydia muridarum/classification , Chlamydia muridarum/genetics , Chlamydiales/classification , Chlamydiales/genetics , Chlamydophila pneumoniae/classification , Chlamydophila pneumoniae/genetics , Computational Biology , Genome, Bacterial , Humans , Models, Genetic , Sequence Alignment/statistics & numerical data , Species Specificity
10.
Article in English | MEDLINE | ID: mdl-20483222

ABSTRACT

Porcine liver proteome iTRAQ analysis enabled the confident identification of 880 proteins with a rate of false positive identifications of less than 5%. Proteins involved in energy metabolism, catabolism, protein biosynthesis, electron transport, and other oxidoreductase reactions were highly enriched confirming the central role of liver as the major chemical and energy factory. Comparative analysis with human and mouse liver proteomes demonstrated that 80% of proteins were common to all three liver proteomes. In addition, it was also demonstrated that both sex of the animal and introduction of a novel phytase transgene into the genome each affected around 5% of total liver proteome. After controlling the false discovery rate (FDR

11.
Article in English | MEDLINE | ID: mdl-18427583

ABSTRACT

Decomposing a biological sequence into its functional regions is an important prerequisite to understanding the molecule. Using the multiple alignments of the sequences, we evaluate a segmentation based on the type of statistical variation pattern from each of the aligned sites. To describe such a more general pattern, we introduce multipattern consensus regions as segmented regions based on conserved as well as interdependent patterns. Thus the proposed consensus region considers patterns that are statistically significant and extends a local neighborhood. To show its relevance in protein sequence analysis, a cancer suppressor gene called p53 is examined. The results show significant associations between the detected regions and the tendency of mutations, location on the 3D structure, and heritable cancer factors that can be inferred from human twin studies.
