1.
J Am Stat Assoc ; 118(542): 830-845, 2023.
Article in English | MEDLINE | ID: mdl-37519438

ABSTRACT

Point process modeling is gaining increasing attention, as point process type data are emerging in a large variety of scientific applications. In this article, motivated by a neuronal spike trains study, we propose a novel point process regression model, where both the response and the predictor can be a high-dimensional point process. We model the predictor effects through the conditional intensities using a set of basis transferring functions in a convolutional fashion. We organize the corresponding transferring coefficients in the form of a three-way tensor, then impose the low-rank, sparsity, and subgroup structures on this coefficient tensor. These structures help reduce the dimensionality, integrate information across different individual processes, and facilitate the interpretation. We develop a highly scalable optimization algorithm for parameter estimation. We derive the large sample error bound for the recovered coefficient tensor, and establish the subgroup identification consistency, while allowing the dimension of the multivariate point process to diverge. We demonstrate the efficacy of our method through both simulations and a cross-area neuronal spike trains analysis in a sensory cortex study.

2.
IEEE Trans Nanobioscience ; 22(4): 728-733, 2023 10.
Article in English | MEDLINE | ID: mdl-37167036

ABSTRACT

In recent years, research on miRNA-lncRNA interaction prediction has increased exponentially because it helps elucidate the functional mechanisms of miRNAs and lncRNAs. However, such prediction remains challenging in the bioinformatics domain. Verifying the interactions by biological experiments is expensive and time-consuming. The existing prediction models have several limitations, such as the need to extract features manually, the potential loss of features from pre-treatment approaches, and unresolved long-distance dependencies. Additionally, most current models favor animal data, so an efficient and accurate plant miRNA-lncRNA interaction prediction model is needed. In this work, a new deep learning model called PmlIPM is presented to infer plant miRNA-lncRNA associations. PmlIPM is a four-step framework comprising Input Embedding, Positional Encoding, Multi-Head Attention and Max Pooling. PmlIPM accepts miRNA and lncRNA as separate inputs to extract sequence features, avoiding the information loss caused by directly splicing the two sequences into a single model input. The attention mechanisms give the model the ability to capture long-distance features. PmlIPM is compared with the existing models on 2 benchmark datasets. The results show that our model performs better than other methods and obtains AUC scores of 0.8412, 0.8587, 0.9666 and 0.9225 on the four independent test sets of Arabidopsis lyrata (A.ly), Solanum lycopersicum (S.ly), Brachypodium distachyon (B.di) and Solanum tuberosum (S.tu), respectively.


Subject(s)
Arabidopsis , Deep Learning , MicroRNAs , RNA, Long Noncoding , Animals , MicroRNAs/genetics , RNA, Long Noncoding/genetics , Computational Biology/methods , Arabidopsis/genetics
3.
Int J Mol Sci ; 23(11)2022 May 31.
Article in English | MEDLINE | ID: mdl-35682864

ABSTRACT

Dyslipidemia is considered a risk factor for type 2 diabetes (T2D), yet studies with statins and candidate genes suggest that circulating lipids may protect against T2D development. Apoe-null (Apoe-/-) mouse strains develop spontaneous dyslipidemia and exhibit a wide variation in susceptibility to diet-induced T2D. We thus used Apoe-/- mice to elucidate phenotypic and genetic relationships of circulating lipids with T2D. A male F2 cohort was generated from an intercross between LP/J and BALB/cJ Apoe-/- mice and fed a Western diet for 12 weeks. Fasting and non-fasting plasma glucose and lipid levels were measured, and genotyping was performed using miniMUGA arrays. We uncovered a major QTL near 60 Mb on chromosome 15, Nhdlq18, which affected non-HDL cholesterol and triglyceride levels under both fasting and non-fasting states. This QTL was coincident with Bglu20, a QTL that modulates fasting and non-fasting glucose levels. The plasma levels of non-HDL cholesterol and triglycerides were closely correlated with the plasma glucose levels in F2 mice. Bglu20 disappeared after adjustment for non-HDL cholesterol or triglycerides. These results demonstrate a causative role for dyslipidemia in T2D development in mice.


Subject(s)
Diabetes Mellitus, Type 2 , Dyslipidemias , Hyperlipidemias , Animals , Apolipoproteins E/genetics , Blood Glucose , Cholesterol , Crosses, Genetic , Diabetes Mellitus, Type 2/genetics , Dyslipidemias/genetics , Humans , Hyperlipidemias/genetics , Male , Mice , Mice, Knockout , Quantitative Trait Loci , Triglycerides
4.
J Am Stat Assoc ; 117(540): 1669-1683, 2022.
Article in English | MEDLINE | ID: mdl-36875798

ABSTRACT

DNA methylation (DNAm) has been suggested to play a critical role in post-traumatic stress disorder (PTSD), through mediating the relationship between trauma and PTSD. However, this underlying mechanism of PTSD for African Americans remains unknown. To fill this gap, in this article, we investigate how DNAm mediates the effects of traumatic experiences on PTSD symptoms in the Detroit Neighborhood Health Study (DNHS) (2008-2013), which involves primarily African American adults. To achieve this, we develop a new mediation analysis approach for high-dimensional potential DNAm mediators. A key novelty of our method is that we consider heterogeneity in mediation effects across subpopulations. Specifically, mediators in different subpopulations could have opposite effects on the outcome, and thus could be difficult to identify under a traditional homogeneous model framework. In contrast, the proposed method can estimate heterogeneous mediation effects and identify subpopulations in which individuals share similar effects. Simulation studies demonstrate that the proposed method outperforms existing methods for both homogeneous and heterogeneous data. We also present our mediation analysis results for a dataset with 125 participants and more than 450,000 CpG sites from the DNHS. The proposed method finds three subgroups of subjects and identifies DNAm mediators corresponding to genes such as HSP90AA1 and NFATC1, which have been linked to PTSD symptoms in the literature. Our findings could be useful in future finer-grained investigation of PTSD mechanisms and in the development of new treatments for PTSD.

5.
Am J Med Sci ; 362(3): 297-302, 2021 09.
Article in English | MEDLINE | ID: mdl-34197739

ABSTRACT

BACKGROUND: Glucometers are widely used in animal research due to simplicity and ease of utilization, but their accuracy in blood glucose assessment for hyperlipidemic mice is unknown. METHODS: Here, we compared blood glucose levels measured by a glucometer with plasma glucose levels measured by a standard enzymatic assay for 325 genetically diverse F2 mice derived from LP and BALB/c (BALB) Apoe-/- mice. Non-fasting glucose levels were measured before initiation of a Western diet and after 11 weeks on the diet. RESULTS: On chow diet, lab-measured plasma glucose levels were 279.5 ± 42.6 mg/dl (mean ± SD), while blood glucose values measured by glucometer were 138.7 ± 16.6 mg/dl. The two measures had no correlation (R² = 0.006, p = 0.167). On the Western diet, plasma glucose levels rose to 351.1 ± 121.6 mg/dl, while glucometer-measured blood glucose fell to 128.7 ± 27.9 mg/dl. The two measures showed a moderate correlation (R² = 0.111, p = 3.1E-9). Lab-measured plasma glucose showed strong positive correlations with plasma triglyceride and non-high-density lipoprotein cholesterol levels, while glucometer-measured blood glucose showed an inverse correlation with non-high-density lipoprotein levels on the chow diet. CONCLUSIONS: Our results indicate that hyperlipidemia affects the accuracy of glucometers in measuring blood glucose levels of mice.


Subject(s)
Blood Chemical Analysis/standards , Blood Glucose/genetics , Blood Glucose/metabolism , Genetic Variation/physiology , Hyperlipidemias/blood , Hyperlipidemias/genetics , Animals , Female , Male , Mice , Mice, Inbred BALB C , Mice, Knockout
6.
Brief Bioinform ; 22(2): 2043-2057, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32186712

ABSTRACT

Accumulating evidence has shown that microRNAs (miRNAs) play crucial roles in different biological processes, and their mutations and dysregulations have been shown to contribute to tumorigenesis. In silico identification of disease-associated miRNAs is a cost-effective strategy to discover the most promising biomarkers for disease diagnosis and treatment. The increasingly available omics data sources provide unprecedented opportunities to decipher the underlying relationships between miRNAs and diseases by computational models. However, most existing methods are biased towards a single representation of miRNAs or diseases and are also not capable of discovering unobserved associations for new miRNAs or diseases without association information. In this study, we present a novel computational method with adaptive multi-source multi-view latent feature learning (M2LFL) to infer potential disease-associated miRNAs. First, we adopt multiple data sources to obtain similarity profiles and capture different latent features according to the geometric characteristics of the miRNA and disease spaces. Then, the multi-modal latent features are projected to a common subspace to discover unobserved miRNA-disease associations in both miRNA and disease views, and an adaptive joint graph regularization term is developed to preserve the intrinsic manifold structures of multiple similarity profiles. Meanwhile, Lp,q-norms are imposed on the projection matrices to ensure sparsity and improve interpretability. The experimental results confirm the superior performance of our proposed method in screening reliable candidate disease miRNAs, which suggests that M2LFL could be an efficient tool to discover diagnostic biomarkers for guiding laborious clinical trials.


Subject(s)
Computational Biology/methods , MicroRNAs/genetics , Biomarkers/metabolism , Carcinoma, Hepatocellular/genetics , Carcinoma, Renal Cell/genetics , Computer Simulation , Humans , Kidney Neoplasms/genetics , Liver Neoplasms/genetics
7.
Mol Genet Genomics ; 296(1): 223-233, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33159254

ABSTRACT

Circular RNAs (circRNAs) are a special class of non-coding RNAs with covalently closed-loop structures. Studies show that circRNAs perform critical roles in various biological processes, and that the aberrant expression of circRNAs is closely related to tumorigenesis. Therefore, identifying potential circRNA-disease associations is beneficial for understanding the pathogenesis of complex diseases at the circRNA level and helps biomedical researchers and practitioners to discover diagnostic biomarkers accurately. However, it is tremendously laborious and time-consuming to discover disease-related circRNAs with conventional biological experiments. In this study, we develop an integrative framework, called iCDA-CMG, to predict potential associations between circRNAs and diseases. By incorporating multi-source prior knowledge, including known circRNA-disease associations, disease similarities and circRNA similarities, we adopt a collective matrix completion-based graph learning model to prioritize the most promising disease-related circRNAs for guiding laborious clinical trials. The results show that iCDA-CMG outperforms other state-of-the-art models in terms of cross-validation and independent prediction. Moreover, the case studies for several representative cancers suggest the effectiveness of iCDA-CMG in screening circRNA candidates for human diseases, which will contribute to elucidating the pathogenesis mechanisms and unveiling new opportunities for disease diagnosis and targeted therapy.


Subject(s)
Algorithms , Models, Statistical , Neoplasms/genetics , RNA, Circular/genetics , RNA, Neoplasm/genetics , Computational Biology/methods , Datasets as Topic , Humans , Models, Genetic , Neoplasms/classification , Neoplasms/diagnosis , Neoplasms/pathology , RNA, Circular/metabolism , RNA, Neoplasm/metabolism , Research Design
8.
IEEE Trans Nanobioscience ; 19(3): 556-561, 2020 07.
Article in English | MEDLINE | ID: mdl-32340955

ABSTRACT

Due to technological advances, the quality and availability of biological data have increased dramatically in the last decade. Analysing protein-protein interaction networks (PPINs) in an integrated way, together with subcellular compartment data, provides biological context, helps to fill in the gaps between a single type of biological data and disease-causing genes, and can identify novel genes related to disease. In this study, we present BCCGD, a method for integrating subcellular localization data with PPINs that detects breast cancer candidate genes in protein complexes. We achieve this by defining the significance of the compartments, constructing edge-weighted PPINs, finding protein complexes with a non-negative matrix factorization approach, generating disease-specific networks based on the known disease genes, and prioritizing disease candidate genes with a WDC method. As a case study, we investigate breast cancer, but the techniques described here are applicable to other disorders. For the top genes scored by the BCCGD approach, we use a literature retrieval method to test their correlations with breast cancer. The results show that BCCGD discovers some novel breast cancer candidate genes, which are valuable references for biomedical scientists.


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Intracellular Space/genetics , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Biomarkers, Tumor/metabolism , Breast Neoplasms/metabolism , Computational Biology , Databases, Factual , Female , Humans , Intracellular Space/metabolism
9.
Genome Med ; 12(1): 15, 2020 02 17.
Article in English | MEDLINE | ID: mdl-32066500

ABSTRACT

BACKGROUND: While clinical factors such as age, grade, stage, and histological subtype provide physicians with information about patient prognosis, genomic data can further improve these predictions. Previous studies have shown that germline variants in known cancer driver genes are predictive of patient outcome, but no study has systematically analyzed multiple cancers in an unbiased way to identify genetic loci that can improve patient outcome predictions made using clinical factors. METHODS: We analyzed sequencing data from the over 10,000 cancer patients available through The Cancer Genome Atlas to identify germline variants associated with patient outcome using multivariate Cox regression models. RESULTS: We identified 79 prognostic germline variants in individual cancers and 112 prognostic germline variants in groups of cancers. The germline variants identified in individual cancers provide additional predictive power about patient outcomes beyond clinical information currently in use and may therefore augment clinical decisions based on expected tumor aggressiveness. Molecularly, at least 12 of the germline variants are likely associated with patient outcome through perturbation of protein structure and at least five through association with gene expression differences. Almost half of these germline variants are in previously reported tumor suppressors, oncogenes or cancer driver genes with the other half pointing to genomic loci that should be further investigated for their roles in cancers. CONCLUSIONS: Germline variants are predictive of outcome in cancer patients and specific germline variants can improve patient outcome predictions beyond predictions made using clinical factors alone. The germline variants also implicate new means by which known oncogenes, tumor suppressor genes, and driver genes are perturbed in cancer and suggest roles in cancer for other genes that have not been extensively studied in oncology. 
Further studies in other cancer cohorts are necessary to confirm that germline variation is associated with outcome in cancer patients as this is a proof-of-principle study.


Subject(s)
Biomarkers, Tumor/genetics , Germ-Line Mutation , Neoplasms/genetics , Genetic Testing/statistics & numerical data , Humans , Neoplasms/pathology , Oncogene Proteins/genetics , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Tumor Suppressor Proteins/genetics
10.
Comp Med ; 69(4): 311-320, 2019 08 01.
Article in English | MEDLINE | ID: mdl-31375150

ABSTRACT

Methicillin-resistant Staphylococcus aureus (MRSA) carriage and infection are well documented in the human and veterinary literature; however, only limited information is available regarding MRSA carriage and infection in laboratory NHP populations. The objective of this study was to characterize MRSA carriage in a representative research colony of rhesus and cynomolgus macaques through a cross-sectional analysis of 300 animals. MRSA carriage was determined by using nasal culture. Demographic characteristics of carriers and noncarriers were compared to determine factors linked to increased risk of carriage, and MRSA isolates were analyzed to determine antimicrobial susceptibility patterns, staphylococcal chromosome cassette mec (SCCmec) type, and multilocus sequence type (ST). Culture results demonstrated MRSA carriage in 6.3% of the study population. Animals with greater numbers of veterinary or experimental interventions including antibiotic administration, steroid administration, dental procedures, and surgery were more likely to carry MRSA. Susceptibility results indicated that MRSA isolates were resistant to β-lactams, and all isolates were resistant to between 1 and 4 non-β-lactam antibiotics. In addition, 73.7% of MRSA isolates were identified as ST188-SCCmec IV, an isolate previously observed in an unrelated population of macaques, and 15.8% were ST3268-SCCmec V, which has only been described in macaques. A single isolate had a novel sequence type, ST3478, and carried SCCmec V. These results suggest that NHP-adapted strains of MRSA exist and highlight the emergence of antimicrobial resistance in laboratory NHP populations.


Subject(s)
Macaca fascicularis , Macaca mulatta , Methicillin-Resistant Staphylococcus aureus/drug effects , Staphylococcal Infections/veterinary , Animals , Anti-Bacterial Agents/therapeutic use , Cross-Sectional Studies , Methicillin-Resistant Staphylococcus aureus/isolation & purification
11.
Mol Neurobiol ; 56(7): 4786-4798, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30392137

ABSTRACT

Diffuse low-grade and intermediate-grade gliomas (together known as lower grade gliomas, WHO grade II and III) develop in the supporting glial cells of the brain and are the most common types of primary brain tumor. Despite a better prognosis for lower grade gliomas, 70% of patients undergo high-grade transformation within 10 years, stressing the importance of better prognostication. Long non-coding RNAs (lncRNAs) are gaining attention as potential biomarkers for cancer diagnosis and prognosis. We have developed a computational model, UVA8, for prognosis of lower grade gliomas by combining lncRNA expression, Cox regression, and L1-LASSO penalization. The model was trained on a subset of patients in TCGA. Patients in TCGA, as well as in a completely independent validation set (CGGA), could be dichotomized based on their risk score, a linear combination of the level of each prognostic lncRNA weighted by its multivariable Cox regression coefficient. UVA8 is an independent predictor of survival and outperforms standard epidemiological approaches and previously published lncRNA-based predictors as a survival model. Guilt-by-association studies of the lncRNAs in UVA8, all of which predict good outcome, suggest they have a role in suppressing interferon-stimulated response and epithelial to mesenchymal transition. The expression levels of eight lncRNAs can be combined to produce a prognostic tool applicable to diverse populations of glioma patients. The 8 lncRNA (UVA8) based score can identify grade II and grade III glioma patients with poor outcome, and thus identify patients who should receive more aggressive therapy at the outset.
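The risk score described in this abstract, a linear combination of lncRNA expression levels weighted by Cox coefficients and then used to split patients into two groups, can be sketched roughly as follows. This is only an illustration: the lncRNA names, coefficient values, expression values, and the use of the median as the cutoff are all assumptions and are not taken from the paper.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of each lncRNA's expression level times its Cox regression coefficient."""
    return sum(coefficients[name] * expression[name] for name in coefficients)


def dichotomize(scores, cutoff=None):
    """Label each patient 'high' or 'low' risk; the median cutoff is an assumption."""
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients (negative: higher expression means lower risk).
coefficients = {"lncA": -0.5, "lncB": -0.25}

# Hypothetical expression levels for three patients.
patients = {
    "p1": {"lncA": 2.0, "lncB": 4.0},
    "p2": {"lncA": 6.0, "lncB": 2.0},
    "p3": {"lncA": 1.0, "lncB": 1.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = dichotomize(scores)
print(scores)
print(groups)
```

Negative coefficients are used here because the abstract states that all lncRNAs in the model predict good outcome; a real model would take its coefficients from a fitted multivariable Cox regression.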


Subject(s)
Brain Neoplasms/genetics , Brain Neoplasms/pathology , Gene Expression Regulation, Neoplastic , Glioma/genetics , Glioma/pathology , RNA, Long Noncoding/genetics , Humans , Interferons/metabolism , Kaplan-Meier Estimate , Neoplasm Grading , Prognosis , RNA, Long Noncoding/metabolism , Risk Factors , Signal Transduction
12.
BMC Bioinformatics ; 19(Suppl 20): 507, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577839

ABSTRACT

BACKGROUND: In biomedical information extraction, event extraction plays a crucial role. Biological events are used to describe the dynamic effects or relationships between biological entities such as proteins and genes. Event extraction is generally divided into trigger detection and argument recognition. The performance of trigger detection directly affects the results of the event extraction. In general, traditional methods treat trigger detection as a classification task, using machine learning or rule-based methods that construct many features to improve the classification results. Moreover, the classification model only recognizes triggers composed of single words, whereas for multi-word triggers the result is unsatisfactory. RESULTS: Our model is evaluated on the MLEE corpus. Using only the biomedical LSTM and CRF model without other features, the F-score reaches about 78.08%. Comparing entity features with part-of-speech (POS) features, we find that entity features contribute more to detection performance, with the F-score potentially reaching about 80%. Furthermore, we also experiment on three other corpora (BioNLP 2009, BioNLP 2011, and BioNLP 2013) to verify the generalization of our model. On these corpora, F-scores reach more than 60%, which is better than in the comparative experiments. CONCLUSIONS: The trigger recognition method based on the sequence annotation model does not require initial complex feature engineering, and only requires a simple labeling mechanism to complete the training. Therefore, our model generalizes better than other traditional models. Secondly, this method can identify multi-word triggers, thereby improving the F-scores of trigger recognition. Thirdly, details on the entity have a crucial impact on trigger detection.
Finally, the combination of character-level word embedding and word-level word embedding provides increasingly effective information for the model; therefore, it is a key to the success of the experiment.
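To illustrate how a sequence annotation model can recover multi-word triggers from per-token labels, here is a minimal decoding sketch. It assumes a BIO labeling scheme (B- begins a trigger, I- continues it, O is outside), which the abstract does not name; the example sentence, the tags, and the `extract_triggers` helper are hypothetical.

```python
def extract_triggers(tokens, tags):
    """Group BIO-tagged tokens into (trigger text, event type) pairs."""
    spans = []
    current, label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new trigger starts; close any trigger that is still open.
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # Continuation of the currently open trigger.
            current.append(token)
        else:
            # Outside tag (or inconsistent I- tag): close the open trigger.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


# Hypothetical sentence with one three-word trigger.
tokens = ["VEGF", "plays", "a", "role", "in", "angiogenesis"]
tags = ["O", "B-Regulation", "I-Regulation", "I-Regulation", "O", "O"]
triggers = extract_triggers(tokens, tags)
print(triggers)
```

A per-word classifier would have to label "plays", "a", and "role" independently, whereas the tag sequence lets the three tokens be read back as one trigger.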


Subject(s)
Algorithms , Biomedical Research , Semantics , Information Storage and Retrieval , Machine Learning
13.
IEEE Trans Nanobioscience ; 17(3): 243-250, 2018 07.
Article in English | MEDLINE | ID: mdl-29993553

ABSTRACT

Essential proteins, as a vital part of maintaining cell life, play an important role in the study of biology and drug design. With the generation of large amounts of biological data related to essential proteins, an increasing number of computational methods have been proposed. Unlike methods that adopt a single machine learning method or an ensemble machine learning method, this paper proposes a prediction framework named XGBFEMF for identifying essential proteins. It includes a SUB-EXPAND-SHRINK method for constructing composite features from the original features and obtaining a better subset of features for essential protein prediction, and a model fusion method for obtaining a more effective prediction model. We carry out experiments on Yeast data to assess the performance of XGBFEMF with ROC analysis, accuracy analysis, and top analysis. We also set up experiments on E. coli data to validate performance. The test results show that the XGBFEMF framework can effectively improve many essential indicators. In addition, we analyze each step in the XGBFEMF framework; our results show that each step of the SUB-EXPAND-SHRINK method, as well as the multi-model fusion step, can improve prediction performance.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Proteins , Algorithms , Databases, Protein , Proteins/classification , Proteins/physiology , Software
14.
Ann Surg Oncol ; 25(1): 131-136, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29134380

ABSTRACT

BACKGROUND: Radioactive seed localization (RSL) is a safe and effective alternative to wire localization (WL) for nonpalpable breast lesions. While several large academic institutions currently utilize RSL, few community hospitals have adopted this technique. OBJECTIVE: The aim of this study was to examine the experience of RSL versus WL at a large community hospital. METHODS: A retrospective chart review of patients who underwent RSL or WL for breast-conserving surgery from 1 November 2013 to 31 November 2015. RESULTS: The total number of lesions examined was 382. RSL was utilized in 205 (54%) lesions, with 187 undergoing single RSL, while WL was used in 155 (40%) lesions, with 109 undergoing single WL; both techniques were used in 22 (6%) lesions. Pathology was benign in 142 (48%) lesions, with 93 RSLs and 49 WLs. For malignant lesions, mean specimen size was 36.3 g for single RSL and 35.9 g for single WL (p = 0.904). Re-excision for margin clearance was required for 16 (17%) malignant lesions in the RSL group and 10 (17%) in the WL group (p = 0.954). For malignant lesions, mean operating room time was 86 min for single RSL versus 70 min for single WL (p = 0.014). CONCLUSIONS: The use of RSL is a viable option in the community setting, with several benefits over WL. While operative times were slightly longer with RSL, there was no difference in specimen size or re-excision rate for malignant lesions.


Subject(s)
Breast Neoplasms/diagnostic imaging , Breast Neoplasms/surgery , Carcinoma, Ductal, Breast/diagnostic imaging , Carcinoma, Ductal, Breast/surgery , Carcinoma, Intraductal, Noninfiltrating/diagnostic imaging , Carcinoma, Intraductal, Noninfiltrating/surgery , Fiducial Markers , Adult , Aged , Breast Neoplasms/pathology , Carcinoma, Ductal, Breast/secondary , Carcinoma, Intraductal, Noninfiltrating/secondary , Female , Hospitals, Community , Humans , Lymphatic Metastasis , Margins of Excision , Mastectomy, Segmental , Middle Aged , Operative Time , Radioisotopes , Reoperation , Retrospective Studies , Tumor Burden
15.
BMC Bioinformatics ; 18(Suppl 13): 470, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-29219067

ABSTRACT

BACKGROUND: Essential proteins are indispensable to the survival and development process of living organisms. To understand the functional mechanisms of essential proteins, which can be applied to the analysis of disease and design of drugs, it is important to identify essential proteins from a set of proteins first. As traditional experimental methods designed to test out essential proteins are usually expensive and laborious, computational methods, which utilize biological and topological features of proteins, have attracted more attention in recent years. Protein-protein interaction networks, together with other biological data, have been explored to improve the performance of essential protein prediction. RESULTS: The proposed method SCP is evaluated on Saccharomyces cerevisiae datasets and compared with five other methods. The results show that our method SCP outperforms the other five methods in terms of accuracy of essential protein prediction. CONCLUSIONS: In this paper, we propose a novel algorithm named SCP, which combines the ranking by a modified PageRank algorithm based on subcellular compartments information, with the ranking by Pearson correlation coefficient (PCC) calculated from gene expression data. Experiments show that subcellular localization information is promising in boosting essential protein prediction.


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression Regulation, Fungal , Genes, Essential , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Subcellular Fractions
16.
BMC Genomics ; 17 Suppl 4: 433, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27535125

ABSTRACT

BACKGROUND: Diabetes mellitus, characterized by hyperglycemia as a result of insufficient production of or reduced sensitivity to insulin, poses a growing threat to human health. It is a heterogeneous disorder with multiple etiologies, consisting of type 1 diabetes, type 2 diabetes, gestational diabetes and so on. Diabetes-associated protein/gene prediction is a key step in understanding the cellular mechanisms related to diabetes mellitus. Compared with experimental methods, computational predictions of candidate proteins/genes are cheaper and less laborious. Protein-protein interaction (PPI) data produced by high-throughput technology have been used to prioritize candidate disease genes/proteins. However, false interactions in the PPI data seriously hurt the performance of computational methods. To address this problem, new methods have been developed to identify candidate disease genes/proteins by integrating biological data from other sources. RESULTS: In this study, a new framework called PDMG is proposed to predict candidate disease genes/proteins. First, weighted networks are built by combining subcellular localization information with PPI data. To form the weighted networks, the importance of each compartment is evaluated based on the number of interacting proteins in that compartment, because different compartments play very different roles in cell activities and some compartments are more important than others. Based on the evaluated compartments, the interactions between proteins are scored and the weighted PPI networks are constructed. Second, the known disease genes are extracted from the OMIM database as seed genes to expand disease-specific networks based on the weighted networks. Third, the weights between a protein and its neighbors in the disease-related networks are added together, and the sum is taken as the score of the protein. Finally, the proteins are ranked in descending order of their scores. The top-ranked candidate proteins are considered to be associated with the diseases and are potential disease-related proteins. Various types of data, such as type 2 diabetes-associated genes, subcellular localizations and protein interactions, are used to test the PDMG method. CONCLUSIONS: The results show that the proteins/genes functionally exerting a direct influence over diabetes are consistently placed at the head of the queue. PDMG expands and ranks 445 candidate proteins from the seed set, which includes the original 27 type 2 diabetes proteins. Out of the top 27 proteins, 14 are known type 2 diabetes proteins. Literature retrieved from the PubMed database indicates that, of the 13 novel proteins, 8 are associated with diabetes.
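The last two steps of the framework, scoring each protein by the summed weight of its edges in the disease-related network and ranking in descending order, can be sketched as below. This is a simplified illustration under stated assumptions: the edge weights are given directly rather than derived from compartment importance, the expansion rule (seeds plus their direct neighbors) is a guess, and the protein names, weights, and `rank_candidates` helper are hypothetical.

```python
from collections import defaultdict


def rank_candidates(weighted_edges, seeds):
    """Rank proteins by the summed weight of their edges in a seed-expanded network."""
    # Assumed expansion rule: keep the seeds plus their direct neighbors.
    nodes = set(seeds)
    for u, v in weighted_edges:
        if u in seeds or v in seeds:
            nodes.update((u, v))

    # Score each protein by adding up the weights of its edges inside that network.
    scores = defaultdict(float)
    for (u, v), w in weighted_edges.items():
        if u in nodes and v in nodes:
            scores[u] += w
            scores[v] += w

    # Descending score, with the protein name as a tie-breaker.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical weighted PPI network and a single seed protein.
edges = {("A", "B"): 0.5, ("A", "C"): 0.25, ("B", "C"): 0.75, ("C", "D"): 1.0}
ranking = rank_candidates(edges, seeds={"A"})
print(ranking)
```

Protein D is excluded because it is not adjacent to the seed, so its heavy edge to C does not contribute to C's score in this sketch.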


Subject(s)
Computational Biology/methods , Diabetes Mellitus, Type 2/genetics , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Algorithms , Humans , Proteins/genetics , Proteins/metabolism , Software
17.
Biomed Res Int ; 2014: 354539, 2014.
Article in English | MEDLINE | ID: mdl-24818139

ABSTRACT

Most biological processes are carried out by protein complexes. A substantial number of false positives in protein-protein interaction (PPI) data can compromise the utility of the datasets for complex reconstruction. In order to reduce the impact of such discrepancies, a number of data integration and affinity scoring schemes have been devised. These methods encode the reliabilities (confidence) of physical interactions between pairs of proteins. The challenge now is to identify novel and meaningful protein complexes from the weighted PPI network. To address this problem, a novel protein complex mining algorithm, ClusterBFS (Cluster with Breadth-First Search), is proposed. Based on the weighted density, ClusterBFS detects protein complexes in the weighted network by a breadth-first search that originates from a given seed protein used as the starting point. The experimental results show that ClusterBFS performs significantly better than the other computational approaches in terms of the identification of protein complexes.
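The general idea of growing a cluster from a seed by breadth-first search while monitoring weighted density can be sketched as follows. This is not the published ClusterBFS procedure: the density definition (total edge weight divided by the number of possible node pairs), the acceptance threshold, and the rule of never revisiting a rejected neighbor are all assumptions made for illustration, and the network is hypothetical.

```python
from collections import deque


def weighted_density(nodes, weight):
    """Total edge weight inside `nodes` divided by the number of possible pairs."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(
        weight.get(frozenset((nodes[i], nodes[j])), 0.0)
        for i in range(n)
        for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


def cluster_bfs(seed, adjacency, weight, threshold):
    """Grow a cluster from `seed`, adding neighbors that keep density >= threshold."""
    cluster = {seed}
    visited = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for neighbor in sorted(adjacency.get(current, ())):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            if weighted_density(cluster | {neighbor}, weight) >= threshold:
                cluster.add(neighbor)
                queue.append(neighbor)
    return cluster


# Hypothetical weighted network: a tight triangle A-B-C with a weak link to D.
weight = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.7,
    frozenset(("C", "D")): 0.1,
}
adjacency = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
cluster = cluster_bfs("A", adjacency, weight, threshold=0.5)
print(sorted(cluster))
```

In this toy network D is rejected because adding it would drop the density of the four-node cluster to roughly 0.42, below the assumed threshold of 0.5.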


Subject(s)
Algorithms , Multiprotein Complexes/metabolism , Cluster Analysis , Computational Biology/methods , Databases, Protein , Humans , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
18.
Article in English | MEDLINE | ID: mdl-26355787

ABSTRACT

Essential proteins are vital for an organism's viability under a variety of conditions. Many experimental and computational methods have been developed to identify essential proteins. Computational prediction of essential proteins based on the global protein-protein interaction (PPI) network is severely restricted by the insufficiency of the PPI data, but gene expression profiles help to make up for this deficiency. In this work, the Pearson correlation coefficient (PCC) is used to bridge the gap between PPI and gene expression data. Based on PCC and the edge clustering coefficient (ECC), a new centrality measure, the weighted degree centrality (WDC), is developed to achieve reliable prediction of essential proteins. WDC is applied to identify essential proteins in the yeast and E. coli PPI networks in order to evaluate its performance. For comparison, other prediction methods are also applied to identify essential proteins. Several evaluation methods are used to analyze the results of the various prediction approaches. The prediction results and comparative analyses are presented in the paper. Furthermore, the parameter λ of WDC is analyzed in detail and an optimal λ value is determined. Based on the optimal λ value, the difference between WDC and another prediction method, PeC, is discussed. The analyses show that WDC outperforms other methods including DC, BC, CC, SC, EC, IC, NC, and PeC. They also indicate that integrating different data sources is an effective way to predict essential proteins.
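A rough sketch of how a centrality of this kind could be computed is given below. The abstract does not state the formula, so the combination used here (edge weight = λ·ECC + (1 − λ)·PCC, summed over a protein's incident edges) is an assumption, as are the ECC variant (common neighbours divided by the smaller of the two degrees minus one), the function names and the toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def edge_clustering_coefficient(u, v, adjacency):
    """Assumed ECC variant: shared neighbours / min(deg(u) - 1, deg(v) - 1)."""
    common = len(adjacency[u] & adjacency[v])
    denom = min(len(adjacency[u]) - 1, len(adjacency[v]) - 1)
    return common / denom if denom > 0 else 0.0


def weighted_degree_centrality(adjacency, expression, lam=0.5):
    """Sum of assumed edge weights lam*ECC + (1 - lam)*PCC over incident edges."""
    scores = {}
    for u, neighbours in adjacency.items():
        scores[u] = sum(
            lam * edge_clustering_coefficient(u, v, adjacency)
            + (1 - lam) * pearson(expression[u], expression[v])
            for v in neighbours
        )
    return scores


# Toy network: a triangle A-B-C with D attached to C (made-up expression values).
toy_adjacency = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
toy_expression = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 4.0, 6.0],
    "C": [1.0, 3.0, 2.0],
    "D": [3.0, 2.0, 1.0],
}

topology_only = weighted_degree_centrality(toy_adjacency, toy_expression, lam=1.0)
ranking = sorted(topology_only, key=topology_only.get, reverse=True)
```

Setting `lam=1.0` uses network topology alone and `lam=0.0` uses co-expression alone, which is one way to see why a tuning parameter such as λ needs to be analysed.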


Subject(s)
Computational Biology/methods , Protein Interaction Maps/genetics , Proteins/chemistry , Proteins/metabolism , Transcriptome/genetics , Cluster Analysis , Proteins/genetics , ROC Curve
19.
IET Syst Biol ; 7(5): 223-30, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24067423

ABSTRACT

Protein complexes are a cornerstone of many biological processes. Protein-protein interaction (PPI) data enable a number of computational methods for predicting protein complexes. However, the insufficiency of the PPI data significantly lowers the accuracy of computational methods. In the current work, the authors develop a novel method named clustering based on multiple biological information (CMBI) to discover protein complexes via the integration of multiple biological resources including gene expression profiles, essential protein information and PPI data. First, CMBI defines the functional similarity of each pair of interacting proteins based on the edge-clustering coefficient and the Pearson correlation coefficient. Second, CMBI selects essential proteins as seeds to build the protein complexes. A redundancy-filtering procedure is performed to eliminate redundant complexes. In addition to the essential proteins, CMBI also uses other proteins as seeds to expand protein complexes. To check the performance of CMBI, the authors compare the complexes discovered by CMBI with those found by other techniques by matching the predicted complexes against the reference complexes. The authors subsequently use GO::TermFinder to analyse the complexes predicted by the various methods. Finally, the effect of parameters T and R is investigated. The results from GO functional enrichment and matching analyses show that CMBI performs significantly better than the state-of-the-art methods.


Subject(s)
Cluster Analysis , Computational Biology/methods , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/chemistry , Algorithms , Gene Expression Profiling , Genes, Fungal , Models, Statistical , Protein Interaction Mapping , Saccharomyces cerevisiae/metabolism , Software
20.
BMC Bioinformatics ; 12: 339, 2011 Aug 15.
Article in English | MEDLINE | ID: mdl-21849017

ABSTRACT

BACKGROUND: Cellular systems are highly dynamic and responsive to cues from the environment. Cellular function and response patterns to external stimuli are regulated by biological networks. A protein-protein interaction (PPI) network with static connectivity is dynamic in the sense that the nodes implement so-called functional activities that evolve in time. The shift from static to dynamic network analysis is essential for further understanding of molecular systems. RESULTS: In this paper, Time Course Protein Interaction Networks (TC-PINs) are reconstructed by incorporating time series gene expression into PPI networks. Then, a clustering algorithm is used to create functional modules from three kinds of networks: the TC-PINs, a static PPI network and a pseudorandom network. For the functional modules from the TC-PINs, repetitive modules and modules contained within bigger modules are removed. Finally, matching and GO enrichment analyses are performed to compare the functional modules detected from those networks. CONCLUSIONS: The comparative analyses show that the functional modules from the TC-PINs have much more significant biological meaning than those from static PPI networks. Moreover, they imply that many studies on static PPI networks can be carried out on the TC-PINs, with much more satisfactory experimental results. The 36 PPI networks corresponding to 36 time points, identified as part of this study, and other materials are available at http://bioinfo.csu.edu.cn/txw/TC-PINs.
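One simple way to picture how time series expression can be folded into a static PPI network is sketched below. The abstract does not say how the TC-PINs are built, so the rule used here (a protein counts as active at a time point when its expression reaches a fixed `threshold`, and an interaction is kept only when both partners are active) is an assumption, as are the function name and the toy values.

```python
def time_course_networks(edges, expression, threshold):
    """Return one edge list per time point (assumed activity-threshold rule).

    edges      : iterable of (protein_u, protein_v) pairs from the static network
    expression : dict mapping protein -> list of expression values over time
    threshold  : hypothetical cutoff at or above which a protein is "active"
    """
    n_points = len(next(iter(expression.values())))
    networks = []
    for t in range(n_points):
        # Proteins whose expression reaches the cutoff at time point t.
        active = {p for p, profile in expression.items() if profile[t] >= threshold}
        # Keep only interactions whose two partners are both active.
        networks.append([(u, v) for u, v in edges if u in active and v in active])
    return networks


# Toy static network and made-up expression profiles for two time points.
toy_edges = [("A", "B"), ("B", "C")]
toy_expression = {"A": [1.0, 0.2], "B": [0.9, 0.8], "C": [0.1, 0.9]}

toy_tc_pins = time_course_networks(toy_edges, toy_expression, threshold=0.5)
```

Under this rule the static network is never enlarged: each time-point network is a subgraph of it, so any clustering method written for static PPI networks can be run on each slice unchanged.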


Subject(s)
Algorithms , Protein Interaction Maps , Proteins/metabolism , Cluster Analysis , Databases, Genetic , Saccharomyces cerevisiae/metabolism , Signal Transduction , Time Factors