Search | VHL Regional Portal

1.

Drug repurposing based on the DTD-GNN graph neural network: revealing the relationships among drugs, targets and diseases.

Li, Wenjun; Ma, Wanjun; Yang, Mengyun; Tang, Xiwei.

BMC Genomics ; 25(1): 584, 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38862928

ABSTRACT

MOTIVATION: The rational modelling of the relationship among drugs, targets and diseases is crucial for drug retargeting. While significant progress has been made in studying binary relationships, further research is needed to deepen our understanding of ternary relationships. The application of graph neural networks in drug retargeting is increasing, but further research is needed to determine the appropriate modelling method for ternary relationships and how to capture their complex multi-feature structure. RESULTS: The aim of this study was to construct relationships among drugs, targets and diseases. To represent the complex relationships among these entities, we used a heterogeneous graph structure. Additionally, we propose a DTD-GNN model that combines graph convolutional networks and graph attention networks to learn feature representations and association information, facilitating a more thorough exploration of the relationships. The experimental results demonstrate that the DTD-GNN model outperforms other graph neural network models in terms of AUC, Precision, and F1-score. The study has important implications for gaining a comprehensive understanding of the relationships between drugs and diseases, as well as for further research and application in exploring the mechanisms of drug-disease interactions. The study reveals these relationships, providing possibilities for innovative therapeutic strategies in medicine.

Subject(s)

Drug Repositioning , Neural Networks, Computer , Drug Repositioning/methods , Humans , Algorithms , Computational Biology/methods

2.

Multivariate Temporal Point Process Regression.

Tang, Xiwei; Li, Lexin.

J Am Stat Assoc ; 118(542): 830-845, 2023.

Article in English | MEDLINE | ID: mdl-37519438

ABSTRACT

Point process modeling is gaining increasing attention, as point process type data are emerging in a large variety of scientific applications. In this article, motivated by a neuronal spike trains study, we propose a novel point process regression model, where both the response and the predictor can be a high-dimensional point process. We model the predictor effects through the conditional intensities using a set of basis transferring functions in a convolutional fashion. We organize the corresponding transferring coefficients in the form of a three-way tensor, then impose the low-rank, sparsity, and subgroup structures on this coefficient tensor. These structures help reduce the dimensionality, integrate information across different individual processes, and facilitate the interpretation. We develop a highly scalable optimization algorithm for parameter estimation. We derive the large sample error bound for the recovered coefficient tensor, and establish the subgroup identification consistency, while allowing the dimension of the multivariate point process to diverge. We demonstrate the efficacy of our method through both simulations and a cross-area neuronal spike trains analysis in a sensory cortex study.

3.

Predicting Plant miRNA-lncRNA Interactions via a Deep Learning Method.

Tang, Xiwei; Ji, Lu.

IEEE Trans Nanobioscience ; 22(4): 728-733, 2023 10.

Article in English | MEDLINE | ID: mdl-37167036

ABSTRACT

In recent years, research on miRNA-lncRNA interaction prediction has increased exponentially because of its contribution to elucidating the functional mechanisms of miRNAs and lncRNAs. However, such prediction remains challenging in the bioinformatics domain. It is expensive and time-consuming to verify the interactions by biological experiments. The existing prediction models have some limitations, such as the need to manually extract features, the potential loss of features from pre-treatment approaches, and unresolved long-distance dependencies. Additionally, most current models favor animal data, so an efficient and accurate plant miRNA-lncRNA interaction prediction model is needed. In this work, a new deep learning model called PmlIPM is presented to infer plant miRNA-lncRNA associations. PmlIPM is a four-step framework comprising Input Embedding, Positional Encoding, Multi-Head Attention and Max Pooling. PmlIPM accepts miRNA and lncRNA as separate inputs to extract sequence features, avoiding the information loss caused by directly splicing the two sequences into a single model input. The attention mechanisms give the model the ability to capture long-distance features. PmlIPM is compared with existing models on two benchmark datasets. The results show that our model performs better than other methods and obtains AUC scores of 0.8412, 0.8587, 0.9666 and 0.9225 on the four independent test sets of Arabidopsis lyrata (A.ly), Solanum lycopersicum (S.ly), Brachypodium distachyon (B.di) and Solanum tuberosum (S.tu), respectively.

Subject(s)

Arabidopsis , Deep Learning , MicroRNAs , RNA, Long Noncoding , Animals , MicroRNAs/genetics , RNA, Long Noncoding/genetics , Computational Biology/methods , Arabidopsis/genetics
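
Illustrative note on record 3: the four steps named in the PmlIPM abstract (Input Embedding, Positional Encoding, Multi-Head Attention, Max Pooling) can be sketched roughly as below. This is a minimal, dependency-free sketch and not the authors' implementation. The one-hot embedding, the sinusoidal position table, the single attention head with Q = K = V, and both example sequences are assumptions made here for illustration; the abstract specifies none of them.

```python
import math

ALPHABET = "ACGU"  # RNA bases; any other character maps to an all-zero vector


def embed(seq):
    """One-hot input embedding (assumed stand-in for a learned embedding)."""
    return [[1.0 if ch == base else 0.0 for base in ALPHABET] for ch in seq]


def positional_encoding(seq_len, dim):
    """Sinusoidal position table (assumed; the abstract names no variant)."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product attention with Q = K = V = x (simplification)."""
    dim = len(x[0])
    out = []
    for query in x:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in x]
        weights = softmax(scores)
        out.append([sum(w * value[c] for w, value in zip(weights, x)) for c in range(dim)])
    return out


def max_pool(x):
    """Max over sequence positions, giving one fixed-length vector per sequence."""
    return [max(column) for column in zip(*x)]


def encode(seq):
    """Embed, add positions, attend, then pool a single RNA sequence."""
    emb = embed(seq)
    pos = positional_encoding(len(seq), len(ALPHABET))
    x = [[e + p for e, p in zip(e_row, p_row)] for e_row, p_row in zip(emb, pos)]
    return max_pool(self_attention(x))


# Hypothetical sequences; each is encoded separately, then the vectors are joined.
mirna = "UGACAGAAGAGAGUGAGCAC"
lncrna = "AUGGCUUCAGGAACCUUAGCAGGUCA"
pair_features = encode(mirna) + encode(lncrna)
```

Encoding the two sequences separately and concatenating only the pooled vectors mirrors the abstract's point about avoiding direct splicing of the raw sequences. Max pooling is what makes the output length independent of sequence length.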

4.

Genetic Evidence for a Causal Relationship between Hyperlipidemia and Type 2 Diabetes in Mice.

Shi, Lisa J; Tang, Xiwei; He, Jiang; Shi, Weibin.

Int J Mol Sci ; 23(11), 2022 May 31.

Article in English | MEDLINE | ID: mdl-35682864

ABSTRACT

Dyslipidemia is considered a risk factor for type 2 diabetes (T2D), yet studies with statins and candidate genes suggest that circulating lipids may protect against T2D development. Apoe-null (Apoe-/-) mouse strains develop spontaneous dyslipidemia and exhibit a wide variation in susceptibility to diet-induced T2D. We thus used Apoe-/- mice to elucidate phenotypic and genetic relationships of circulating lipids with T2D. A male F2 cohort was generated from an intercross between LP/J and BALB/cJ Apoe-/- mice and fed 12 weeks of a Western diet. Fasting, non-fasting plasma glucose, and lipid levels were measured and genotyping was performed using miniMUGA arrays. We uncovered a major QTL near 60 Mb on chromosome 15, Nhdlq18, which affected non-HDL cholesterol and triglyceride levels under both fasting and non-fasting states. This QTL was coincident with Bglu20, a QTL that modulates fasting and non-fasting glucose levels. The plasma levels of non-HDL cholesterol and triglycerides were closely correlated with the plasma glucose levels in F2 mice. Bglu20 disappeared after adjustment for non-HDL cholesterol or triglycerides. These results demonstrate a causative role for dyslipidemia in T2D development in mice.

Subject(s)

Diabetes Mellitus, Type 2 , Dyslipidemias , Hyperlipidemias , Animals , Apolipoproteins E/genetics , Blood Glucose , Cholesterol , Crosses, Genetic , Diabetes Mellitus, Type 2/genetics , Dyslipidemias/genetics , Humans , Hyperlipidemias/genetics , Male , Mice , Mice, Knockout , Quantitative Trait Loci , Triglycerides

5.

Heterogeneous Mediation Analysis on Epigenomic PTSD and Traumatic Stress in a Predominantly African American Cohort.

Xue, Fei; Tang, Xiwei; Kim, Grace; Koenen, Karestan C; Martin, Chantel L; Galea, Sandro; Wildman, Derek; Uddin, Monica; Qu, Annie.

J Am Stat Assoc ; 117(540): 1669-1683, 2022.

Article in English | MEDLINE | ID: mdl-36875798

ABSTRACT

DNA methylation (DNAm) has been suggested to play a critical role in post-traumatic stress disorder (PTSD), through mediating the relationship between trauma and PTSD. However, this underlying mechanism of PTSD for African Americans remains unknown. To fill this gap, in this article, we investigate how DNAm mediates the effects of traumatic experiences on PTSD symptoms in the Detroit Neighborhood Health Study (DNHS) (2008-2013), which involves primarily African American adults. To achieve this, we develop a new mediation analysis approach for high-dimensional potential DNAm mediators. A key novelty of our method is that we consider heterogeneity in mediation effects across subpopulations. Specifically, mediators in different subpopulations could have opposite effects on the outcome, and thus could be difficult to identify under a traditional homogeneous model framework. In contrast, the proposed method can estimate heterogeneous mediation effects and identify subpopulations in which individuals share similar effects. Simulation studies demonstrate that the proposed method outperforms existing methods for both homogeneous and heterogeneous data. We also present our mediation analysis results of a dataset with 125 participants and more than 450,000 CpG sites from the DNHS study. The proposed method finds three subgroups of subjects and identifies DNAm mediators corresponding to genes such as HSP90AA1 and NFATC1, which have been linked to PTSD symptoms in the literature. Our finding could be useful in future finer-grained investigation of PTSD mechanism and in the development of new treatments for PTSD.

6.

Hyperlipidemia Influences the Accuracy of Glucometer-Measured Blood Glucose Concentrations in Genetically Diverse Mice.

Shi, Lisa J; Tang, Xiwei; He, Jiang; Shi, Weibin.

Am J Med Sci ; 362(3): 297-302, 2021 09.

Article in English | MEDLINE | ID: mdl-34197739

ABSTRACT

BACKGROUND: Glucometers are widely used in animal research due to simplicity and ease of utilization, but their accuracy in blood glucose assessment for hyperlipidemic mice is unknown. METHODS: Here, we compared blood glucose levels measured by a glucometer with plasma glucose levels measured by a standard enzymatic assay for 325 genetically diverse F2 mice derived from LP and BALB/c (BALB) Apoe-/- mice. Non-fasting glucose levels were measured before initiation of a Western diet and after 11 weeks on the diet. RESULTS: On chow diet, lab-measured plasma glucose levels were 279.5 ± 42.6 mg/dl (mean ± SD), while blood glucose values measured by glucometer were 138.7 ± 16.6 mg/dl. The two measures had no correlation (R² = 0.006, p = 0.167). On the Western diet, plasma glucose levels rose to 351.1 ± 121.6 mg/dl, while glucometer-measured blood glucose fell to 128.7 ± 27.9 mg/dl. The two measures showed a moderate correlation (R² = 0.111, p = 3.1E-9). Lab-measured plasma glucose showed strong positive correlations with plasma triglyceride and non-high-density lipoprotein cholesterol levels, while glucometer-measured blood glucose showed an inverse correlation with non-high-density lipoprotein levels on the chow diet. CONCLUSIONS: Our results indicate that hyperlipidemia affects the accuracy of glucometers in measuring blood glucose levels of mice.

Subject(s)

Blood Chemical Analysis/standards , Blood Glucose/genetics , Blood Glucose/metabolism , Genetic Variation/physiology , Hyperlipidemias/blood , Hyperlipidemias/genetics , Animals , Female , Male , Mice , Mice, Inbred BALB C , Mice, Knockout

7.

Adaptive multi-source multi-view latent feature learning for inferring potential disease-associated miRNAs.

Xiao, Qiu; Zhang, Ning; Luo, Jiawei; Dai, Jianhua; Tang, Xiwei.

Brief Bioinform ; 22(2): 2043-2057, 2021 03 22.

Article in English | MEDLINE | ID: mdl-32186712

ABSTRACT

Accumulating evidence has shown that microRNAs (miRNAs) play crucial roles in different biological processes, and their mutations and dysregulations have been shown to contribute to tumorigenesis. In silico identification of disease-associated miRNAs is a cost-effective strategy to discover the most promising biomarkers for disease diagnosis and treatment. The increasingly available omics data sources provide unprecedented opportunities to decipher the underlying relationships between miRNAs and diseases by computational models. However, most existing methods are biased towards a single representation of miRNAs or diseases and are also not capable of discovering unobserved associations for new miRNAs or diseases without association information. In this study, we present a novel computational method with adaptive multi-source multi-view latent feature learning (M2LFL) to infer potential disease-associated miRNAs. First, we adopt multiple data sources to obtain similarity profiles and capture different latent features according to the geometric characteristics of the miRNA and disease spaces. Then, the multi-modal latent features are projected to a common subspace to discover unobserved miRNA-disease associations in both miRNA and disease views, and an adaptive joint graph regularization term is developed to preserve the intrinsic manifold structures of multiple similarity profiles. Meanwhile, Lp,q-norms are imposed on the projection matrices to ensure sparsity and improve interpretability. The experimental results confirm the superior performance of our proposed method in screening reliable candidate disease miRNAs, which suggests that M2LFL could be an efficient tool to discover diagnostic biomarkers for guiding laborious clinical trials.

Subject(s)

Computational Biology/methods , MicroRNAs/genetics , Biomarkers/metabolism , Carcinoma, Hepatocellular/genetics , Carcinoma, Renal Cell/genetics , Computer Simulation , Humans , Kidney Neoplasms/genetics , Liver Neoplasms/genetics

8.

iCDA-CMG: identifying circRNA-disease associations by federating multi-similarity fusion and collective matrix completion.

Xiao, Qiu; Zhong, Jiancheng; Tang, Xiwei; Luo, Jiawei.

Mol Genet Genomics ; 296(1): 223-233, 2021 Jan.

Article in English | MEDLINE | ID: mdl-33159254

ABSTRACT

Circular RNAs (circRNAs) are a special class of non-coding RNAs with covalently closed-loop structures. Studies prove that circRNAs perform critical roles in various biological processes, and the aberrant expression of circRNAs is closely related to tumorigenesis. Therefore, identifying potential circRNA-disease associations is beneficial to understand the pathogenesis of complex diseases at the circRNA level and helps biomedical researchers and practitioners to discover diagnostic biomarkers accurately. However, it is tremendously laborious and time-consuming to discover disease-related circRNAs with conventional biological experiments. In this study, we develop an integrative framework, called iCDA-CMG, to predict potential associations between circRNAs and diseases. By incorporating multi-source prior knowledge, including known circRNA-disease associations, disease similarities and circRNA similarities, we adopt a collective matrix completion-based graph learning model to prioritize the most promising disease-related circRNAs for guiding laborious clinical trials. The results show that iCDA-CMG outperforms other state-of-the-art models in terms of cross-validation and independent prediction. Moreover, the case studies for several representative cancers suggest the effectiveness of iCDA-CMG in screening circRNA candidates for human diseases, which will contribute to elucidating the pathogenesis mechanisms and unveiling new opportunities for disease diagnosis and targeted therapy.

Subject(s)

Algorithms , Models, Statistical , Neoplasms/genetics , RNA, Circular/genetics , RNA, Neoplasm/genetics , Computational Biology/methods , Datasets as Topic , Humans , Models, Genetic , Neoplasms/classification , Neoplasms/diagnosis , Neoplasms/pathology , RNA, Circular/metabolism , RNA, Neoplasm/metabolism , Research Design

9.

Breast Cancer Candidate Gene Detection Through Integration of Subcellular Localization Data With Protein-Protein Interaction Networks.

Tang, Xiwei; Xiao, Qiu; Yu, Kai.

IEEE Trans Nanobioscience ; 19(3): 556-561, 2020 07.

Article in English | MEDLINE | ID: mdl-32340955

ABSTRACT

Due to technological advances, the quality and availability of biological data have increased dramatically in the last decade. Analysing protein-protein interaction networks (PPINs) in an integrated way, together with subcellular compartment data, provides biological context, helps to fill the gaps between a single type of biological data and disease-causing genes, and can identify novel genes related to disease. In this study, we present BCCGD, a method for integrating subcellular localization data with PPINs that detects breast cancer candidate genes in protein complexes. We achieve this by defining the significance of the compartment, constructing edge-weighted PPINs, finding protein complexes with a non-negative matrix factorization approach, generating disease-specific networks based on the known disease genes, and prioritizing disease candidate genes with a WDC method. As a case study, we investigate breast cancer, but the techniques described here are applicable to other disorders. For the top genes scored by the BCCGD approach, we use a literature retrieval method to test their correlation with breast cancer. The results show that BCCGD discovers some novel breast cancer candidate genes, which are valuable references for biomedical scientists.

Subject(s)

Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Intracellular Space/genetics , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Biomarkers, Tumor/metabolism , Breast Neoplasms/metabolism , Computational Biology , Databases, Factual , Female , Humans , Intracellular Space/metabolism

10.

The pan-cancer landscape of prognostic germline variants in 10,582 patients.

Chatrath, Ajay; Przanowska, Roza; Kiran, Shashi; Su, Zhangli; Saha, Shekhar; Wilson, Briana; Tsunematsu, Takaaki; Ahn, Ji-Hye; Lee, Kyung Yong; Paulsen, Teressa; Sobierajska, Ewelina; Kiran, Manjari; Tang, Xiwei; Li, Tianxi; Kumar, Pankaj; Ratan, Aakrosh; Dutta, Anindya.

Genome Med ; 12(1): 15, 2020 02 17.

Article in English | MEDLINE | ID: mdl-32066500

ABSTRACT

BACKGROUND: While clinical factors such as age, grade, stage, and histological subtype provide physicians with information about patient prognosis, genomic data can further improve these predictions. Previous studies have shown that germline variants in known cancer driver genes are predictive of patient outcome, but no study has systematically analyzed multiple cancers in an unbiased way to identify genetic loci that can improve patient outcome predictions made using clinical factors. METHODS: We analyzed sequencing data from the over 10,000 cancer patients available through The Cancer Genome Atlas to identify germline variants associated with patient outcome using multivariate Cox regression models. RESULTS: We identified 79 prognostic germline variants in individual cancers and 112 prognostic germline variants in groups of cancers. The germline variants identified in individual cancers provide additional predictive power about patient outcomes beyond clinical information currently in use and may therefore augment clinical decisions based on expected tumor aggressiveness. Molecularly, at least 12 of the germline variants are likely associated with patient outcome through perturbation of protein structure and at least five through association with gene expression differences. Almost half of these germline variants are in previously reported tumor suppressors, oncogenes or cancer driver genes with the other half pointing to genomic loci that should be further investigated for their roles in cancers. CONCLUSIONS: Germline variants are predictive of outcome in cancer patients and specific germline variants can improve patient outcome predictions beyond predictions made using clinical factors alone. The germline variants also implicate new means by which known oncogenes, tumor suppressor genes, and driver genes are perturbed in cancer and suggest roles in cancer for other genes that have not been extensively studied in oncology. Further studies in other cancer cohorts are necessary to confirm that germline variation is associated with outcome in cancer patients as this is a proof-of-principle study.

Subject(s)

Biomarkers, Tumor/genetics , Germ-Line Mutation , Neoplasms/genetics , Genetic Testing/statistics & numerical data , Humans , Neoplasms/pathology , Oncogene Proteins/genetics , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Tumor Suppressor Proteins/genetics

11.

Carriage of Methicillin-resistant Staphylococcus aureus in a Colony of Rhesus (Macaca mulatta) and Cynomolgus (Macaca fascicularis) Macaques.

Greenstein, Abigail W; Boyle-Vavra, Susan; Maddox, Carol W; Tang, Xiwei; Halliday, Lisa C; Fortman, Jeffrey D.

Comp Med ; 69(4): 311-320, 2019 08 01.

Article in English | MEDLINE | ID: mdl-31375150

ABSTRACT

Methicillin-resistant Staphylococcus aureus (MRSA) carriage and infection are well documented in the human and veterinary literature; however, only limited information is available regarding MRSA carriage and infection in laboratory NHP populations. The objective of this study was to characterize MRSA carriage in a representative research colony of rhesus and cynomolgus macaques through a cross-sectional analysis of 300 animals. MRSA carriage was determined by using nasal culture. Demographic characteristics of carriers and noncarriers were compared to determine factors linked to increased risk of carriage, and MRSA isolates were analyzed to determine antimicrobial susceptibility patterns, staphylococcal chromosome cassette mec (SCCmec) type, and multilocus sequence type (ST). Culture results demonstrated MRSA carriage in 6.3% of the study population. Animals with greater numbers of veterinary or experimental interventions including antibiotic administration, steroid administration, dental procedures, and surgery were more likely to carry MRSA. Susceptibility results indicated that MRSA isolates were resistant to β-lactams, and all isolates were resistant to between 1 and 4 non-β-lactam antibiotics. In addition, 73.7% of MRSA isolates were identified as ST188-SCCmec IV, an isolate previously observed in an unrelated population of macaques, and 15.8% were ST3268-SCCmec V, which has only been described in macaques. A single isolate had a novel sequence type, ST3478, and carried SCCmec V. These results suggest that NHP-adapted strains of MRSA exist and highlight the emergence of antimicrobial resistance in laboratory NHP populations.

Subject(s)

Macaca fascicularis , Macaca mulatta , Methicillin-Resistant Staphylococcus aureus/drug effects , Staphylococcal Infections/veterinary , Animals , Anti-Bacterial Agents/therapeutic use , Cross-Sectional Studies , Methicillin-Resistant Staphylococcus aureus/isolation & purification

12.

A Prognostic Signature for Lower Grade Gliomas Based on Expression of Long Non-Coding RNAs.

Kiran, Manjari; Chatrath, Ajay; Tang, Xiwei; Keenan, Daniel Macrae; Dutta, Anindya.

Mol Neurobiol ; 56(7): 4786-4798, 2019 Jul.

Article in English | MEDLINE | ID: mdl-30392137

ABSTRACT

Diffuse low-grade and intermediate-grade gliomas (together known as lower grade gliomas, WHO grade II and III) develop in the supporting glial cells of the brain and are the most common types of primary brain tumor. Despite a better prognosis for lower grade gliomas, 70% of patients undergo high-grade transformation within 10 years, stressing the importance of better prognosis. Long non-coding RNAs (lncRNAs) are gaining attention as potential biomarkers for cancer diagnosis and prognosis. We have developed a computational model, UVA8, for prognosis of lower grade gliomas by combining lncRNA expression, Cox regression, and L1-LASSO penalization. The model was trained on a subset of patients in TCGA. Patients in TCGA, as well as a completely independent validation set (CGGA), could be dichotomized based on their risk score, a linear combination of the level of each prognostic lncRNA weighted by its multivariable Cox regression coefficient. UVA8 is an independent predictor of survival and outperforms standard epidemiological approaches and previously published lncRNA-based predictors as a survival model. Guilt-by-association studies of the lncRNAs in UVA8, all of which predict good outcome, suggest they have a role in suppressing interferon-stimulated response and epithelial to mesenchymal transition. The expression levels of eight lncRNAs can be combined to produce a prognostic tool applicable to diverse populations of glioma patients. The 8 lncRNA (UVA8) based score can identify grade II and grade III glioma patients with poor outcome, and thus identify patients who should receive more aggressive therapy at the outset.

Subject(s)

Brain Neoplasms/genetics , Brain Neoplasms/pathology , Gene Expression Regulation, Neoplastic , Glioma/genetics , Glioma/pathology , RNA, Long Noncoding/genetics , Humans , Interferons/metabolism , Kaplan-Meier Estimate , Neoplasm Grading , Prognosis , RNA, Long Noncoding/metabolism , Risk Factors , Signal Transduction
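
Illustrative note on record 12: the abstract describes the risk score as a linear combination of each prognostic lncRNA's level weighted by its multivariable Cox regression coefficient, with patients then dichotomized by score. A minimal sketch of that arithmetic follows. The lncRNA names, coefficients, expression values, and the median cutoff are all hypothetical and are not taken from the paper. The coefficients are negative only because the abstract states that all UVA8 lncRNAs predict good outcome.

```python
from statistics import median

# Hypothetical Cox coefficients (negative: higher expression, lower risk).
coefficients = {"lnc_A": -0.5, "lnc_B": -0.25}

# Hypothetical per-patient expression levels.
patients = {
    "P1": {"lnc_A": 2.0, "lnc_B": 4.0},
    "P2": {"lnc_A": 0.0, "lnc_B": 0.0},
    "P3": {"lnc_A": 1.0, "lnc_B": 2.0},
}


def risk_score(expression, coefs):
    """Linear combination: sum of coefficient times expression level."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def dichotomize(scores, cutoff=None):
    """Split patients into high/low risk; the median cutoff is an assumption."""
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = dichotomize(scores)
```

With these made-up numbers, the patient with no expression of either lncRNA (P2) gets the highest score and is the only one placed in the high-risk group.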

13.

Bidirectional long short-term memory with CRF for detecting biomedical event trigger in FastText semantic space.

Wang, Yan; Wang, Jian; Lin, Hongfei; Tang, Xiwei; Zhang, Shaowu; Li, Lishuang.

BMC Bioinformatics ; 19(Suppl 20): 507, 2018 Dec 21.

Article in English | MEDLINE | ID: mdl-30577839

ABSTRACT

BACKGROUND: In biomedical information extraction, event extraction plays a crucial role. Biological events are used to describe the dynamic effects or relationships between biological entities such as proteins and genes. Event extraction is generally divided into trigger detection and argument recognition. The performance of trigger detection directly affects the results of the event extraction. In general, the traditional method is used to address the trigger detection as a classification task, as well as the use of machine learning or rules method, which construct many features to improve the classification results. Moreover, the classification model only recognizes triggers composed of single words, whereas for multiple words, the result is unsatisfactory. RESULTS: The corpus of our model is MLEE. If we were to only use the biomedical LSTM and CRF model without other features, the F-score would reach about 78.08%. Comparing entity to part of speech (POS), we find the entity features more conducive to the improvement of performance of detection, with the F-score potentially reaching about 80%. Furthermore, we also experiment on the other three corpora (BioNLP 2009, BioNLP 2011, and BioNLP 2013) to verify the generalization of our model. Hence, F-scores can reach more than 60%, which are better than the comparative experiments. CONCLUSIONS: The trigger recognition method based on the sequence annotation model does not require initial complex feature engineering, and only requires a simple labeling mechanism to complete the training. Therefore, generalization of our model is better compared to other traditional models. Secondly, this method can identify multi-word triggers, thereby improving the F-scores of trigger recognition. Thirdly, details on the entity have a crucial impact on trigger detection. Finally, the combination of character-level word embedding and word-level word embedding provides increasingly effective information for the model; therefore, it is a key to the success of the experiment.

Subject(s)

Algorithms , Biomedical Research , Semantics , Information Storage and Retrieval , Machine Learning

14.

XGBFEMF: An XGBoost-Based Framework for Essential Protein Prediction.

Zhong, Jiancheng; Sun, Yusui; Peng, Wei; Xie, Minzhu; Yang, Jiahong; Tang, Xiwei.

IEEE Trans Nanobioscience ; 17(3): 243-250, 2018 07.

Article in English | MEDLINE | ID: mdl-29993553

ABSTRACT

Essential proteins, as a vital part of maintaining cell life, play an important role in the study of biology and drug design. With the generation of large amounts of biological data related to essential proteins, an increasing number of computational methods have been proposed. Unlike methods that adopt a single machine learning method or an ensemble machine learning method, this paper proposes a prediction framework named XGBFEMF for identifying essential proteins. It includes a SUB-EXPAND-SHRINK method, which constructs composite features from the original features and obtains a better subset of features for essential protein prediction, and a model fusion method for obtaining a more effective prediction model. We carry out experiments on Yeast data to assess the performance of XGBFEMF with ROC analysis, accuracy analysis, and top analysis. Meanwhile, we set up experiments on E. coli data for the validation of performance. The test results show that the XGBFEMF framework can effectively improve many essential indicators. In addition, we analyze each step in the XGBFEMF framework; our results show that each step of the SUB-EXPAND-SHRINK method, as well as the multi-model fusion step, can improve prediction performance.

Subject(s)

Computational Biology/methods , Protein Interaction Mapping/methods , Proteins , Algorithms , Databases, Protein , Proteins/classification , Proteins/physiology , Software

15.

Radioactive Seed Localization Versus Wire Localization for Nonpalpable Breast Lesions: A Two-Year Initial Experience at a Large Community Hospital.

Stelle, Lacey; Schoenheit, Taylor; Brubaker, Allison; Tang, Xiwei; Qu, Peiyong; Cradock, Kimberly; Higham, Anna.

Ann Surg Oncol ; 25(1): 131-136, 2018 Jan.

Article in English | MEDLINE | ID: mdl-29134380

ABSTRACT

BACKGROUND: Radioactive seed localization (RSL) is a safe and effective alternative to wire localization (WL) for nonpalpable breast lesions. While several large academic institutions currently utilize RSL, few community hospitals have adopted this technique. OBJECTIVE: The aim of this study was to examine the experience of RSL versus WL at a large community hospital. METHODS: A retrospective chart review of patients who underwent RSL or WL for breast-conserving surgery from 1 November 2013 to 31 November 2015. RESULTS: The total number of lesions examined was 382. RSL was utilized in 205 (54%) lesions, with 187 undergoing single RSL, while WL was used in 155 (40%) lesions, with 109 undergoing single WL; both techniques were used in 22 (6%) lesions. Pathology was benign in 142 (48%) lesions, with 93 RSLs and 49 WLs. For malignant lesions, mean specimen size was 36.3 g for single RSL and 35.9 g for single WL (p = 0.904). Re-excision for margin clearance was required for 16 (17%) malignant lesions in the RSL group and 10 (17%) in the WL group (p = 0.954). For malignant lesions, mean operating room time was 86 min for single RSL versus 70 min for single WL (p = 0.014). CONCLUSIONS: The use of RSL is a viable option in the community setting, with several benefits over WL. While operative times were slightly longer with RSL, there was no difference in specimen size or re-excision rate for malignant lesions.

Subject(s)

Breast Neoplasms/diagnostic imaging , Breast Neoplasms/surgery , Carcinoma, Ductal, Breast/diagnostic imaging , Carcinoma, Ductal, Breast/surgery , Carcinoma, Intraductal, Noninfiltrating/diagnostic imaging , Carcinoma, Intraductal, Noninfiltrating/surgery , Fiducial Markers , Adult , Aged , Breast Neoplasms/pathology , Carcinoma, Ductal, Breast/secondary , Carcinoma, Intraductal, Noninfiltrating/secondary , Female , Hospitals, Community , Humans , Lymphatic Metastasis , Margins of Excision , Mastectomy, Segmental , Middle Aged , Operative Time , Radioisotopes , Reoperation , Retrospective Studies , Tumor Burden

16.

Prediction of essential proteins based on subcellular localization and gene expression correlation.

Fan, Yetian; Tang, Xiwei; Hu, Xiaohua; Wu, Wei; Ping, Qing.

BMC Bioinformatics ; 18(Suppl 13): 470, 2017 Dec 01.

Article in English | MEDLINE | ID: mdl-29219067

ABSTRACT

BACKGROUND: Essential proteins are indispensable to the survival and development process of living organisms. To understand the functional mechanisms of essential proteins, which can be applied to the analysis of disease and design of drugs, it is important to identify essential proteins from a set of proteins first. As traditional experimental methods designed to test out essential proteins are usually expensive and laborious, computational methods, which utilize biological and topological features of proteins, have attracted more attention in recent years. Protein-protein interaction networks, together with other biological data, have been explored to improve the performance of essential protein prediction. RESULTS: The proposed method SCP is evaluated on Saccharomyces cerevisiae datasets and compared with five other methods. The results show that our method SCP outperforms the other five methods in terms of accuracy of essential protein prediction. CONCLUSIONS: In this paper, we propose a novel algorithm named SCP, which combines the ranking by a modified PageRank algorithm based on subcellular compartments information, with the ranking by Pearson correlation coefficient (PCC) calculated from gene expression data. Experiments show that subcellular localization information is promising in boosting essential protein prediction.

Subject(s)

Algorithms , Computational Biology/methods , Gene Expression Regulation, Fungal , Genes, Essential , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Subcellular Fractions

17.

Predicting diabetes mellitus genes via protein-protein interaction and protein subcellular localization information.

Tang, Xiwei; Hu, Xiaohua; Yang, Xuejun; Fan, Yetian; Li, Yongfan; Hu, Wei; Liao, Yongzhong; Zheng, Ming Cai; Peng, Wei; Gao, Li.

BMC Genomics ; 17(Suppl 4): 433, 2016 Aug 18.

Article in English | MEDLINE | ID: mdl-27535125

ABSTRACT

BACKGROUND: Diabetes mellitus, characterized by hyperglycemia resulting from insufficient production of or reduced sensitivity to insulin, poses a growing threat to human health. It is a heterogeneous disorder with multiple etiologies, comprising type 1 diabetes, type 2 diabetes, gestational diabetes and other forms. Predicting diabetes-associated proteins/genes is a key step toward understanding the cellular mechanisms of diabetes mellitus. Compared with experimental methods, computational prediction of candidate proteins/genes is cheaper and less laborious. Protein-protein interaction (PPI) data produced by high-throughput technology have been used to prioritize candidate disease genes/proteins. However, false interactions in the PPI data seriously degrade the performance of computational methods. To address this problem, new methods have been developed that identify candidate disease genes/proteins by integrating biological data from other sources. RESULTS: In this study, a new framework called PDMG is proposed to predict candidate disease genes/proteins. First, weighted networks are built by combining subcellular localization information with PPI data. To form the weighted networks, the importance of each compartment is evaluated based on the number of interacting proteins in that compartment, because different compartments play very different roles in cell activities and some are more important than others. Based on the evaluated compartments, the interactions between proteins are scored and the weighted PPI networks are constructed. Second, known disease genes are extracted from the OMIM database and used as seed genes to expand disease-specific networks based on the weighted networks. Third, the weights between a protein and its neighbors in the disease-related networks are summed, and the sum is taken as the score of the protein. Finally, the proteins are ranked in descending order of their scores. The top-ranked candidate proteins are considered potential disease-related proteins. Various types of data, such as type 2 diabetes-associated genes, subcellular localizations and protein interactions, are used to test the PDMG method. CONCLUSIONS: The results show that proteins/genes that functionally exert a direct influence on diabetes are consistently ranked at the top. PDMG expands and ranks 445 candidate proteins from a seed set containing the original 27 type 2 diabetes proteins. Of the top 27 proteins, 14 are known type 2 diabetes proteins. Literature retrieved from the PubMed database indicates that 8 of the 13 novel proteins are associated with diabetes.
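
The pipeline described in this abstract (weight compartments, score interactions, expand from seeds, sum neighbor weights, rank) can be sketched roughly as follows. This is only an illustration, not the PDMG implementation: the abstract gives no formulas, so the normalization of compartment importance, the use of the most important shared compartment as the edge weight, the one-hop seed expansion, and all function names are assumptions made here.

```python
from collections import defaultdict


def compartment_importance(edges, localization):
    # Assumption: a compartment's importance is the number of proteins that
    # interact with a partner located in the same compartment, scaled to (0, 1].
    members = defaultdict(set)
    for u, v in edges:
        for comp in localization.get(u, set()) & localization.get(v, set()):
            members[comp].update((u, v))
    if not members:
        return {}
    largest = max(len(proteins) for proteins in members.values())
    return {comp: len(proteins) / largest for comp, proteins in members.items()}


def weight_edges(edges, localization, importance):
    # Assumption: an interaction is scored by its most important shared
    # compartment; interactions with no shared compartment are dropped.
    weights = {}
    for u, v in edges:
        shared = localization.get(u, set()) & localization.get(v, set())
        if shared:
            weights[frozenset((u, v))] = max(importance[c] for c in shared)
    return weights


def rank_candidates(weights, seeds):
    # Assumption: the disease-specific network is the seeds plus their
    # direct neighbors in the weighted network (one-hop expansion).
    seeds = set(seeds)
    network = set(seeds)
    for edge in weights:
        if edge & seeds:
            network |= edge
    # Score each protein by the summed weight of its edges inside the network.
    scores = defaultdict(float)
    for edge, w in weights.items():
        if edge <= network:
            for protein in edge:
                scores[protein] += w
    # Rank in descending order of score; ties are broken by name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A caller would pass a list of interaction pairs, a mapping from protein to its set of compartments, and a set of seed proteins, then read candidates from the top of the returned ranking.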

Subject(s)

Computational Biology/methods , Diabetes Mellitus, Type 2/genetics , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Algorithms , Humans , Proteins/genetics , Proteins/metabolism , Software

18.

A novel algorithm for detecting protein complexes with the breadth first search.

Tang, Xiwei; Wang, Jianxin; Li, Min; He, Yiming; Pan, Yi.

Biomed Res Int ; 2014: 354539, 2014.

Article in English | MEDLINE | ID: mdl-24818139

ABSTRACT

Most biological processes are carried out by protein complexes. A substantial number of false positives in protein-protein interaction (PPI) data can compromise the utility of the datasets for complex reconstruction. To reduce the impact of such errors, a number of data integration and affinity scoring schemes have been devised. These methods encode the reliability (confidence) of physical interactions between pairs of proteins. The challenge now is to identify novel and meaningful protein complexes from the weighted PPI network. To address this problem, a novel protein complex mining algorithm, ClusterBFS (Cluster with Breadth-First Search), is proposed. Based on weighted density, ClusterBFS detects protein complexes in the weighted network with a breadth-first search that starts from a given seed protein. The experimental results show that ClusterBFS performs significantly better than the other computational approaches in identifying protein complexes.
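
The idea of growing a complex from a seed by breadth-first search under a weighted-density criterion can be illustrated with a minimal sketch. It is not the published ClusterBFS: the definition of weighted density used here (sum of internal edge weights divided by the number of possible pairs), the acceptance rule, the `threshold` parameter, and the function names are all assumptions, since the abstract does not specify them.

```python
from collections import deque


def weighted_density(nodes, weight):
    # Assumed definition: total weight of edges inside the cluster divided
    # by the number of possible node pairs, n * (n - 1) / 2.
    members = sorted(nodes)
    n = len(members)
    if n < 2:
        return 0.0
    total = sum(
        weight.get(frozenset((a, b)), 0.0)
        for i, a in enumerate(members)
        for b in members[i + 1:]
    )
    return total / (n * (n - 1) / 2)


def cluster_bfs(adj, weight, seed, threshold=0.5):
    # Breadth-first search from the seed protein. A newly reached neighbor
    # joins the cluster only if the weighted density stays at or above the
    # (hypothetical) threshold; rejected proteins are not expanded further.
    cluster = {seed}
    visited = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for neighbor in sorted(adj[current]):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            if weighted_density(cluster | {neighbor}, weight) >= threshold:
                cluster.add(neighbor)
                queue.append(neighbor)
    return cluster
```

Here `adj` maps each protein to the set of its interaction partners and `weight` maps a `frozenset` of two proteins to the confidence of that interaction. Lowering the threshold lets loosely attached proteins into the cluster, while raising it keeps only tightly connected cores.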

Subject(s)

Algorithms , Multiprotein Complexes/metabolism , Cluster Analysis , Computational Biology/methods , Databases, Protein , Humans , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism

19.

Predicting Essential Proteins Based on Weighted Degree Centrality.

Tang, Xiwei; Wang, Jianxin; Zhong, Jiancheng; Pan, Yi.

IEEE/ACM Trans Comput Biol Bioinform ; 11(2): 407-18, 2014.

Article in English | MEDLINE | ID: mdl-26355787

ABSTRACT

Essential proteins are vital for an organism's viability under a variety of conditions. Many experimental and computational methods have been developed to identify essential proteins. Computational prediction of essential proteins based on the global protein-protein interaction (PPI) network is severely restricted by the insufficiency of the PPI data, but gene expression profiles help to make up for this deficiency. In this work, the Pearson correlation coefficient (PCC) is used to bridge the gap between PPI and gene expression data. Based on PCC and the edge clustering coefficient (ECC), a new centrality measure, the weighted degree centrality (WDC), is developed to achieve reliable prediction of essential proteins. WDC is applied to identify essential proteins in the yeast and E. coli PPI networks in order to assess its performance. For comparison, other prediction methods are also applied to identify essential proteins, and several evaluation methods are used to analyze the results of the various approaches. The prediction results and comparative analyses are presented in the paper. Furthermore, the parameter λ in WDC is analyzed in detail and an optimal λ value is determined. Based on the optimal λ value, the difference between WDC and another prediction method, PeC, is discussed. The analyses show that WDC outperforms other methods including DC, BC, CC, SC, EC, IC, NC, and PeC. They also indicate that integrating different data sources is an effective way to predict essential proteins.
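
As a rough illustration of how PCC, ECC and λ might combine into a degree-style centrality, the sketch below weights each edge by a linear mix of the two measures and sums the weights over a protein's edges. The abstract names the ingredients but states no formula, so the linear combination `lam * ECC + (1 - lam) * PCC`, the ECC definition used here, and all function names are assumptions.

```python
import math


def pearson(x, y):
    # Pearson correlation coefficient of two equal-length expression profiles.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # constant profile: correlation is undefined, use 0
    return cov / (norm_x * norm_y)


def edge_clustering(adj, u, v):
    # Assumed ECC definition: shared neighbors of u and v (triangles through
    # the edge) divided by the maximum possible, min(deg(u), deg(v)) - 1.
    possible = min(len(adj[u]), len(adj[v])) - 1
    if possible <= 0:
        return 0.0
    return len(adj[u] & adj[v]) / possible


def weighted_degree_centrality(adj, expression, lam=0.5):
    # Assumption: each edge is weighted by lam * ECC + (1 - lam) * PCC and a
    # protein's score is the sum of the weights of its incident edges.
    scores = {}
    for u in adj:
        scores[u] = sum(
            lam * edge_clustering(adj, u, v)
            + (1 - lam) * pearson(expression[u], expression[v])
            for v in adj[u]
        )
    return scores
```

With `lam=1` the score depends only on network topology, and with `lam=0` only on co-expression, which is one way to see why the choice of λ matters for the trade-off the abstract describes.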

Subject(s)

Computational Biology/methods , Protein Interaction Maps/genetics , Proteins/chemistry , Proteins/metabolism , Transcriptome/genetics , Cluster Analysis , Proteins/genetics , ROC Curve

20.

Clustering based on multiple biological information: approach for predicting protein complexes.

Tang, Xiwei; Feng, Qilong; Wang, Jianxin; He, Yiming; Pan, Yi.

IET Syst Biol ; 7(5): 223-30, 2013 Oct.

Article in English | MEDLINE | ID: mdl-24067423

ABSTRACT

Protein complexes are a cornerstone of many biological processes. Protein-protein interaction (PPI) data enable a number of computational methods for predicting protein complexes. However, the insufficiency of the PPI data significantly lowers the accuracy of computational methods. In the current work, the authors develop a novel method named clustering based on multiple biological information (CMBI) to discover protein complexes via the integration of multiple biological resources including gene expression profiles, essential protein information and PPI data. First, CMBI defines the functional similarity of each pair of interacting proteins based on the edge-clustering coefficient and the Pearson correlation coefficient. Second, CMBI selects essential proteins as seeds to build the protein complexes. A redundancy-filtering procedure is performed to eliminate redundant complexes. In addition to the essential proteins, CMBI also uses other proteins as seeds to expand protein complexes. To check the performance of CMBI, the authors compare the complexes discovered by CMBI with those found by other techniques by matching the predicted complexes against the reference complexes. The authors subsequently use GO::TermFinder to analyse the complexes predicted by the various methods. Finally, the effect of parameters T and R is investigated. The results from GO functional enrichment and matching analyses show that CMBI performs significantly better than the state-of-the-art methods.

Subject(s)

Cluster Analysis , Computational Biology/methods , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/chemistry , Algorithms , Gene Expression Profiling , Genes, Fungal , Models, Statistical , Protein Interaction Mapping , Saccharomyces cerevisiae/metabolism , Software
