Search | BVS Colombia Search Portal

Prediction of metastasis in advanced colorectal carcinomas using CGH data.

Saghapour, Ehsan; Sehhati, Mohammadreza.

J Theor Biol ; 429: 116-123, 2017 09 21.

Article in English | MEDLINE | ID: mdl-28647497

ABSTRACT

Logistic Regression Model (LRM) and artificial neural networks (ANNs) as two nonlinear models have been used to establish a novel two-stage hybrid modeling procedure for prediction of metastasis in advanced colorectal carcinomas. Two different datasets were used in training and testing procedures. For the first stage of hybrid modeling procedure, LRM was used to evaluate the contribution of DNA sequence copy number aberrations detected by Comparative Genomic Hybridization in advanced colorectal carcinoma and its metastasis. Then, the most effective parameters were selected by the LRM. Selected effective parameters among 565 detected chromosomal gains and losses were as follows: gain of 20q11.2, loss of 1q42, loss of 13q34, gain of 5q12, gain of 17p13, loss of 2q22, loss of 11q24 and gain of 2p11.2. Consequently, neural network models were constructed and fed by the parameters selected by LRM to build hybrid predictors on the two databases during self-consistency and jackknife tests, and performance of the hybrid model was verified. The results showed that our two-stage hybrid model approach is very promising for prediction of metastasis in advanced colorectal carcinomas.

Subject(s)

Colorectal Neoplasms/pathology , Comparative Genomic Hybridization/methods , Neoplasm Metastasis , DNA Copy Number Variations/genetics , Humans , Logistic Models , Neural Networks, Computer , Probability
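The abstract above evaluates its predictors with self-consistency and jackknife tests. As a rough illustration only: a self-consistency test scores a model on the same samples it was trained on, while a jackknife test holds out each sample in turn (leave-one-out). The sketch below uses a nearest-centroid classifier as a stand-in, not the authors' logistic regression plus neural network pipeline, and the toy binary vectors (1 = aberration present) are invented for illustration.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def fit(X, y):
    """Stand-in classifier (assumption): one centroid per class label."""
    return {label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)}


def predict(model, x):
    """Assign x to the class with the nearest centroid (squared Euclidean)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, model[label]))
    return min(model, key=sq_dist)


def self_consistency(X, y):
    """Train on all samples, then score on the same samples."""
    model = fit(X, y)
    hits = sum(int(predict(model, x) == t) for x, t in zip(X, y))
    return hits / len(X)


def jackknife(X, y):
    """Leave-one-out: refit without sample i, then predict sample i."""
    hits = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += int(predict(model, X[i]) == y[i])
    return hits / len(X)


# Hypothetical toy data: rows are tumours, columns are selected aberrations.
X_toy = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y_toy = [1, 1, 1, 0, 0, 0]  # 1 = metastatic, 0 = non-metastatic (invented)

print("self-consistency accuracy:", self_consistency(X_toy, y_toy))
print("jackknife accuracy:", jackknife(X_toy, y_toy))
```

The jackknife figure is usually the more informative of the two, because each prediction is made for a sample the model never saw during fitting.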

An NLP-based technique to extract meaningful features from drug SMILES.

Sharma, Rahul; Saghapour, Ehsan; Chen, Jake Y.

iScience ; 27(3): 109127, 2024 Mar 15.

Article in English | MEDLINE | ID: mdl-38455979

ABSTRACT

NLP is a well-established field in ML for developing language models that capture the sequence of words in a sentence. Similarly, drug molecule structures can also be represented as sequences using the SMILES notation. However, unlike natural language texts, special characters in drug SMILES have specific meanings and cannot be ignored. We introduce a novel NLP-based method that extracts interpretable sequences and essential features from drug SMILES notation using N-grams. Our method compares these features to Morgan fingerprint bit-vectors using UMAP-based embedding, and we validate its effectiveness through two personalized drug screening (PSD) case studies. Our NLP-based features are sparse and, when combined with gene expressions and disease phenotype features, produce better ML models for PSD. This approach provides a new way to analyze drug molecule structures represented as SMILES notation, which can help accelerate drug discovery efforts. We have also made our method accessible through a Python library.
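To make the N-gram idea in the abstract above concrete, here is a minimal sketch that counts contiguous character n-grams in SMILES strings while keeping special characters such as `(`, `)` and `=`. It is not the authors' Python library: the function names are hypothetical, and character-level splitting is an assumption (it breaks multi-character atoms such as `Cl` or `Br` apart, which a real tokenizer would likely handle differently).

```python
from collections import Counter


def smiles_ngrams(smiles, n):
    """Count all contiguous character n-grams in one SMILES string."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return Counter(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def featurize(smiles_list, n_values=(1, 2, 3)):
    """Build a shared n-gram vocabulary and one count vector per molecule."""
    per_molecule = []
    for smiles in smiles_list:
        counts = Counter()
        for n in n_values:
            counts.update(smiles_ngrams(smiles, n))
        per_molecule.append(counts)
    vocab = sorted(set().union(*per_molecule))
    # Counter returns 0 for n-grams a molecule does not contain,
    # so most entries are zero and the vectors are sparse.
    vectors = [[counts[gram] for gram in vocab] for counts in per_molecule]
    return vocab, vectors


# Acetic acid and ethanol as example inputs.
vocab, vectors = featurize(["CC(=O)O", "CCO"], n_values=(1, 2))
for gram, a, b in zip(vocab, vectors[0], vectors[1]):
    print(repr(gram), a, b)
```

Each column of the resulting matrix corresponds to a readable substring of the notation, which is what makes features of this kind easier to interpret than hashed fingerprint bits.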

Explorative Discovery of Gene Signatures and Clinotypes in Glioblastoma Cancer Through GeneTerrain Knowledge Map Representation.

Saghapour, Ehsan; Yue, Zongliang; Sharma, Rahul; Kumar, Sidharth; Sembay, Zhandos; Willey, Christopher D; Chen, Jake Y.

bioRxiv ; 2024 Apr 02.

Article in English | MEDLINE | ID: mdl-38617348

ABSTRACT

This study introduces the GeneTerrain Knowledge Map Representation (GTKM), a novel method for visualizing gene expression data in cancer research. GTKM leverages protein-protein interactions to graphically display differentially expressed genes (DEGs) on a 2-dimensional contour plot, offering a more nuanced understanding of gene interactions and expression patterns compared to traditional heatmap methods. The research demonstrates GTKM's utility through four case studies on glioblastoma (GBM) datasets, focusing on survival analysis, subtype identification, IDH1 mutation analysis, and drug sensitivities of different tumor cell lines. Additionally, a prototype website has been developed to showcase these findings, indicating the method's adaptability for various cancer types. The study reveals that GTKM effectively identifies gene patterns associated with different clinical outcomes in GBM, and its profiles enable the identification of sub-gene signature patterns crucial for predicting survival. The methodology promises significant advancements in precision medicine, providing a powerful tool for understanding complex gene interactions and identifying potential therapeutic targets in cancer treatment.

ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data.

Zhou, Yi-Hui; Saghapour, Ehsan.

Front Genet ; 12: 691274, 2021.

Article in English | MEDLINE | ID: mdl-34276792

ABSTRACT

Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, use of the most popular imputation methods mainly requires scripting skills, and these methods are implemented using various packages and syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often considered as a separate exercise from exploratory data analysis, but should be considered as part of the data exploration process. We have created a new graphical tool, ImputEHR, that is built on Python and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.
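As an illustration of what a simple imputation method and a performance measure of imputation can look like, the sketch below fills missing cells with the column mean and scores the result by hiding known cells and comparing the imputed values with the truth. This is not ImputEHR code or its API: the function names, the use of `None` for missing values, and the 0.0 fallback for fully missing columns are all assumptions made for the example.

```python
import math


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Arbitrary fallback (assumption): 0.0 when a column has no observed value.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if value is None else value
             for j, value in enumerate(row)] for row in rows]


def masked_rmse(complete_rows, hidden_cells):
    """Hide known cells, impute them, and return the RMSE against the true values."""
    masked = [list(row) for row in complete_rows]
    for i, j in hidden_cells:
        masked[i][j] = None
    imputed = mean_impute(masked)
    squared = [(imputed[i][j] - complete_rows[i][j]) ** 2 for i, j in hidden_cells]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy table: rows are patients, columns are two lab values.
table = [[1.0, None], [3.0, 4.0], [None, 8.0]]
print(mean_impute(table))
```

The same masking procedure can be reused to compare any two imputation methods on the same hidden cells, which keeps the comparison independent of the downstream predictive model.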

Exploring the Limits of Combined Image/'omics Analysis for Non-cancer Histological Phenotypes.

Gallins, Paul; Saghapour, Ehsan; Zhou, Yi-Hui.

Front Genet ; 11: 555886, 2020.

Article in English | MEDLINE | ID: mdl-33193632

ABSTRACT

The last several years have witnessed an explosion of methods and applications for combining image data with 'omics data, and for prediction of clinical phenotypes. Much of this research has focused on cancer histology, for which genetic perturbations are large, and the signal to noise ratio is high. Related research on chronic, complex diseases is limited by tissue sample availability, lower genomic signal strength, and the less extreme and tissue-specific nature of intermediate histological phenotypes. Data from the GTEx Consortium provides a unique opportunity to investigate the connections among phenotypic histological variation, imaging data, and 'omics profiling, from multiple tissue-specific phenotypes at the sub-clinical level. Investigating histological designations in multiple tissues, we survey the evidence for genomic association and prediction of histology, and use the results to test the limits of prediction accuracy using machine learning methods applied to the imaging data, genomics data, and their combination. We find that expression data has similar or superior accuracy for pathology prediction as our use of imaging data, despite the fact that pathological determination is made from the images themselves. A variety of machine learning methods have similar performance, while network embedding methods offer at best limited improvements. These observations hold across a range of tissues and predictor types. The results are supportive of the use of genomic measurements for prediction, and in using the same target tissue in which pathological phenotyping has been performed. Although this last finding is sensible, to our knowledge our study is the first to demonstrate this fact empirically. Even while prediction accuracy remains a challenge, the results show clear evidence of pathway and tissue-specific biology.

Physicochemical Position-Dependent Properties in the Protein Secondary Structures

Saghapour, Ehsan; Sehhati, Mohammadreza.

Iran Biomed J ; 23(4): 253-61, 2019 07.

Article in English | MEDLINE | ID: mdl-30954029

ABSTRACT

Background: Establishing theories for designing arbitrary protein structures is complicated and depends on understanding the principles for protein folding, which is affected by applied features. Computer algorithms can reach high precision and stability in computationally designed enzymes and binders by applying informative features obtained from natural structures. Methods: In this study, a position-specific analysis of secondary structures (α-helix, β-strand, and tight turn) was performed to reveal novel features for protein structure prediction and protein design. Results: Our results showed that the secondary structures in the N-termini region tend to be more compact than C-termini. Decoying periodicity in length and distribution of amino acids in α-helices is deciphered using the curve-fitting methods. Compared with α-helix, β-strands do not show distinct periodicities in length. Also, significant differences in position-dependent distribution of physicochemical properties are shown in secondary structures. Conclusion: Position-specific propensities in our study underline valuable parameters that could be used by researchers in the field of structural biology, particularly protein design through site-directed mutagenesis.

Subject(s)

Chemical Phenomena , Proteins/chemistry , Amino Acids/chemistry , Databases, Protein , Position-Specific Scoring Matrices , Protein Structure, Secondary

A novel feature ranking method for prediction of cancer stages using proteomics data.

Saghapour, Ehsan; Kermani, Saeed; Sehhati, Mohammadreza.

PLoS One ; 12(9): e0184203, 2017.

Article in English | MEDLINE | ID: mdl-28934234

ABSTRACT

Proteomic analysis of cancer stages has provided new opportunities for the development of novel, highly sensitive diagnostic tools which help early detection of cancer. This paper introduces a new feature ranking approach called FRMT. FRMT is based on the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) method, which selects the most discriminative proteins from proteomics data for cancer staging. In this approach, the outcomes of 10 feature selection techniques were combined by the TOPSIS method to select the final discriminative proteins from seven different proteomic databases of protein expression profiles. In the proposed workflow, feature selection methods and protein expressions have been considered as criteria and alternatives in TOPSIS, respectively. The proposed method is tested on seven different classifier models in a 10-fold cross-validation procedure that is repeated 30 times on the seven cancer datasets. The obtained results proved the higher stability and superior classification performance of the method in comparison with other methods, and it is less sensitive to the applied classifier. Moreover, the final introduced proteins are informative and have the potential for application in real medical practice.

Subject(s)

Algorithms , Neoplasms/metabolism , Proteome , Proteomics/methods , Biomarkers, Tumor/metabolism , Datasets as Topic , Humans , Models, Biological , Neoplasms/classification , Software
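The abstract above treats proteins as TOPSIS alternatives and feature selection methods as TOPSIS criteria. The sketch below shows generic TOPSIS under those roles; it is not the FRMT implementation. Equal weights, vector normalization, and the assumption that a higher score from every method means a more discriminative protein are choices made for this example, and the score matrix is invented.

```python
import math


def topsis(matrix, weights=None):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix[i][j] is the score of alternative i under criterion j.
    Assumption: every criterion is a benefit criterion (higher is better).
    """
    n_crit = len(matrix[0])
    weights = weights or [1.0 / n_crit] * n_crit
    # Vector-normalize each column; guard against an all-zero column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    ideal_best = [max(col) for col in zip(*weighted)]
    ideal_worst = [min(col) for col in zip(*weighted)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal_best)
        d_worst = math.dist(row, ideal_worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Hypothetical scores: rows are proteins, columns are three feature
# selection methods (the paper combines ten).
protein_scores = [[0.9, 0.8, 0.7],
                  [0.1, 0.2, 0.3],
                  [0.5, 0.5, 0.5]]
closeness = topsis(protein_scores)
ranking = sorted(range(len(closeness)), key=lambda i: -closeness[i])
print(closeness, ranking)
```

A protein that every method scores highest sits on the ideal solution and receives closeness 1, while one that every method scores lowest receives 0; proteins on which the methods disagree fall in between.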
