Results 1 - 20 of 56
1.
BMC Bioinformatics ; 24(1): 60, 2023 Feb 23.
Article in English | MEDLINE | ID: mdl-36823571

ABSTRACT

BACKGROUND: Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto Encyclopedia of Genes and Genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Measurements of gene expression (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in a disease. However, genes that act in multiple pathways, along with other inherent issues, complicate such analyses. Many current approaches employ only gene expression data and pay insufficient attention to the knowledge already stored in KEGG pathways when detecting dysregulated pathways. New methods that take more precompiled information into account are required for a more holistic association between gene expression and diseases. RESULTS: PriPath is a novel approach that applies the generic process of grouping and scoring, followed by modeling, to analyze gene expression with KEGG pathways. In PriPath, KEGG pathways serve as the grouping function within a machine learning algorithm that selects the most significant KEGG pathways. A machine learning model is then trained on those groups to differentiate between disease and control samples. We have tested PriPath on 13 gene expression datasets of various cancers and other diseases. Our approach successfully assigned biologically and clinically relevant KEGG terms to the samples based on the differentially expressed genes. We comparatively evaluated the performance of PriPath against other tools with similar aims. For each dataset, we manually checked the top results of PriPath against the literature and found that most predictions are supported by previous experimental research.
CONCLUSIONS: PriPath can thus aid in determining dysregulated pathways, which has applications in medical diagnostics. In the future, we aim to extend this approach so that it can perform patient stratification based on gene expression and identify druggable targets, thereby covering two aspects of precision medicine.
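The grouping-and-scoring step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PriPath's implementation: genes are grouped by a plain pathway-to-gene dict standing in for KEGG annotations, and each group is scored by the leave-one-out accuracy of a simple nearest-centroid classifier trained only on that group's genes, a stand-in for whatever classifier PriPath actually uses. The function names and toy inputs are hypothetical.

```python
from statistics import mean

def centroid(rows):
    """Component-wise mean of equal-length numeric vectors."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.
    Labels are 0 (control) / 1 (disease); each class needs >= 2 samples."""
    correct = 0
    for i in range(len(X)):
        rest = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        c0 = centroid([x for x, lab in rest if lab == 0])
        c1 = centroid([x for x, lab in rest if lab == 1])
        pred = 0 if sq_dist(X[i], c0) <= sq_dist(X[i], c1) else 1
        correct += pred == y[i]
    return correct / len(X)

def rank_pathways(expr, genes, labels, pathways):
    """Group genes by pathway (hypothetical mapping), score each group by how
    well its genes alone separate disease from control, return best-first."""
    col = {g: k for k, g in enumerate(genes)}
    scores = {}
    for name, members in pathways.items():
        cols = [col[g] for g in members if g in col]
        if not cols:
            continue  # no measured genes in this pathway
        X = [[row[c] for c in cols] for row in expr]
        scores[name] = loo_accuracy(X, labels)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The highest-ranked groups would then feed the final modeling step, i.e. a classifier trained on the genes of the selected pathways.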


Subjects
Computational Biology , Neoplasms , Humans , Computational Biology/methods , Neoplasms/genetics , Genome , Algorithms , Gene Expression , Gene Expression Profiling
2.
BMC Musculoskelet Disord ; 24(1): 218, 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36949452

ABSTRACT

BACKGROUND: Degenerative lumbar spinal stenosis (DLSS) is the most common spine disease in the elderly population. It is usually associated with degeneration of the lumbar spine joints and/or ligaments. Machine learning is a powerful technique for big data analysis; however, its application to spine pathology is rare. This study aims to detect the essential variables that predict the development of symptomatic DLSS using the random forest machine learning (ML) algorithm. METHODS: This retrospective study included two groups of individuals. The first comprised 165 individuals with symptomatic DLSS (sex ratio 80 M/85 F), and the second comprised 180 individuals from the general population (sex ratio 90 M/90 F) without lumbar spinal stenosis symptoms. Lumbar spine measurements, such as vertebral and spinal canal diameters from L1 to S1, were taken on computerized tomography (CT) images. Demographic and health data of all participants (e.g., body mass index and diabetes mellitus) were also recorded. RESULTS: The ML decision tree model demonstrated that the anteroposterior diameter of the bony canal at the L5 (males) and L4 (females) levels had the greatest influence on symptomatic DLSS (scores of 1 and 0.938). In addition, the combination of these variables with other lumbar spine features is required for the development of DLSS. CONCLUSIONS: Our results indicate that a combination of lumbar spine characteristics, such as bony canal and vertebral body dimensions, rather than any single variable, is highly associated with the onset of symptomatic DLSS.
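The abstract does not say how the scores of 1 and 0.938 were computed; one common reading, assumed here, is a tree-based feature importance rescaled so that the top feature equals 1. The sketch below illustrates that idea in simplified form: for each feature it finds the single threshold split with the largest decrease in Gini impurity, then rescales the decreases relative to the best feature. A random forest averages such decreases over many trees and splits, so this approximates the concept rather than reproducing the study's method; feature names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split_gain(values, labels):
    """Largest weighted Gini decrease achievable by one threshold on a feature."""
    n = len(labels)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate tied values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def relative_importance(features, labels):
    """Gini gain per feature, rescaled so the most informative feature is 1."""
    gains = {name: best_split_gain(vals, labels) for name, vals in features.items()}
    top = max(gains.values())
    return {name: (g / top if top > 0 else 0.0) for name, g in gains.items()}
```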


Subjects
Spinal Diseases , Spinal Stenosis , Male , Female , Humans , Aged , Spinal Stenosis/diagnosis , Retrospective Studies , Spinal Diseases/pathology , Tomography, X-Ray Computed , Lumbar Vertebrae/diagnostic imaging , Lumbar Vertebrae/pathology , Algorithms
3.
Isr Med Assoc J ; 24(4): 246-252, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35415984

ABSTRACT

BACKGROUND: Hookah smoking is a common activity around the world and has recently become a trend among youth. Studies have indicated a relationship between hookah smoking and a high prevalence of chronic diseases, cancer, cardiovascular diseases, and infectious diseases. In Israel, there has been a sharp increase in hookah smoking among Arabs. Most studies have focused on hookah smoking among young people. OBJECTIVES: To examine the association between hookah smoking and socioeconomic characteristics, health status and behaviors, and knowledge in the adult Arab population, and to build a prediction model using machine learning methods. METHODS: This quantitative study is based on data from the Health and Environment Survey conducted by the Galilee Society in 2015-2016. The data were collected through face-to-face interviews with 2046 adults aged 18 years and older. RESULTS: Using machine learning, a prediction model was built based on eight features. Of the total study population, 13.0% smoked hookah; in the 18-34 age group, 19.5% smoked. Men, people with a lower level of health knowledge, heavy consumers of energy drinks and alcohol, and unemployed people were more likely to smoke hookah, as were younger and more educated people. CONCLUSIONS: Hookah smoking is a widespread behavior among adult Arabs in Israel. The model generated by our study is intended to help health organizations reach people at risk of smoking hookah and to suggest different approaches to eliminating this phenomenon.


Subjects
Arabs , Water Pipe Smoking , Adolescent , Adult , Algorithms , Humans , Israel/epidemiology , Machine Learning , Male , Water Pipe Smoking/epidemiology , Young Adult
4.
Bioinformatics ; 35(20): 4020-4028, 2019 10 15.
Article in English | MEDLINE | ID: mdl-30895309

ABSTRACT

MOTIVATION: Disease is often manifested via changes in transcript and protein abundance. MicroRNAs (miRNAs) are instrumental in regulating protein abundance and may measurably influence transcript levels. miRNAs often target more than one mRNA (for humans, the average is three), and mRNAs are often targeted by more than one miRNA (for the genes considered in this study, the average is also three). Therefore, it is difficult to determine which miRNAs may cause the observed differential gene expression. We present maTE, a novel machine learning-based approach that integrates information about miRNA target genes with gene expression data. maTE depends on the availability of a sufficient number of patient and control samples. The samples are used to train classifiers that classify the samples on a per-miRNA basis. Multiple high-scoring miRNAs are then used to build a final classifier that improves separation. RESULTS: The aim of the study is to find the set of miRNAs whose regulation of their target genes best explains the difference between groups (e.g. cancer versus control). maTE provides a list of significant groups of genes, where each group is targeted by a specific miRNA. For the datasets used in this study, maTE generally achieves an accuracy well above 80%. The results also show that when the accuracy is much lower (e.g. ∼50%), the set of miRNAs provided is likely not causative of the difference in expression. This new approach of integrating miRNA regulation with expression data yields powerful results and is independent of external labels and training data. It thereby opens new avenues for exploring miRNA regulation and may enable the development of miRNA-based biomarkers and drugs. AVAILABILITY AND IMPLEMENTATION: The KNIME workflow implementing maTE is available at Bioinformatics online. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
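The many-to-many relationship between miRNAs and their target genes, and the step of combining several high-scoring miRNA groups into one final classifier, can be illustrated with a short Python sketch. It assumes a plain dict mapping each miRNA to its set of target genes and a dict of per-miRNA scores (e.g. classification accuracies) computed elsewhere; the function names, miRNA identifiers, and scores are hypothetical, and this is not the KNIME workflow that implements maTE.

```python
from collections import defaultdict
from statistics import mean

def mapping_stats(targets):
    """targets: miRNA -> set of target genes (hypothetical input).
    Returns (average targets per miRNA, average targeting miRNAs per gene)."""
    regulators = defaultdict(set)
    for mirna, genes in targets.items():
        for gene in genes:
            regulators[gene].add(mirna)  # invert the mapping: gene -> miRNAs
    per_mirna = mean(len(g) for g in targets.values())
    per_gene = mean(len(m) for m in regulators.values())
    return per_mirna, per_gene

def final_feature_set(scores, targets, k):
    """Union of the target genes of the k highest-scoring miRNA groups,
    i.e. the input genes for a combined final classifier."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    genes = set()
    for mirna in top:
        genes |= targets[mirna]
    return top, genes
```

Because a gene can belong to several miRNA groups, the union in `final_feature_set` deduplicates shared targets before the final classifier sees them.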


Subjects
MicroRNAs/genetics , Gene Expression Profiling , Humans , Machine Learning , Neoplasms , RNA, Messenger
5.
Entropy (Basel) ; 23(1), 2020 Dec 22.
Article in English | MEDLINE | ID: mdl-33374969

ABSTRACT

In the last two decades, there have been massive advances in high-throughput technologies, which have resulted in the exponential growth of public repositories of gene expression datasets for various phenotypes. Biomarkers can be uncovered by comparing gene expression levels under different conditions, such as disease vs. control, treated vs. untreated, or drug A vs. drug B. This corresponds to a well-studied problem in machine learning, namely feature selection. In biological data analysis, most computational feature selection methods were adopted from other fields without considering the nature of biological data. Integrative approaches that exploit biological knowledge while performing feature selection are therefore necessary for this kind of data. The main idea behind integrative gene selection is to generate a ranked list of genes that considers both statistical metrics applied to the gene expression data and biological background information provided as external datasets. One of the main goals of this review is to explore existing methods that integrate different types of information in order to improve the identification of biomolecular signatures of diseases and the discovery of new potential treatment targets. These integrative approaches are expected to aid the prediction, diagnosis, and treatment of diseases, as well as to shed light on disease state dynamics and the mechanisms of disease onset and progression. The integration of various types of biological information will require the development of novel techniques for integration and data analysis. Another aim of this review is to encourage the bioinformatics community to develop new approaches for searching for and determining significant groups/clusters of features based on one or more biological grouping functions.
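As a concrete example of the integrative idea, the sketch below ranks genes by a weighted combination of a statistical score computed from expression data (the absolute Welch t statistic, rescaled to [0, 1]) and a prior score in [0, 1] taken from external biological knowledge. The linear weighting with parameter `alpha` and the function names are assumptions chosen for illustration; the review surveys many integration schemes and does not prescribe this one.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two groups of expression values.
    Assumes at least two values per group and non-zero total variance."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def integrative_rank(expr_case, expr_ctrl, prior, alpha=0.5):
    """Rank genes by alpha * scaled |t| + (1 - alpha) * prior.
    expr_case / expr_ctrl: gene -> list of values; prior: gene -> score in [0, 1]
    (genes missing from prior get 0)."""
    stat = {g: abs(welch_t(expr_case[g], expr_ctrl[g])) for g in expr_case}
    top = max(stat.values()) or 1.0  # avoid dividing by zero
    combined = {
        g: alpha * stat[g] / top + (1 - alpha) * prior.get(g, 0.0)
        for g in stat
    }
    return sorted(combined, key=combined.get, reverse=True)
```

With `alpha=1.0` the ranking is purely statistical; lowering `alpha` lets external knowledge promote genes whose expression difference alone is modest.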

6.
BMC Bioinformatics ; 18(1): 170, 2017 Mar 14.
Article in English | MEDLINE | ID: mdl-28292266

ABSTRACT

BACKGROUND: Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer, and microRNAs (miRNAs) play a key role in modulating translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotes. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features describing pre-miRNAs have been proposed, and we have previously introduced sequence motifs and k-mers as useful ones. There have been reports of xeno-miRNAs detected via next-generation sequencing. However, these may be contaminations, and to aid this important decision-making process, we aimed to establish a means of differentiating pre-miRNAs from different species. RESULTS: To distinguish between species, we used one species' pre-miRNAs as positive and another species' pre-miRNAs as negative training and test data to establish machine-learned models with sequence motifs and k-mers as features. This approach achieved higher accuracy for distantly related species, while more closely related species produced lower accuracy values. CONCLUSIONS: We were able to differentiate among species with increasing success as the evolutionary distance increased. This conclusion is supported by previous reports of fast evolutionary changes in miRNAs, since fairly good discrimination was possible even between relatively closely related species.
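The k-mer features mentioned above can be computed as in the following Python sketch, which turns a pre-miRNA sequence into a fixed-length vector of overlapping k-mer frequencies and labels one species' sequences as positives and another's as negatives. It omits the sequence-motif features and the classifier itself; the choice of k=2 and the function names are assumptions for illustration.

```python
from itertools import product

def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Normalised counts of all overlapping k-mers (fixed order, 4**k entries)."""
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

def species_dataset(positive_seqs, negative_seqs, k=2):
    """Feature matrix and labels: one species' pre-miRNAs as class 1,
    another species' pre-miRNAs as class 0."""
    X = [kmer_frequencies(s, k) for s in positive_seqs + negative_seqs]
    y = [1] * len(positive_seqs) + [0] * len(negative_seqs)
    return X, y
```

Using a fixed k-mer order keeps every vector the same length regardless of sequence length, which is what a standard classifier expects.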


Subjects
MicroRNAs/metabolism , Animals , Base Sequence , Fabaceae/classification , Fabaceae/genetics , High-Throughput Nucleotide Sequencing , Humans , MicroRNAs/chemistry , MicroRNAs/genetics , Phylogeny , RNA Precursors/genetics , RNA Precursors/metabolism
7.
BMC Bioinformatics ; 18(1): 473, 2017 Nov 09.
Article in English | MEDLINE | ID: mdl-29121868

ABSTRACT

BACKGROUND: Identifying essential genes is useful not only for understanding the minimal gene set required for cellular life but also for identifying novel drug targets in pathogens. In this work, we present a simple and effective gene essentiality prediction method using information-theoretic features derived exclusively from gene sequences. RESULTS: We developed a Random Forest classifier and performed an extensive evaluation of model performance among and within 15 selected bacteria. In intra-organism predictions, where the training and test sets are taken from the same organism, AUC (Area Under the Curve) scores ranged from 0.73 to 0.90 (0.84 on average). Cross-organism predictions using 5-fold cross-validation, pairwise, leave-one-species-out, leave-one-taxon-out, and cross-taxon schemes yielded average AUC scores of 0.88, 0.75, 0.80, 0.82, and 0.78, respectively. To further show the applicability of our method to other domains of life, we predicted the essential genes of the yeast Schizosaccharomyces pombe and obtained a similar accuracy (AUC 0.84). CONCLUSIONS: The proposed method enables simple and reliable identification of essential genes without searching databases for orthologs or requiring further experimental data such as network topology and gene expression.
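The abstract does not list the individual information-theoretic features, so the following Python sketch only shows the general kind of quantity involved, under that caveat: Shannon entropies of the k-mer distributions of a gene sequence, plus a first-order Markov conditional entropy H(next base | current base). Function names and the choice of k are hypothetical; a Random Forest would then be trained on such vectors.

```python
from collections import Counter
from math import log2

def kmer_entropy(seq, k=1):
    """Shannon entropy (bits) of the overlapping k-mer distribution in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    n = len(kmers)
    return -sum((c / n) * log2(c / n) for c in Counter(kmers).values())

def conditional_entropy(seq):
    """H(X_{i+1} | X_i) = H(X_i, X_{i+1}) - H(X_i), estimated from the
    dinucleotides and the bases that start them (first-order Markov view)."""
    return kmer_entropy(seq, 2) - kmer_entropy(seq[:-1], 1)

def entropy_features(seq, max_k=3):
    """Feature vector: k-mer entropies for k = 1..max_k plus conditional entropy."""
    seq = seq.upper()
    return [kmer_entropy(seq, k) for k in range(1, max_k + 1)] + [conditional_entropy(seq)]
```

A perfectly periodic sequence such as `ACACAC` has zero conditional entropy, because each base fully determines the next one.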


Subjects
Bacteria/genetics , Genes, Essential , Models, Theoretical , Area Under Curve , Base Sequence , Machine Learning , Markov Chains , ROC Curve
8.
Article in English | MEDLINE | ID: mdl-38865233

ABSTRACT

Antimicrobial peptides (AMPs) have drawn researchers' interest because they offer an alternative to traditional antibiotics in the fight against antibiotic resistance and exhibit additional pharmaceutically significant properties. Recent computational approaches attempt to reveal, from a machine learning perspective, how antibacterial activity is determined, and aim to find the biological cues or characteristics that control antimicrobial activity by incorporating motif match scores. This study is dedicated to the development of a machine learning framework for designing novel antimicrobial peptide (AMP) sequences that are potentially effective against Gram-positive/Gram-negative bacteria. Various classification models were trained to classify newly generated sequences as either AMP or non-AMP. These novel sequences were validated using the "DBAASP:strain-specific antibacterial prediction based on machine learning approaches and data on AMP sequences" tool. The findings presented here represent a significant step in this computational research, streamlining the process of AMP creation or modification in wet lab environments.
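Motif match scores of the kind mentioned above can serve as classifier features alongside simple sequence composition. The Python sketch below counts (possibly overlapping) regular-expression motif matches in a peptide and appends the fractions of the 20 standard amino acids. The example motifs are hypothetical placeholders, not the motifs used in the study, and the AMP/non-AMP classifier is not shown.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def motif_match_counts(peptide, motifs):
    """Number of (possibly overlapping) matches of each regex motif.
    The lookahead (?=(...)) lets matches start at every position."""
    return [len(re.findall(f"(?=({m}))", peptide)) for m in motifs]

def composition(peptide):
    """Fraction of each standard amino acid in the peptide."""
    n = len(peptide) or 1
    return [peptide.count(a) / n for a in AMINO_ACIDS]

def peptide_features(peptide, motifs):
    """Feature vector: motif match counts followed by amino acid composition."""
    peptide = peptide.upper()
    return motif_match_counts(peptide, motifs) + composition(peptide)
```

For example, with the hypothetical motifs `["KK", "C.{3}C"]`, the peptide `KKKAL` yields two overlapping `KK` matches and none for the cysteine-spacing motif.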

9.
Comput Biol Med ; 182: 109098, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39293338

ABSTRACT

Colorectal cancer (CRC) ranks as the third most common cancer globally and the second leading cause of cancer-related deaths. Recent research highlights the pivotal role of the gut microbiota in CRC development and progression. Understanding the complex interplay between disease development and metagenomic data is essential for CRC diagnosis and treatment. Current computational models employ machine learning to identify metagenomic biomarkers associated with CRC, yet their accuracy needs to be improved through a more holistic, biological knowledge-based perspective. This study aims to evaluate CRC-associated metagenomic data at the species, enzyme, and pathway levels by conducting global and population-specific analyses. These analyses use relative abundance values from human gut microbiome sequencing data to build robust classification models for disease prediction and biomarker identification. For global CRC prediction and biomarker identification, the features identified by the SelectKBest (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost) methods are combined. The population-based analysis includes within-population, leave-one-dataset-out (LODO), and cross-population approaches. Four classification algorithms are employed for CRC classification. Globally, Random Forest achieved an AUC of 0.83 for species data, 0.78 for enzyme data, and 0.76 for pathway data. On the global scale, potential taxonomic biomarkers include Ruthenibacterium lactatiformans; enzyme biomarkers include RNA 2',3'-cyclic 3'-phosphodiesterase; and pathway biomarkers include the pyruvate fermentation to acetone pathway. This study underscores the potential of machine learning models trained on metagenomic data for improved disease prediction and biomarker discovery. The proposed model and associated files are available at https://github.com/TemizMus/CCPRED.
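Two evaluation ingredients named above, combining the features chosen by several selectors and leave-one-dataset-out (LODO) validation scored by AUC, can be sketched in plain Python. This sketch does not reimplement SelectKBest, Information Gain, XGBoost, or Random Forest; it assumes their selections and predicted scores are already available, and all function names are hypothetical. AUC is computed here as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
def lodo_splits(dataset_of_sample):
    """Leave-one-dataset-out: yield (held_out, train_idx, test_idx),
    holding out every sample of one cohort per fold."""
    for held_out in sorted(set(dataset_of_sample)):
        test = [i for i, d in enumerate(dataset_of_sample) if d == held_out]
        train = [i for i, d in enumerate(dataset_of_sample) if d != held_out]
        yield held_out, train, test

def combine_selections(*selections):
    """Union of the feature lists chosen by several selectors, first-seen order."""
    combined = []
    for sel in selections:
        for feat in sel:
            if feat not in combined:
                combined.append(feat)
    return combined

def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC for labels 1 = CRC, 0 = control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Holding out whole cohorts, rather than random samples, tests whether a model trained on some populations transfers to an unseen one.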

10.
Eukaryot Cell ; 11(4): 430-41, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22307976

ABSTRACT

Leishmania double transfectants (DTs) expressing the 2nd and 3rd enzymes in the heme biosynthetic pathway were previously reported to show neogenesis of uroporphyrin I (URO) when induced with delta-aminolevulinate (ALA), the product of the 1st enzyme in the pathway. The ensuing accumulation of URO in DT promastigotes rendered them light excitable to produce reactive oxygen species (ROS), resulting in their cytolysis. Evidence is presented showing that the DTs retained wild-type infectivity to their host cells and that the intraphagolysosomal/parasitophorous vacuolar (PV) DTs remained ALA inducible for uroporphyrinogenesis/photolysis. Exposure of DT-infected cells to ALA was noted by fluorescence microscopy to result in host-parasite differential porphyrinogenesis: porphyrin fluorescence emerged first in the host cells and then in the intra-PV amastigotes. DT-infected and control cells differed qualitatively and quantitatively in their porphyrin species, consistent with the expected multi- and monoporphyrinogenic specificities of the host cells and the DTs, respectively. After ALA removal, the neogenic porphyrins were rapidly lost from the host cells but persisted as URO in the intra-PV DTs. These DTs were thus extremely light sensitive and were lysed selectively by illumination under nonstringent conditions in the relatively ROS-resistant phagolysosomes. Photolysis of the intra-PV DTs returned the distribution of major histocompatibility complex (MHC) class II molecules and the global gene expression profiles of host cells to their preinfection patterns and, when transfected with ovalbumin, released this antigen for copresentation with MHC class I molecules. These Leishmania mutants thus have considerable potential as a novel model of a universal vaccine carrier for photodynamic immunotherapy/immunoprophylaxis.


Subjects
Aminolevulinic Acid/pharmacology , Leishmania/genetics , Phagocytes/parasitology , Phagosomes/parasitology , Photosensitizing Agents/pharmacology , Porphyrins/biosynthesis , Vaccination/methods , Animals , Antigen Presentation , Antigens, Protozoan/immunology , Cells, Cultured , Dendritic Cells/metabolism , Dendritic Cells/parasitology , Dendritic Cells/radiation effects , Gene Expression Profiling , Histocompatibility Antigens Class I/metabolism , Leishmania/immunology , Leishmania/radiation effects , Macrophages, Peritoneal/metabolism , Macrophages, Peritoneal/parasitology , Macrophages, Peritoneal/radiation effects , Mice , Mice, Inbred BALB C , Oligonucleotide Array Sequence Analysis , Organisms, Genetically Modified/immunology , Photolysis