Results 1-20 of 86
1.
BMC Med ; 22(1): 68, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38360711

ABSTRACT

BACKGROUND: Follow-up visits for very preterm infants (VPI) after hospital discharge are crucial for their neurodevelopmental trajectories, but ensuring attendance before 12 months corrected age (CA) remains a challenge. Current prediction models focus on future outcomes at discharge, but post-discharge data may enhance predictions of neurodevelopmental trajectories because of brain plasticity. Few studies in this field have used machine learning models to achieve this potential benefit with transparency, explainability, and transportability. METHODS: We developed four prediction models for cognitive or motor function at 24 months CA, two for the 6-month and two for the 12-month CA follow-up visit, using hospitalization and follow-up data of VPI from the Taiwan Premature Infant Follow-up Network from 2010 to 2017. The regress models were developed at 6 months CA, with regress defined as a decline of > 1 SD in the Bayley Scales of Infant Development, 3rd edition (BSID-III) composite score between 6 and 24 months CA. The delay models were developed at 12 months CA, with delay defined as a BSID-III composite score < 85 at 24 months CA. We used an evolutionary-derived machine learning method (EL-NDI) to develop the models and compared them with models built by lasso regression, random forest, and support vector machine. RESULTS: The development set included 1,244 VPI, and the two validation cohorts included 763 and 1,347 VPI, respectively. EL-NDI used only 4-10 variables, whereas the other methods required 29 or more variables to achieve similar performance. For the 6-month CA models, the area under the receiver operating characteristic curve (AUC) of EL-NDI was 0.76-0.81 (95% CI, 0.73-0.83) for cognitive regress with 4 variables and 0.79-0.83 (95% CI, 0.76-0.86) for motor regress with 4 variables.
For the 12-month CA models, the AUC of EL-NDI was 0.75-0.78 (95% CI, 0.72-0.82) for cognitive delay with 10 variables and 0.73-0.82 (95% CI, 0.72-0.85) for motor delay with 4 variables. CONCLUSIONS: EL-NDI performed well using simpler, transparent, and explainable models suited to clinical purposes. Implementing these models for VPI during follow-up visits may facilitate more informed discussions between parents and physicians and identify high-risk infants more effectively for early intervention.


Subject(s)
Infant, Premature, Diseases; Infant, Premature; Infant; Child; Infant, Newborn; Humans; Retrospective Studies; Longitudinal Studies; Aftercare; Intensive Care Units, Neonatal; Patient Discharge
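The AUC values reported for EL-NDI measure how well a model ranks infants who later show regress or delay above infants who do not. A minimal sketch of that quantity, using the pairwise Mann-Whitney formulation rather than the authors' own EL-NDI code; the function name and the toy risk scores are hypothetical:

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as U / (n_pos * n_neg).

    Each (positive, negative) pair counts 1 if the positive case has the higher
    predicted risk, 0.5 for a tie, and 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks (1 = later delay, 0 = no delay).
print(auc_mann_whitney([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The double loop is O(n_pos x n_neg); for cohorts of a few thousand infants a rank-based computation is faster and gives the same value.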
2.
Carcinogenesis ; 44(8-9): 650-661, 2023 12 02.
Article in English | MEDLINE | ID: mdl-37701974

ABSTRACT

OBJECTIVE: Hepatocellular carcinoma (HCC) is one of the leading cancer types in the USA, with increasing annual incidence and high mortality. MicroRNAs (miRNAs) have emerged as valuable prognostic indicators in cancer patients. To identify a miRNA signature predictive of survival in patients with HCC, we developed a machine learning-based HCC survival estimation method, HCCse, using the miRNA expression profiles of 122 patients with HCC. METHODS: The HCCse method was designed using an optimal feature selection algorithm combined with support vector regression. RESULTS: HCCse identified a robust miRNA signature consisting of 32 miRNAs and obtained a mean correlation coefficient (R) and mean absolute error (MAE) of 0.87 ± 0.02 and 0.73 years, respectively, between the actual and estimated survival times of patients with HCC; the jackknife test achieved an R and MAE of 0.73 and 0.97 years, respectively. The identified signature has seven prognostic miRNAs (hsa-miR-146a-3p, hsa-miR-200a-3p, hsa-miR-652-3p, hsa-miR-34a-3p, hsa-miR-132-5p, hsa-miR-1301-3p and hsa-miR-374b-3p) and four diagnostic miRNAs (hsa-miR-1301-3p, hsa-miR-17-5p, hsa-miR-34a-3p and hsa-miR-200a-3p). Notably, three of these miRNAs, hsa-miR-200a-3p, hsa-miR-1301-3p and hsa-miR-17-5p, were also associated with tumor stage, further emphasizing their clinical relevance. Furthermore, pathway enrichment analysis showed that the target genes of the identified miRNA signature were significantly enriched in the hepatitis B pathway, suggesting its potential involvement in HCC pathogenesis. CONCLUSIONS: Our study developed HCCse, a machine learning-based method, to predict survival in HCC patients using miRNA expression profiles. We identified a robust miRNA signature of 32 miRNAs with prognostic and diagnostic value, highlighting their clinical relevance in HCC management and potential involvement in HCC pathogenesis.


Subject(s)
Carcinoma, Hepatocellular; Hepatitis B; Liver Neoplasms; MicroRNAs; Humans; Carcinoma, Hepatocellular/pathology; Prognosis; Liver Neoplasms/pathology; MicroRNAs/genetics; MicroRNAs/metabolism
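HCCse is evaluated with the mean absolute error (MAE) and the correlation coefficient (R) between actual and estimated survival times. A minimal sketch of both metrics, assuming R is the Pearson correlation (the abstract does not name the coefficient). In a jackknife (leave-one-out) test, each patient's estimate would come from a model trained on all other patients before these metrics are computed. Function names and data are hypothetical:

```python
import math
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Average absolute difference, in the same unit as the inputs (e.g. years)."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical survival times in years.
actual, estimated = [1.0, 2.0, 3.0], [1.5, 2.0, 2.5]
print(round(mean_absolute_error(actual, estimated), 3), round(pearson_r(actual, estimated), 3))  # 0.333 1.0
```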
3.
J Proteome Res ; 20(5): 2942-2952, 2021 05 07.
Article in English | MEDLINE | ID: mdl-33856796

ABSTRACT

There is an urgent need to elucidate the underlying mechanisms of coronavirus disease (COVID-19) so that vaccines and treatments can be devised. Severe acute respiratory syndrome coronavirus 2 has genetic similarity with bat and pangolin viruses, but a comprehensive understanding of the functions of its proteins at the amino acid sequence level is lacking. A total of 4320 sequences of human and nonhuman coronaviruses were retrieved from the Global Initiative on Sharing All Influenza Data and the National Center for Biotechnology Information. This work proposes COVID-Pred, an optimization method with an efficient feature selection algorithm, to classify species-specific coronaviruses based on the physicochemical properties (PCPs) of their sequences. COVID-Pred identified a set of 11 PCPs using a support vector machine and achieved 10-fold cross-validation and test accuracies of 99.53% and 97.80%, respectively. These findings could provide key insights into the driving forces during the course of infection and assist in developing effective therapies.


Subject(s)
COVID-19; Chiroptera; Amino Acid Sequence; Animals; Humans; SARS-CoV-2; Spike Glycoprotein, Coronavirus
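COVID-Pred reports a 10-fold cross-validation accuracy alongside an independent test accuracy. A minimal sketch of how 10-fold cross-validation partitions a dataset, assuming plain random (unstratified) folds; the abstract does not say whether the folds were stratified by species, and the function name is hypothetical:

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits; every index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # reproducible shuffle
    folds = [order[i::k] for i in range(k)]  # fold sizes differ by at most 1
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


splits = k_fold_indices(25, k=10)
print([len(test) for _, test in splits])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

A cross-validation accuracy is then obtained by training on each `train` split, predicting its `test` split, and pooling or averaging the correct predictions over the ten folds.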
4.
Bioinformatics ; 36(12): 3833-3840, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32399550

ABSTRACT

MOTIVATION: Non-linear ordinary differential equation (ODE) models that contain numerous parameters are suitable for inferring an emulated gene regulatory network (eGRN). However, the number of experimental measurements is usually far smaller than the number of parameters of the eGRN model, which leads to an underdetermined problem: with insufficient measurements, the eGRN inference problem has no unique solution. RESULTS: This work proposes an evolutionary modelling algorithm (EMA), based on evolutionary intelligence, to cope with the underdetermined problem. EMA uses an intelligent genetic algorithm to solve the large-scale parameter optimization problem. An EMA-based method, GREMA, infers a novel type of gene regulatory network with a confidence level for every inferred regulation; the higher the confidence level, the more accurate the inferred regulation. GREMA gradually determines the regulations of an eGRN in descending order of confidence level using either an S-system or a Hill function-based ODE model. The experimental results showed that regulations with high confidence levels are more accurate and robust than those with low confidence levels. Evolutionary intelligence improved the mean accuracy of GREMA by 19.2% when using the S-system model with benchmark datasets. Increasing the number of experimental measurements may increase the mean confidence level of the inferred regulations. GREMA performed well compared with existing methods previously applied to the same S-system, DREAM4 challenge and SOS DNA repair benchmark datasets. AVAILABILITY AND IMPLEMENTATION: All of the datasets used and the GREMA-based tool are freely available at https://nctuiclab.github.io/GREMA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Gene Regulatory Networks; Biological Evolution; Computational Biology; Intelligence
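In an S-system ODE model, each gene's rate of change is a production power-law term minus a degradation power-law term: dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij. The sketch below only simulates such a model for given parameters, using forward Euler for simplicity; the parameter values are hypothetical, and GREMA's actual contribution, estimating alpha, beta, g and h from sparse measurements with an intelligent genetic algorithm, is not shown:

```python
from typing import List, Sequence


def s_system_rhs(
    x: Sequence[float],
    alpha: Sequence[float],
    beta: Sequence[float],
    g: Sequence[Sequence[float]],
    h: Sequence[Sequence[float]],
) -> List[float]:
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    n = len(x)
    rates = []
    for i in range(n):
        production = alpha[i]
        degradation = beta[i]
        for j in range(n):
            production *= x[j] ** g[i][j]
            degradation *= x[j] ** h[i][j]
        rates.append(production - degradation)
    return rates


def simulate_euler(x0, alpha, beta, g, h, dt=0.01, steps=1000) -> List[List[float]]:
    """Forward-Euler trajectory. States are floored at a tiny positive value
    (a numerical convenience) because non-integer exponents need positive inputs."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        dx = s_system_rhs(x, alpha, beta, g, h)
        x = [max(xi + dt * dxi, 1e-12) for xi, dxi in zip(x, dx)]
        trajectory.append(x)
    return trajectory


# Hypothetical 2-gene network: gene 1 activates gene 2; both degrade linearly.
alpha, beta = [1.0, 1.0], [1.0, 1.0]
g = [[0.0, 0.0], [0.5, 0.0]]  # kinetic orders of the production terms
h = [[1.0, 0.0], [0.0, 1.0]]  # kinetic orders of the degradation terms
final = simulate_euler([0.1, 0.1], alpha, beta, g, h)[-1]
print([round(v, 3) for v in final])
```

Both genes approach the steady state X = 1, where production and degradation balance.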
5.
Bioinformatics ; 33(5): 661-668, 2017 03 01.
Article in English | MEDLINE | ID: mdl-28062441

ABSTRACT

Motivation: Numerous ubiquitination sites remain undiscovered because of the limitations of mass spectrometry-based methods. Existing prediction methods use randomly selected non-validated sites as non-ubiquitination sites to train ubiquitination site prediction models. Results: We propose an evolutionary screening algorithm (ESA) to select effective negatives among non-validated sites and an ESA-based prediction method, ESA-UbiSite, to identify human ubiquitination sites. The ESA selects the non-validated sites least likely to be ubiquitination sites as training negatives. Moreover, the ESA and ESA-UbiSite use a set of well-selected physicochemical properties together with a support vector machine for accurate prediction. Experimental results show that ESA-UbiSite with effective negatives achieved a test accuracy of 0.92 and a Matthews correlation coefficient of 0.48, better than existing prediction methods. The ESA increased ESA-UbiSite's test accuracy from 0.75 to 0.92 and can improve other post-translational modification site prediction methods. Availability and Implementation: An ESA-UbiSite-based web server has been established at http://iclab.life.nctu.edu.tw/iclab_webtools/ESAUbiSite/. Contact: syho@mail.nctu.edu.tw. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods; Software; Support Vector Machine; Ubiquitination; Humans
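ESA-UbiSite is evaluated by test accuracy and the Matthews correlation coefficient (MCC). Because MCC uses all four confusion-matrix cells, a high accuracy can coexist with a moderate MCC when one class dominates the test set. A minimal sketch of both metrics, plus sensitivity and specificity; the counts are hypothetical, chosen only to show how such a gap can arise, and are not the paper's confusion matrix:

```python
import math
from typing import Dict


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # MCC is conventionally reported as 0 when any marginal total is 0.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical imbalanced test set: 100 positives, 900 negatives.
m = confusion_metrics(tp=40, fp=20, tn=880, fn=60)
print(f"accuracy={m['accuracy']:.2f} mcc={m['mcc']:.2f}")  # accuracy=0.92 mcc=0.48
```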
6.
BMC Bioinformatics ; 17(Suppl 19): 503, 2016 Dec 22.
Article in English | MEDLINE | ID: mdl-28155647

ABSTRACT

BACKGROUND: Most hydrophilic residues are thought to be exposed, and most hydrophobic residues buried, in proteins. In contrast to most existing studies of protein folding characteristics, which use protein structures, this study aimed to design predictors that estimate the relative solvent accessibility (RSA) of amino acid residues in order to discover protein folding characteristics from sequences. METHODS: The proposed 20 real-value RSA predictors were designed on the basis of the support vector regression method with a set of informative physicochemical properties (PCPs) obtained by an optimal feature selection algorithm. Molecular dynamics simulations were then performed to validate the knowledge discovered by analyzing the selected PCPs. RESULTS: The RSA predictors had a mean absolute error of 14.11% and a correlation coefficient of 0.69, better than existing predictors. The hydrophilic-residue predictors preferred PCPs of buried amino acid residues to PCPs of exposed ones as prediction features. A hydrophobic spine composed of exposed hydrophobic residues of an α-helix was discovered by analyzing the PCPs of the RSA predictors corresponding to hydrophobic residues. For example, molecular dynamics simulations of wild-type sequences and their mutants showed that proteins 1MOF and 2WRP_H16I (Protein Data Bank IDs), which have a perfectly hydrophobic spine, have more stable structures than 1MOF_I54D and 2WRP, which do not. CONCLUSIONS: We identified informative PCPs to design high-performance RSA predictors and analyzed these PCPs to identify novel protein folding characteristics. A hydrophobic spine in a protein can help to stabilize exposed α-helices.


Subject(s)
Algorithms; Amino Acids/chemistry; Protein Conformation, alpha-Helical; Protein Folding; Proteins/chemistry; Solvents/chemistry; Computer Simulation; Data Interpretation, Statistical; Databases, Protein; Humans; Hydrophobic and Hydrophilic Interactions; Models, Molecular; Protein Structure, Secondary
7.
BMC Bioinformatics ; 17(Suppl 19): 514, 2016 Dec 22.
Article in English | MEDLINE | ID: mdl-28155663

ABSTRACT

BACKGROUND: Bacterial tyrosine kinases (BY-kinases), which play an important role in numerous cellular processes, are characterized as a separate class of enzymes and share no structural similarity with their eukaryotic counterparts. However, in silico methods for predicting BY-kinases have not yet been developed. Because these enzymes are involved in key regulatory processes and are promising targets for antibacterial drug design, it is desirable to develop a simple and easily interpretable predictor to gain new insights into bacterial tyrosine phosphorylation. This study proposes a novel method, SCMBYK, for predicting and characterizing BY-kinases. RESULTS: A dataset consisting of 797 BY-kinases and 783 non-BY-kinases was established to design the SCMBYK predictor, which achieved training and test accuracies of 97.55% and 96.73%, respectively. Furthermore, the leave-one-phylum-out method was used to predict the specific bacterial phyla hosting target sequences, achieving an average test accuracy of 97.39%. Analysis of the SCMBYK-derived propensity scores revealed four characteristics of BY-kinases: 1) BY-kinases tend to be composed of α-helices; 2) the amino acid content of the extracellular regions of BY-kinases is expected to be dominated by residues such as Val, Ile, Phe and Tyr; 3) BY-kinases structurally resemble nuclear proteins; 4) different domains play different roles in triggering BY-kinase activity. CONCLUSIONS: The SCMBYK predictor is an effective method for identifying possible BY-kinases. Furthermore, it can be used as part of a novel drug repurposing method that recognizes putative BY-kinases and matches them to approved drugs. Among other results, our analysis revealed that azathioprine could suppress the virulence of M. tuberculosis and could thus be considered a potential antibiotic for tuberculosis treatment.


Subject(s)
Bacteria/enzymology; Bacterial Proteins/chemistry; Dipeptides/chemistry; Protein-Tyrosine Kinases/chemistry; Software; Tyrosine/chemistry; Databases, Protein; Propensity Score
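The leave-one-phylum-out evaluation withholds all sequences from one bacterial phylum at a time. Assuming it follows the usual grouped cross-validation pattern (the abstract does not give the exact protocol), the index splits can be sketched as below; the phylum labels are toy data and the function name is hypothetical:

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Dict[Hashable, Tuple[List[int], List[int]]]:
    """Map each group to a (train, test) split that holds that whole group out."""
    splits = {}
    for held_out in dict.fromkeys(groups):  # unique groups, first-seen order
        test = [i for i, g in enumerate(groups) if g == held_out]
        train = [i for i, g in enumerate(groups) if g != held_out]
        splits[held_out] = (train, test)
    return splits


# Toy phylum label for each of four sequences.
phyla = ["Firmicutes", "Proteobacteria", "Firmicutes", "Actinobacteria"]
for phylum, (train, test) in leave_one_group_out(phyla).items():
    print(phylum, "train:", train, "test:", test)
```

Averaging the per-phylum test accuracies gives an average test accuracy of the kind reported for SCMBYK.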
8.
BMC Genomics ; 17(Suppl 13): 1022, 2016 12 22.
Article in English | MEDLINE | ID: mdl-28155650

ABSTRACT

BACKGROUND: Although glioblastoma multiforme (GBM) is the most frequently occurring brain malignancy in adults, clinical treatment still faces challenges owing to poor prognoses and tumor relapses. Recently, microRNAs (miRNAs) have been used extensively with the aim of developing accurate molecular therapies because of their emerging role in regulating cancer-related genes. This work aims to identify miRNA signatures related to the survival of GBM patients for developing molecular therapies. RESULTS: This work proposes a support vector regression (SVR)-based estimator, called SVR-GBM, to estimate the survival time of patients with GBM from their miRNA expression profiles. SVR-GBM identified 24 of 470 miRNAs that were significantly associated with the survival of GBM patients. SVR-GBM had a mean absolute error of 0.63 years and a correlation coefficient of 0.76 between the real and predicted survival times. The 10 top-ranked miRNAs according to prediction contribution are hsa-miR-222, hsa-miR-345, hsa-miR-587, hsa-miR-526a, hsa-miR-335, hsa-miR-122, hsa-miR-24, hsa-miR-433, hsa-miR-574 and hsa-miR-320. Biological analysis of the identified miRNAs using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways revealed their influence on GBM. CONCLUSION: The proposed SVR-GBM uses an optimal feature selection algorithm and an optimized SVR to identify 24 miRNA signatures associated with the survival of GBM patients. These miRNA signatures can help uncover the individual roles of miRNAs in GBM prognosis and support the development of miRNA-based therapies.


Subject(s)
Brain Neoplasms/genetics; Brain Neoplasms/mortality; Glioblastoma/genetics; Glioblastoma/mortality; MicroRNAs/genetics; Transcriptome; Brain Neoplasms/metabolism; Cluster Analysis; Computational Biology/methods; Databases, Nucleic Acid; Gene Expression Profiling; Glioblastoma/metabolism; Humans; Prognosis; RNA Interference; RNA, Messenger/genetics; Signal Transduction
9.
Bioinformatics ; 31(13): 2151-8, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25717191

ABSTRACT

MOTIVATION: The establishment of quantitative gene regulatory networks (qGRNs) through existing network component analysis (NCA) approaches suffers from shortcomings such as usage limitations of problem constraints and the instability of inferred qGRNs. The proposed GeNOSA framework uses a global optimization algorithm (OptNCA) to cope with the stringent limitations of NCA approaches in large-scale qGRNs. RESULTS: OptNCA performs well against existing NCA-derived algorithms in terms of utilization of connectivity information and reconstruction accuracy of inferred GRNs using synthetic and real Escherichia coli datasets. For comparisons with other non-NCA-derived algorithms, OptNCA without using known qualitative regulations is also evaluated in terms of qualitative assessments using a synthetic Saccharomyces cerevisiae dataset of the DREAM3 challenges. We successfully demonstrate GeNOSA in several applications including deducing condition-dependent regulations, establishing high-consensus qGRNs and validating a sub-network experimentally for dose-response and time-course microarray data, and discovering and experimentally confirming a novel regulation of CRP on AscG. AVAILABILITY AND IMPLEMENTATION: All datasets and the GeNOSA framework are freely available from http://e045.life.nctu.edu.tw/GeNOSA. CONTACT: syho@mail.nctu.edu.tw SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Escherichia coli/genetics; Gene Expression Profiling; Gene Regulatory Networks; Saccharomyces cerevisiae/genetics
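For reference, classical network component analysis (NCA), which OptNCA and the other NCA-derived algorithms above build on, fits an expression matrix with a connectivity matrix whose zero pattern is fixed by known regulations. The generic formulation below is a sketch for orientation only and does not reproduce OptNCA's own objective or constraints. Here E (N x M) holds expression data (typically log ratios) of N genes over M samples, A (N x L) holds regulation strengths, and P (L x M) holds the activities of L regulators:

```latex
\min_{\mathbf{A},\,\mathbf{P}} \;
  \bigl\lVert \mathbf{E} - \mathbf{A}\mathbf{P} \bigr\rVert_F^{2}
\quad \text{subject to} \quad
  A_{ij} = 0 \;\; \text{whenever gene } i \text{ is not known to be regulated by regulator } j
```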
10.
BMC Bioinformatics ; 16 Suppl 1: S8, 2015.
Article in English | MEDLINE | ID: mdl-25708243

ABSTRACT

BACKGROUND: Photosynthetic proteins (PSPs) differ greatly in structure and function because they are involved in numerous subprocesses that take place inside an organelle called the chloroplast. Few studies predict PSPs from sequences because of their high variety of sequences and structures. This work aims to predict and characterize PSPs by establishing datasets of PSP and non-PSP sequences and developing prediction methods. RESULTS: A novel bioinformatics method for predicting and characterizing PSPs based on the scoring card method (SCMPSP) was used. First, a dataset of 649 PSPs was established using the Gene Ontology term GO:0015979, together with 649 non-PSPs from the SwissProt database, with sequence identity ≤ 25%. Several prediction methods are presented based on support vector machine (SVM), decision tree J48, Bayes, BLAST, and SCM. The SVM method using dipeptide features performed well and yielded a test accuracy of 72.31%. The SCMPSP method uses the estimated propensity scores of the 400 dipeptides to be PSPs and has a test accuracy of 71.54%, comparable to that of the SVM method. The derived propensity scores of the 20 amino acids were further used to identify informative physicochemical properties for characterizing PSPs. The analytical results reveal four characteristics of PSPs: 1) PSPs favour amino acids with hydrophobic side chains; 2) PSPs are composed of amino acids prone to form helices in membrane environments; 3) PSPs have low interaction with water; and 4) PSPs prefer amino acids with electron-reactive side chains. CONCLUSIONS: The SCMPSP method not only estimates the propensity of a sequence to be a PSP but also discovers characteristics that further improve understanding of PSPs. The SCMPSP source code and the datasets used in this study are available at http://iclab.life.nctu.edu.tw/SCMPSP/.


Subject(s)
Chloroplast Proteins/metabolism; Computational Biology/methods; Photosynthesis; Bayes Theorem; Chloroplast Proteins/chemistry; Chloroplast Proteins/genetics; Databases, Protein; Dipeptides/chemistry; Dipeptides/metabolism; Gene Ontology; Intracellular Membranes/metabolism; Protein Structure, Secondary; Support Vector Machine; Water/metabolism
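The scoring card method (SCM) behind SCMPSP scores a sequence from the propensity scores of its dipeptides. A minimal sketch of the scoring step, assuming the sequence score is the mean of its dipeptide scores compared against a threshold; the three propensity values and the threshold are hypothetical. A real scoring card holds scores for all 400 dipeptides, estimated from training data, and that estimation step is not shown:

```python
from typing import Dict, List


def dipeptides(seq: str) -> List[str]:
    """Overlapping dipeptides, e.g. 'ALVK' -> ['AL', 'LV', 'VK']."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def scm_score(seq: str, card: Dict[str, float], default: float = 0.0) -> float:
    """Mean propensity score over all dipeptides in the sequence."""
    pairs = dipeptides(seq.upper())
    if not pairs:
        raise ValueError("sequence needs at least two residues")
    # 'default' only covers dipeptides missing from this toy card.
    return sum(card.get(p, default) for p in pairs) / len(pairs)


def scm_predict(seq: str, card: Dict[str, float], threshold: float) -> bool:
    """Predict the positive class (e.g. PSP) when the score exceeds the threshold."""
    return scm_score(seq, card) > threshold


toy_card = {"AL": 600.0, "LV": 400.0, "VK": 200.0}  # hypothetical propensity scores
print(scm_score("ALVK", toy_card), scm_predict("ALVK", toy_card, threshold=350.0))  # 400.0 True
```

Because every dipeptide keeps its own score, the card can be read back directly, which is what lets propensity scores be used to characterize PSPs rather than only classify them.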
11.
BMC Bioinformatics ; 16 Suppl 18: S14, 2015.
Article in English | MEDLINE | ID: mdl-26681483

ABSTRACT

BACKGROUND: Protein-protein interactions (PPIs) are involved in various biological processes, and the underlying mechanisms of these interactions play a crucial role in therapeutics and protein engineering. Most machine learning approaches for predicting the binding affinity of protein-protein complexes rely on structural and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. RESULTS: This work proposes a support vector machine (SVM)-based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded a training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (pKd) of 200 heterodimeric protein complexes. In a jackknife test, the correlation coefficient was 0.34 and the mean absolute error was 1.4. We further analyzed three informative physicochemical properties according to their contribution to prediction performance. The results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn.
CONCLUSIONS: The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict the binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces are higher in high-binding-affinity complexes than in low-binding-affinity complexes.


Subject(s)
Proteins/chemistry; Support Vector Machine; Area Under Curve; Dimerization; Hydrogen Bonding; Principal Component Analysis; Protein Binding; Protein Interaction Maps; Protein Structure, Tertiary; Proteins/metabolism; ROC Curve
12.
BMC Bioinformatics ; 16: 54, 2015 Feb 21.
Article in English | MEDLINE | ID: mdl-25881029

ABSTRACT

BACKGROUND: Few studies have investigated prognostic biomarkers of distant metastases of lung cancer. One of the central difficulties in identifying biomarkers from microarray data is that only a small number of samples is available, which results in overtraining. Recent evidence reveals that the epithelial-mesenchymal transition (EMT) of tumor cells causes metastasis, which is detrimental to patients' survival. RESULTS: This work proposes a novel optimization approach to discovering EMT-related prognostic biomarkers to predict the distant metastasis of lung cancer using both microarray and survival data. Its weighted objective function maximizes both the accuracy of predicting distant metastasis and the area between the disease-free survival curves of the non-distant- and distant-metastasis groups. Seventy-eight patients with lung cancer and a follow-up time of 120 months were used to identify a set of gene markers, and an independent cohort of 26 patients was used to evaluate the identified biomarkers. The medical records of the 78 patients show a significant difference between the disease-free survival times of the 37 non-distant- and the 41 distant-metastasis patients. The experimental results were as follows: 1) the use of disease-free survival curves can compensate for the shortcoming of insufficient samples and greatly increase the test accuracy, by 11.10%; and 2) the support vector machine with a set of 17 transcripts, such as CCL16 and CDKN2AIP, can yield a leave-one-out cross-validation accuracy of 93.59%, a test accuracy of 76.92%, a large disease-free survival area of 74.81%, and a mean survival prediction error of 3.99 months. The identified putative biomarkers are examined using related studies and signaling pathways to reveal their potential effectiveness in prospective confirmatory studies.
CONCLUSIONS: The proposed new optimization approach to identifying prognostic biomarkers by combining multiple sources of data (microarray and survival) can facilitate the accurate selection of biomarkers that are most relevant to the disease while solving the problem of insufficient samples.


Subject(s)
Adenocarcinoma/secondary; Biomarkers, Tumor/genetics; Carcinoma, Large Cell/secondary; Carcinoma, Squamous Cell/secondary; Epithelial-Mesenchymal Transition; Lung Neoplasms/pathology; Microarray Analysis; Adenocarcinoma/genetics; Adenocarcinoma/mortality; Aged; Carcinoma, Large Cell/genetics; Carcinoma, Large Cell/mortality; Carcinoma, Squamous Cell/genetics; Carcinoma, Squamous Cell/mortality; Female; Follow-Up Studies; Humans; Lung Neoplasms/genetics; Lung Neoplasms/mortality; Male; Neoplasm Staging; Prognosis; Prospective Studies; Signal Transduction; Survival Rate
13.
BMC Genomics ; 16 Suppl 12: S6, 2015.
Article in English | MEDLINE | ID: mdl-26677931

ABSTRACT

BACKGROUND: Identifying putative membrane transport proteins (MTPs) and understanding the transport mechanisms involved remain important challenges for the advancement of structural and functional genomics. However, transporter characteristics are mainly derived from MTP crystal structures, and MTPs are hard to crystallize. Therefore, it is desirable to develop bioinformatics tools for the effective large-scale analysis of available sequences to identify and characterize novel transporters. RESULTS: This work proposes a novel method (SCMMTP) based on the scoring card method (SCM) using dipeptide composition to identify and characterize MTPs from an existing dataset containing 900 MTPs and 660 non-MTPs, which are separated into a training dataset of 1,380 proteins and an independent dataset of 180 proteins. SCMMTP produces estimated propensity scores of amino acids and dipeptides to be MTPs. The SCMMTP training and test accuracies reached 83.81% and 76.11%, respectively. The test accuracy of a support vector machine (SVM) using the position-specific substitution matrix (PSSM) as a protein feature, a complicated classification method with little potential for biological interpretation, is 80.56%; thus SCMMTP is comparable to SVM-PSSM. To identify MTPs, SCMMTP is applied to three datasets: 1) human transmembrane proteins, 2) a photosynthetic protein dataset, and 3) a human protein database. The α-helix-rich structure of MTPs agrees with previous studies. The MTPs use residues with low hydration energy. It is hypothesized that, after substrates are filtered, the hydrated water molecules need to be released from the pore regions. CONCLUSIONS: SCMMTP yields estimated propensity scores of amino acids and dipeptides to be MTPs, which can be used to identify novel MTPs and characterize transport mechanisms for further experiments. AVAILABILITY: http://iclab.life.nctu.edu.tw/iclab_webtools/SCMMTP/.


Subject(s)
Computational Biology/methods; Dipeptides/chemistry; Membrane Transport Proteins/chemistry; Membrane Transport Proteins/metabolism; Algorithms; Amino Acid Sequence; Amino Acids/chemistry; Computers, Molecular; Databases, Protein; Humans; Models, Molecular; Propensity Score; Protein Structure, Secondary
14.
BMC Bioinformatics ; 15 Suppl 16: S4, 2014.
Article in English | MEDLINE | ID: mdl-25522279

ABSTRACT

BACKGROUND: Heme binding proteins (HBPs) are metalloproteins that contain a heme ligand (an iron-porphyrin complex) as the prosthetic group. Several computational methods have been proposed to predict heme binding residues and thereby to understand the interactions between heme and its host proteins. However, few in silico methods for identifying HBPs have been proposed. RESULTS: This work proposes a scoring card method (SCM)-based method, named SCMHBP, for predicting and analyzing HBPs from sequences. A balanced dataset of 747 HBPs (selected using the Gene Ontology term GO:0020037) and 747 non-HBPs (selected from 91,414 putative non-HBPs) with an identity of 25% was first established. A set of scores quantifying the propensity of amino acids and dipeptides to be HBPs was then estimated using SCM to maximize the predictive accuracy of SCMHBP. Finally, the estimated propensity scores were used to identify informative physicochemical properties of the 20 amino acids for categorizing HBPs. The training accuracy and the mean test accuracy of SCMHBP on three independent test datasets are 85.90% and 71.57%, respectively. SCMHBP performs well compared with methods such as support vector machine (SVM), decision tree J48, and Bayes classifiers. Putative non-HBPs with high sequence propensity scores are potential HBPs, which can be further validated experimentally. The propensity scores of individual amino acids and dipeptides are examined to elucidate the interactions between heme and its host proteins. The following characteristics of HBPs are derived from the propensity scores: 1) aromatic side chains are important to the effectiveness of specific HBP functions; 2) a hydrophobic environment is important in the interaction between heme and binding sites; and 3) the whole HBP has low flexibility, whereas the heme binding residues are relatively flexible.
CONCLUSIONS: SCMHBP yields knowledge that improves our understanding of HBPs rather than merely improving the accuracy of HBP prediction.


Subject(s)
Carrier Proteins/metabolism; Dipeptides/metabolism; Heme/metabolism; Hemeproteins/metabolism; Propensity Score; Software; Bayes Theorem; Binding Sites; Carrier Proteins/chemistry; Databases, Protein; Dipeptides/chemistry; Heme/chemistry; Heme-Binding Proteins; Hemeproteins/chemistry; Humans; Hydrophobic and Hydrophilic Interactions; Ligands; Protein Conformation; Support Vector Machine
15.
BMC Pulm Med ; 14: 80, 2014 May 08.
Article in English | MEDLINE | ID: mdl-24885269

ABSTRACT

BACKGROUND: Bronchial asthma influences some chronic diseases, such as coronary heart disease, diabetes mellitus, and hypertension, but its impact on other major diseases, such as chronic kidney disease, has not yet been verified. This study aims to clarify the association between bronchial asthma and the risk of developing chronic kidney disease. METHODS: The National Health Research Institute provided a database of one million random subjects for the study. A random sample of 141 064 patients aged ≥18 years without a history of kidney disease was obtained from the database. Among them, 35 086 patients with bronchial asthma and 105 258 without asthma were matched for sex and age at a ratio of 1:3. A Cox proportional hazards model, adjusted for confounding risk factors, was used to compare the risk of developing chronic kidney disease during a three-year follow-up period. RESULTS: Of the subjects with asthma, 2 196 (6.26%) developed chronic kidney disease, compared with 4 120 (3.91%) of the control subjects. Cox proportional hazards regression analysis revealed that subjects with asthma were more likely to develop chronic kidney disease (hazard ratio [HR]: 1.56; 95% CI: 1.48-1.64; p < 0.001). After adjusting for sex, age, monthly income, urbanization level, geographic region, diabetes mellitus, hypertension, hyperlipidemia, and steroid use, the HR for asthma patients was 1.40 (95% CI: 1.33-1.48; p = 0.040). Steroid use was associated with a decreased HR for the development of chronic kidney disease (HR: 0.56; 95% CI: 0.62-0.61; p < 0.001). Expectorants, bronchodilators, anti-muscarinic agents, airway smooth muscle relaxants, and leukotriene receptor antagonists may also be beneficial in attenuating the risk of chronic kidney disease. CONCLUSIONS: Patients with bronchial asthma may have an increased risk of developing chronic kidney disease. The use of steroids or non-steroidal drugs in the treatment of asthma may attenuate this risk.


Subject(s)
Asthma/diagnosis , Asthma/epidemiology , Renal Insufficiency, Chronic/diagnosis , Renal Insufficiency, Chronic/epidemiology , Adrenal Cortex Hormones/therapeutic use , Adult , Age Distribution , Aged , Asthma/drug therapy , Cohort Studies , Comorbidity , Databases, Factual , Female , Follow-Up Studies , Humans , Incidence , Kaplan-Meier Estimate , Male , Middle Aged , Odds Ratio , Proportional Hazards Models , Retrospective Studies , Risk Assessment , Severity of Illness Index , Sex Distribution , Survival Analysis
16.
BMC Pediatr ; 14: 181, 2014 Jul 11.
Article in English | MEDLINE | ID: mdl-25012668

ABSTRACT

BACKGROUND: Urticaria not caused by infection is a common ailment in adolescents. Its symptoms (e.g., unusual rash appearance, limitation of daily activities, and recurrent itching) may contribute to depressive stress in adolescents, but this potential link has not been well studied. This study aimed to investigate the risk of major depression after a first attack of non-infection-caused urticaria. METHODS: This study used the Taiwan Longitudinal Health Insurance Database. A total of 5,755 adolescents hospitalized for a first attack of non-infection-caused urticaria from 2005 to 2009 were recruited as the study group, together with 17,265 matched enrollees without urticaria, who comprised the control group. Patients with any history of urticaria or depression before the evaluation period were excluded. Each patient was followed for one year to identify the occurrence of depression. Cox proportional hazards models were used to compute the risk of major depression, adjusting for the subjects' sociodemographic characteristics. Depression-free survival curves were also analyzed. RESULTS: Thirty-four (0.6%) adolescents with non-infection-caused urticaria and 59 (0.3%) control subjects without urticaria had a new-onset episode of major depression during the study period. Stratified Cox proportional hazards analysis showed that the crude hazard ratio (HR) of depression among adolescents with urticaria was 1.73 (95% CI, 1.13-2.64) times that of the control subjects without urticaria. Moreover, HRs were higher for physical (HR: 3.39, 95% CI 2.77-11.52) and allergic chronic urticaria (HR: 2.43, 95% CI 3.18-9.78). CONCLUSION: Individuals who have non-infection-caused urticaria during adolescence are at higher risk of developing major depression.
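Reported hazard ratios and their 95% CIs can be cross-checked on the log scale. Assuming a Wald-type interval (symmetric around ln HR, which the abstract does not state), the sketch below recovers the standard error and an approximate two-sided p-value from the crude HR of 1.73 (95% CI 1.13-2.64). The helper names are illustrative, not from the study.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_se_from_ci(lower: float, upper: float) -> float:
    """SE of ln(HR), assuming the CI is ln(HR) +/- 1.96 * SE on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def point_from_ci(lower: float, upper: float) -> float:
    """Geometric midpoint of the CI; equals the HR under the Wald assumption."""
    return math.sqrt(lower * upper)


def wald_p_value(hr: float, se: float) -> float:
    """Two-sided p-value for H0: HR = 1, using the normal approximation."""
    z = math.log(hr) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Crude HR for major depression after urticaria, as reported in the abstract.
lo, hi = 1.13, 2.64
se = log_se_from_ci(lo, hi)
p = wald_p_value(1.73, se)
print(f"SE(ln HR) = {se:.3f}, implied HR = {point_from_ci(lo, hi):.2f}, p = {p:.3f}")
```

The geometric-midpoint check is also a quick way to spot transcription errors in tables of results: a reported HR should lie inside its interval and close to sqrt(lower × upper).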


Subject(s)
Depressive Disorder, Major/etiology , Urticaria/psychology , Adolescent , Case-Control Studies , Female , Follow-Up Studies , Humans , Kaplan-Meier Estimate , Male , Proportional Hazards Models , Retrospective Studies , Risk Factors , Taiwan
17.
ScientificWorldJournal ; 2014: 327306, 2014.
Article in English | MEDLINE | ID: mdl-24955394

ABSTRACT

The rapid and reliable identification of promoter regions is important as the number of genomes being sequenced grows rapidly. Various methods have been developed, but few investigate the effectiveness of sequence-based features in promoter prediction. This study proposes a knowledge acquisition method, named PromHD, based on if-then rules for promoter prediction in human and Drosophila species. PromHD uses an effective feature-mining algorithm and a reference feature set of 167 DNA sequence descriptors (DNASDs), comprising three descriptors of physicochemical properties (absorption maxima, molecular weight, and molar absorption coefficient), 128 top-ranked descriptors of 4-mer motifs, and 36 global sequence descriptors. PromHD identifies two feature subsets, with 99 and 74 DNASDs, and yields test accuracies of 96.4% and 97.5% in human and Drosophila species, respectively. Based on the 99- and 74-dimensional feature vectors, PromHD generates several if-then rules for promoter prediction using a decision tree mechanism. The top-ranked informative rules with high certainty grades reveal that the global sequence descriptor (the length of nucleotide A at the first position of the sequence) and two physicochemical properties (absorption maxima and molecular weight) are effective in distinguishing promoters from non-promoters in human and Drosophila species, respectively.
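The 4-mer motif descriptors are k-mer statistics over a DNA sequence. A minimal sketch, assuming each descriptor is the relative frequency of one 4-mer over overlapping windows: there are 4^4 = 256 possible 4-mers, and the reference set above contains 128 top-ranked ones. The abstract gives neither the exact descriptor definition nor the ranking step, so this only illustrates the general feature type, not PromHD itself.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of every DNA k-mer (4**k features) over overlapping windows."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other ambiguity codes are skipped
            counts[kmer] += 1
            windows += 1
    return {m: (c / windows if windows else 0.0) for m, c in counts.items()}


feats = kmer_frequencies("TATAAAAGGC")  # hypothetical toy sequence, not from the study
print(len(feats), round(feats["TATA"], 3))
```

A feature vector built this way can be passed to a decision-tree learner, whose root-to-leaf paths read directly as if-then rules of the kind described above.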


Subject(s)
Algorithms , Drosophila/genetics , Promoter Regions, Genetic/genetics , Animals , Humans
18.
NAR Genom Bioinform ; 6(1): lqae022, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38406797

ABSTRACT

Breast cancer (BC) is one of the most commonly diagnosed cancers worldwide. As key regulatory molecules in several biological processes, microRNAs (miRNAs) are potential biomarkers for cancer. Identifying miRNA markers that can detect BC may improve survival rates and aid the development of new targeted therapeutic strategies. To identify a circulating miRNA signature for diagnostic prediction in patients with BC, we developed an evolutionary learning-based method called BSig. From serum miRNA expression profiles of 1280 patients with BC and 2686 healthy controls, BSig established a compact set of miRNAs as potential markers for diagnostic prediction. BSig demonstrated outstanding prediction performance, with an independent test accuracy of 99.90% and an area under the receiver operating characteristic curve of 0.99. We identified 12 miRNAs, including hsa-miR-3185, hsa-miR-3648, hsa-miR-4530, hsa-miR-4763-5p, hsa-miR-5100, hsa-miR-5698, hsa-miR-6124, hsa-miR-6768-5p, hsa-miR-6800-5p, hsa-miR-6807-5p, hsa-miR-642a-3p, and hsa-miR-6836-3p, which contributed significantly to diagnostic prediction in BC. Moreover, through bioinformatics analysis, this study identified 65 miRNA target genes specific to BC cell lines. A comprehensive gene-set enrichment analysis was also performed to understand the mechanisms underlying these target genes. BSig, a tool capable of detecting BC and facilitating therapeutic selection, is publicly available at https://github.com/mingjutsai/BSig.
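Accuracy and AUC, as reported for BSig, are standard binary-classification metrics. As a reminder of what the AUC measures, here is a sketch of its rank-based (Mann-Whitney) form: the probability that a randomly chosen patient receives a higher score than a randomly chosen control, with ties counted as one half. The scores are made up for illustration and are not BSig outputs.

```python
def auc_mann_whitney(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as P(score_pos > score_neg), counting ties as 1/2. O(n*m), fine for a sketch."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical classifier scores: BC patients (positives) vs healthy controls (negatives).
patients = [0.9, 0.8, 0.4]
controls = [0.3, 0.5, 0.1]
print(round(auc_mann_whitney(patients, controls), 3))  # 8 of 9 pairs ranked correctly
```

With 1280 patients and 2686 controls the double loop is still only a few million comparisons; for much larger sets, a sort-based rank computation gives the same value in O((n + m) log(n + m)).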

19.
Cancer Imaging ; 24(1): 43, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38532511

ABSTRACT

BACKGROUND: Automatic segmentation of hepatocellular carcinoma (HCC) on computed tomography (CT) scans is urgently needed to assist diagnosis and radiomics analysis. The aim of this study is to develop a deep learning-based network to detect HCC in dynamic CT images. METHODS: Dynamic CT images of 595 patients with HCC were used. Tumors in the dynamic CT images were labeled by radiologists. Patients were randomly divided into training, validation, and test sets in a 5:2:3 ratio. We developed a hierarchical fusion strategy of deep learning networks (HFS-Net). Global Dice, sensitivity, precision, and F1-score were used to measure the performance of the HFS-Net model. RESULTS: The 2D DenseU-Net using dynamic CT images was more effective for segmenting small tumors, whereas the 2D U-Net using portal venous phase images was more effective for segmenting large tumors. The HFS-Net model outperformed the single-strategy deep learning models in segmenting both small and large tumors. In the test set, the HFS-Net model achieved good performance in identifying HCC on dynamic CT images, with a global Dice of 82.8%. The overall sensitivity, precision, and F1-score were 84.3%, 75.5%, and 79.6% per slice, respectively, and 92.2%, 93.2%, and 92.7% per patient, respectively. Per-patient sensitivity for tumors < 2 cm, 2-3 cm, 3-5 cm, and > 5 cm was 72.7%, 92.9%, 94.2%, and 100%, respectively. CONCLUSIONS: The HFS-Net model achieved good performance in the detection and segmentation of HCC from dynamic CT images, which may support radiologic diagnosis and facilitate automatic radiomics analysis.
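Dice and F1 are closely related overlap measures. A minimal sketch, assuming "global Dice" means pooling overlapping and total tumor voxels over all test cases before dividing (the abstract does not define it), together with F1 as the harmonic mean of precision and sensitivity. The F1 helper approximately reproduces the reported per-slice and per-patient F1 scores from the rounded precision and sensitivity values; the toy mask is hypothetical.

```python
from collections.abc import Sequence


def global_dice(cases: Sequence[tuple[Sequence[int], Sequence[int]]]) -> float:
    """Dice = 2|A∩B| / (|A| + |B|), with counts pooled over all (pred, truth) cases.

    Each mask is a flat sequence of 0/1 voxel labels (a flattened CT slice or volume).
    """
    inter = total = 0
    for pred, truth in cases:
        inter += sum(p & t for p, t in zip(pred, truth))
        total += sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0


def f1(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


print(f"per slice:   F1 = {f1(0.755, 0.843):.3f}")  # reported as 79.6%
print(f"per patient: F1 = {f1(0.932, 0.922):.3f}")  # reported as 92.7%
print(global_dice([([1, 1, 0, 0], [1, 0, 0, 1])]))  # hypothetical 4-voxel mask
```

The per-slice value comes out at 0.797 rather than 0.796 only because the inputs are already rounded to one decimal place.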


Subject(s)
Carcinoma, Hepatocellular , Deep Learning , Liver Neoplasms , Humans , Image Processing, Computer-Assisted , Portal Vein , Tomography, X-Ray Computed
20.
BMC Bioinformatics ; 14 Suppl 16: S12, 2013.
Article in English | MEDLINE | ID: mdl-24564437

ABSTRACT

BACKGROUND: High-content screening (HCS) has become a powerful tool for drug discovery. However, the discovery of drugs targeting neurons is still hampered by the inability to accurately identify and quantify the phenotypic changes of multiple neurons in a single image (a multi-neuron image) from a high-content screen. An automated image analysis method for multi-neuron images is therefore desirable. RESULTS: We propose HCS-neurons, an automated analysis method with novel descriptors of neuromorphology features for analyzing HCS-based multi-neuron images. To capture multiple phenotypic changes of neurons, we propose two kinds of descriptors: a neuron feature descriptor (NFD) of 13 neuromorphology features (e.g., neurite length) and generic feature descriptors (GFDs) (e.g., Haralick texture). HCS-neurons can 1) automatically extract all quantitative phenotype features in both the NFD and GFDs, 2) identify statistically significant phenotypic changes upon drug treatment using ANOVA and regression analysis, and 3) generate an accurate classifier, using a support vector machine and an intelligent feature selection method, to group neurons treated with different drug concentrations. To evaluate HCS-neurons, we treated P19 neurons with nocodazole (a microtubule-depolymerizing drug shown to impair neurite development) at six concentrations ranging from 0 to 1000 ng/mL. The experimental results show that all 13 NFD features differed significantly across nocodazole drug concentrations (NDC), and the phenotypic changes of neurites were consistent with the known effect of nocodazole in promoting neurite retraction. Three identified features, total neurite length, average neurite length, and average neurite area, achieved an independent test accuracy of 90.28% on the six-dosage classification problem. The NFD module and neuron image datasets are provided as a freely downloadable MATLAB project at http://iclab.life.nctu.edu.tw/HCS-Neurons. CONCLUSIONS: Few automatic methods focus on analyzing multi-neuron images collected from HCS for drug discovery. We provide an automatic HCS-based method for generating accurate classifiers that classify neurons based on their phenotypic changes upon drug treatment. The proposed HCS-neurons method can help identify and classify chemical or biological molecules that alter the morphology of a group of neurons in HCS.
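HCS-neurons tests features for dose-dependent change with ANOVA. For one feature measured under several nocodazole concentrations, the one-way ANOVA F statistic compares between-group to within-group variance; a minimal sketch with hypothetical neurite-length values follows. It is not the HCS-neurons MATLAB code. Turning F into a p-value needs the F distribution (for example `scipy.stats.f_oneway`), which is omitted to keep the sketch dependency-free.

```python
from statistics import mean


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F = between-group mean square / within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("zero within-group variance: F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical total neurite length (µm) per neuron at three of the six doses.
lengths_by_dose = [
    [120.0, 131.0, 125.0],  # 0 ng/mL
    [98.0, 104.0, 101.0],   # an intermediate dose
    [60.0, 72.0, 66.0],     # 1000 ng/mL
]
print(round(one_way_anova_f(lengths_by_dose), 1))
```

A large F for a feature such as total neurite length indicates that its mean shifts with dose far more than it varies within a dose group.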


Subject(s)
High-Throughput Screening Assays , Image Processing, Computer-Assisted/methods , Neurons/drug effects , Animals , Cell Line , Automatic Data Processing , Mice , Neurites/drug effects , Neurons/cytology , Nocodazole/pharmacology , Phenotype , Regression Analysis , Support Vector Machine