1.
BMC Med Inform Decis Mak ; 22(1): 103, 2022 04 15.
Article in English | MEDLINE | ID: mdl-35428291

ABSTRACT

BACKGROUND: Clinical data repositories (CDR) including electronic health record (EHR) data have great potential for outcome prediction and risk modeling. We built a prediction tool integrated with CDR based on pattern discovery and demonstrated a case study on contrast related acute kidney injury (AKI). METHODS: Patients undergoing cardiac catheterization from January 2015 to April 2017 were included. AKI was identified based on Acute Kidney Injury Network definition. Predictive model including 16 variables covered in existing AKI models was built. A visual analytics tool based on pattern discovery was trained on 70% data up to August 2016 with three interactive knowledge incorporation modes to develop 3 models: (1) pure data-driven, (2) domain knowledge, and (3) clinician-interactive, which were tested and compared on 30% consecutive cases dated afterwards. RESULTS: Among 2560 patients in the final dataset, 189 (7.3%) had AKI. We measured 4 existing models, whose areas under curves (AUCs) of receiver operating characteristics curve for the test dataset were 0.70 (Mehran's), 0.72 (Chen's), 0.67 (Gao's) and 0.62 (AGEF), respectively. A pure data-driven machine learning method achieves AUC of 0.72 (Easy Ensemble). The AUCs of our 3 models are 0.77, 0.80, 0.82, respectively, with the last being top where physician knowledge is incorporated. CONCLUSIONS: We developed a novel pattern-discovery-based outcome prediction tool integrated with CDR and purely using EHR data. On the case of predicting contrast related AKI, the tool showed user-friendliness by physicians, and demonstrated a competitive performance in comparison with the state-of-the-art models.
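As a reading aid for the AUC figures quoted above: the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name and the toy scores are illustrative assumptions, not code or data from the study.

```python
def auc_pairwise(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical risk scores, for illustration only.
toy_pos = [0.9, 0.8, 0.4]  # cases with the outcome
toy_neg = [0.7, 0.3, 0.2]  # cases without the outcome
toy_auc = auc_pairwise(toy_pos, toy_neg)
```

The pairwise form is quadratic in the number of cases, which is acceptable for a few thousand records; a rank-based formulation would likely be preferred for larger datasets.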


Subjects
Acute Kidney Injury , Acute Kidney Injury/chemically induced , Acute Kidney Injury/diagnosis , Area Under Curve , Female , Humans , Machine Learning , Male , Prognosis , ROC Curve , Retrospective Studies , Risk Factors
2.
Genome Res ; 26(4): 440-50, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26888265

ABSTRACT

Identification of functional genetic variants and elucidation of their regulatory mechanisms represent significant challenges of the post-genomic era. A poorly understood topic is the involvement of genetic variants in mediating post-transcriptional RNA processing, including alternative splicing. Thus far, little is known about the genomic, evolutionary, and regulatory features of genetically modulated alternative splicing (GMAS). Here, we systematically identified intronic tag variants for genetic modulation of alternative splicing using RNA-seq data specific to cellular compartments. Combined with our previous method that identifies exonic tags for GMAS, this study yielded 622 GMAS exons. We observed that GMAS events are highly cell type independent, indicating that splicing-altering genetic variants could have widespread function across cell types. Interestingly, GMAS genes, exons, and single-nucleotide variants (SNVs) all demonstrated positive selection or accelerated evolution in primates. We predicted that GMAS SNVs often alter binding of splicing factors, with SRSF1 affecting the most GMAS events and demonstrating global allelic binding bias. However, in contrast to their GMAS targets, the predicted splicing factors are more conserved than expected, suggesting that cis-regulatory variation is the major driving force of splicing evolution. Moreover, GMAS-related splicing factors had stronger consensus motifs than expected, consistent with their susceptibility to SNV disruption. Intriguingly, GMAS SNVs in general do not alter the strongest consensus position of the splicing factor motif, except the more than 100 GMAS SNVs in linkage disequilibrium with polymorphisms reported by genome-wide association studies. Our study reports many GMAS events and enables a better understanding of the evolutionary and regulatory features of this phenomenon.


Subjects
Alternative Splicing , Molecular Evolution , Genetic Variation , Proteins/genetics , Animals , Binding Sites , Cell Line , Computational Biology/methods , Conserved Sequence , Exons , Gene Expression Regulation , Genome-Wide Association Study , Humans , Introns , Linkage Disequilibrium , Single Nucleotide Polymorphism , Primates/genetics , Protein Binding , Proteins/chemistry , RNA/chemistry , RNA/genetics , Nucleic Acid Regulatory Sequences , Reproducibility of Results
3.
J Biomed Inform ; 66: 161-170, 2017 02.
Article in English | MEDLINE | ID: mdl-28065840

ABSTRACT

OBJECTIVES: Major adverse cardiac events (MACE) of acute coronary syndrome (ACS) often occur suddenly resulting in high mortality and morbidity. Recently, the rapid development of electronic medical records (EMR) provides the opportunity to utilize the potential of EMR to improve the performance of MACE prediction. In this study, we present a novel data-mining based approach specialized for MACE prediction from a large volume of EMR data. METHODS: The proposed approach presents a new classification algorithm by applying both over-sampling and under-sampling on minority-class and majority-class samples, respectively, and integrating the resampling strategy into a boosting framework so that it can effectively handle imbalance of MACE of ACS patients analogous to domain practice. The method learns a new and stronger MACE prediction model each iteration from a more difficult subset of EMR data with wrongly predicted MACEs of ACS patients by a previous weak model. RESULTS: We verify the effectiveness of the proposed approach on a clinical dataset containing 2930 ACS patient samples with 268 feature types. While the imbalanced ratio does not seem extreme (25.7%), MACE prediction targets pose great challenge to traditional methods. As these methods degenerate dramatically with increasing imbalanced ratios, the performance of our approach for predicting MACE remains robust and reaches 0.672 in terms of AUC. On average, the proposed approach improves the performance of MACE prediction by 4.8%, 4.5%, 8.6% and 4.8% over the standard SVM, Adaboost, SMOTE, and the conventional GRACE risk scoring system for MACE prediction, respectively. CONCLUSIONS: We consider that the proposed iterative boosting approach has demonstrated great potential to meet the challenge of MACE prediction for ACS patients using a large volume of EMR.
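The resampling idea in the METHODS above (over-sample the minority class, under-sample the majority class) can be sketched in isolation. This is a minimal sketch of a plain random rebalancing step only, not the authors' boosting algorithm; the function name, the `per_class` parameter, and the toy data are assumptions made for illustration.

```python
import random


def rebalance(rows, labels, per_class, seed=0):
    """Return a class-balanced sample with `per_class` rows for each label.

    Minority rows (label 1) are all kept and topped up by sampling with
    replacement (over-sampling); majority rows (label 0) are sampled
    without replacement (under-sampling).
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == 1]
    majority = [r for r, y in zip(rows, labels) if y == 0]
    if not minority or not (len(minority) <= per_class <= len(majority)):
        raise ValueError("per_class must lie between the two class sizes")
    extra = [rng.choice(minority) for _ in range(per_class - len(minority))]
    kept = rng.sample(majority, per_class)
    return minority + extra + kept, [1] * per_class + [0] * per_class


# Hypothetical toy data: 4 positive and 16 negative samples.
toy_rows = list(range(20))
toy_labels = [1] * 4 + [0] * 16
bal_rows, bal_labels = rebalance(toy_rows, toy_labels, per_class=8)
```

Inside a boosting loop, as the abstract describes, each round would presumably draw such a sample with emphasis on the cases the previous weak model got wrong; that weighting is deliberately left out here.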


Subjects
Acute Coronary Syndrome/diagnosis , Algorithms , Electronic Health Records , Data Mining , Factual Databases , Humans
4.
BMC Med Inform Decis Mak ; 17(1): 47, 2017 Apr 20.
Article in English | MEDLINE | ID: mdl-28427384

ABSTRACT

BACKGROUND: Clinical data repositories (CDR) have great potential to improve outcome prediction and risk modeling. However, most clinical studies require careful study design, dedicated data collection efforts, and sophisticated modeling techniques before a hypothesis can be tested. We aim to bridge this gap, so that clinical domain users can perform first-hand prediction on existing repository data without complicated handling, and obtain insightful patterns of imbalanced targets for a formal study before it is conducted. We specifically target for interpretability for domain users where the model can be conveniently explained and applied in clinical practice. METHODS: We propose an interpretable pattern model which is noise (missing) tolerant for practice data. To address the challenge of imbalanced targets of interest in clinical research, e.g., deaths less than a few percent, the geometric mean of sensitivity and specificity (G-mean) optimization criterion is employed, with which a simple but effective heuristic algorithm is developed. RESULTS: We compared pattern discovery to clinically interpretable methods on two retrospective clinical datasets. They contain 14.9% deaths in 1 year in the thoracic dataset and 9.1% deaths in the cardiac dataset, respectively. In spite of the imbalance challenge shown on other methods, pattern discovery consistently shows competitive cross-validated prediction performance. Compared to logistic regression, Naïve Bayes, and decision tree, pattern discovery achieves statistically significant (p-values < 0.01, Wilcoxon signed rank test) favorable averaged testing G-means and F1-scores (harmonic mean of precision and sensitivity). Without requiring sophisticated technical processing of data and tweaking, the prediction performance of pattern discovery is consistently comparable to the best achievable performance. 
CONCLUSIONS: Pattern discovery has demonstrated to be robust and valuable for target prediction on existing clinical data repositories with imbalance and noise. The prediction results and interpretable patterns can provide insights in an agile and inexpensive way for the potential formal studies.
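The two evaluation measures named in this abstract follow directly from a confusion matrix: the G-mean is the geometric mean of sensitivity and specificity, and the F1-score is the harmonic mean of precision and sensitivity. The sketch below shows both; the function name and the toy counts are illustrative assumptions, not figures from the study.

```python
import math


def g_mean_and_f1(tp, fp, tn, fn):
    """Compute (G-mean, F1) from confusion-matrix counts.

    Ratios with a zero denominator are treated as 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return g_mean, f1


# Hypothetical counts for a 10%-positive dataset of 100 cases.
toy_g, toy_f1 = g_mean_and_f1(tp=8, fp=9, tn=81, fn=2)

# A degenerate model that predicts "negative" for every case is 90%
# accurate on the same data, yet scores zero on both measures.
naive_g, naive_f1 = g_mean_and_f1(tp=0, fp=0, tn=90, fn=10)
```

The second call illustrates why such criteria suit imbalanced targets: plain accuracy rewards ignoring the rare class, while the G-mean and F1 do not.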


Subjects
Computer Simulation , Data Mining/methods , Databases as Topic/organization & administration , Automated Pattern Recognition/methods , Algorithms , Computer Heuristics , Forecasting , Health Information Systems/organization & administration
5.
Clin Chem ; 61(1): 221-30, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25376581

ABSTRACT

BACKGROUND: Extracellular RNAs (exRNAs) in human body fluids are emerging as effective biomarkers for detection of diseases. Saliva, as the most accessible and noninvasive body fluid, has been shown to harbor exRNA biomarkers for several human diseases. However, the entire spectrum of exRNA from saliva has not been fully characterized. METHODS: Using high-throughput RNA sequencing (RNA-Seq), we conducted an in-depth bioinformatic analysis of noncoding RNAs (ncRNAs) in human cell-free saliva (CFS) from healthy individuals, with a focus on microRNAs (miRNAs), piwi-interacting RNAs (piRNAs), and circular RNAs (circRNAs). RESULTS: Our data demonstrated robust reproducibility of miRNA and piRNA profiles across individuals. Furthermore, individual variability of these salivary RNA species was highly similar to those in other body fluids or cellular samples, despite the direct exposure of saliva to environmental impacts. By comparative analysis of >90 RNA-Seq data sets of different origins, we observed that piRNAs were surprisingly abundant in CFS compared with other body fluid or intracellular samples, with expression levels in CFS comparable to those found in embryonic stem cells and skin cells. Conversely, miRNA expression profiles in CFS were highly similar to those in serum and cerebrospinal fluid. Using a customized bioinformatics method, we identified >400 circRNAs in CFS. These data represent the first global characterization and experimental validation of circRNAs in any type of extracellular body fluid. CONCLUSIONS: Our study provides a comprehensive landscape of ncRNA species in human saliva that will facilitate further biomarker discoveries and lay a foundation for future studies related to ncRNAs in human saliva.


Subjects
High-Throughput Nucleotide Sequencing/methods , MicroRNAs/analysis , Small Interfering RNA/analysis , RNA/analysis , Saliva/chemistry , RNA Sequence Analysis/methods , Base Sequence , Biomarkers/analysis , Humans , MicroRNAs/blood , MicroRNAs/cerebrospinal fluid , MicroRNAs/genetics , Molecular Sequence Data , RNA/blood , RNA/cerebrospinal fluid , RNA/genetics , Circular RNA , Small Interfering RNA/blood , Small Interfering RNA/cerebrospinal fluid , Small Interfering RNA/genetics , Reproducibility of Results , Sensitivity and Specificity
6.
Nucleic Acids Res ; 41(16): e153, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23814189

ABSTRACT

Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k=8∼10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source code are available at the authors' websites: e.g. http://www.cs.toronto.edu/∼wkc/kmerHMM.
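The preprocessing step mentioned above, reducing probe-level signals to a median intensity per k-mer and ranking the k-mers, can be sketched as follows. This is a minimal illustration under simplifying assumptions (no reverse-complement merging, no normalization), not the kmerHMM implementation; the function name and the toy probes are hypothetical.

```python
from collections import defaultdict
from statistics import median


def kmer_median_intensities(probes, k):
    """Map each k-mer to the median intensity of the probes containing it.

    `probes` is an iterable of (sequence, intensity) pairs. A k-mer that
    occurs several times in one probe is counted once for that probe.
    """
    by_kmer = defaultdict(list)
    for seq, intensity in probes:
        kmers_in_probe = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in kmers_in_probe:
            by_kmer[kmer].append(intensity)
    return {kmer: median(vals) for kmer, vals in by_kmer.items()}


# Hypothetical probes and signal intensities, for illustration only.
toy_probes = [("ACGT", 10.0), ("CGTA", 20.0), ("ACGA", 30.0)]
toy_medians = kmer_median_intensities(toy_probes, k=3)

# Rank k-mers from strongest to weakest median signal (ties broken by name).
toy_ranked = sorted(toy_medians, key=lambda km: (-toy_medians[km], km))
```

Using the median rather than the mean keeps a single unusually bright probe from dominating the score of every k-mer it contains.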


Subjects
DNA-Binding Proteins/metabolism , DNA/chemistry , Protein Array Analysis , DNA Sequence Analysis/methods , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites , DNA/metabolism , Markov Chains , Mice , Nucleotide Motifs
7.
Nucleic Acids Res ; 40(19): 9392-403, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22904079

ABSTRACT

In protein-DNA interactions, particularly transcription factor (TF) and transcription factor binding site (TFBS) bindings, associated residue variations form patterns denoted as subtypes. Subtypes may lead to changed binding preferences, distinguish conserved from flexible binding residues and reveal novel binding mechanisms. However, subtypes must be studied in the context of core bindings. While solving 3D structures would require huge experimental efforts, recent sequence-based associated TF-TFBS pattern discovery has shown to be promising, upon which a large-scale subtype study is possible and desirable. In this article, we investigate residue-varying subtypes based on associated TF-TFBS patterns. By re-categorizing the patterns with respect to varying TF amino acids, statistically significant (P values ≤ 0.005) subtypes leading to varying TFBS patterns are discovered without using TF family or domain annotations. Resultant subtypes have various biological meanings. The subtypes reflect familial and functional properties and exhibit changed binding preferences supported by 3D structures. Conserved residues critical for maintaining TF-TFBS bindings are revealed by analyzing the subtypes. In-depth analysis on the subtype pair PKVVIL-CACGTG versus PKVEIL-CAGCTG shows the V/E variation is indicative for distinguishing Myc from MRF families. Discovered from sequences only, the TF-TFBS subtypes are informative and promising for more biological findings, complementing and extending recent one-sided subtype and familial studies with comprehensive evidence.


Subjects
DNA/chemistry , Transcription Factors/chemistry , Transcription Factors/classification , Binding Sites , Chromatin Immunoprecipitation , DNA/metabolism , Protein Databases , Molecular Models , Nucleotide Motifs , Position-Specific Scoring Matrices , Protein Binding , DNA Sequence Analysis , Transcription Factors/metabolism
8.
BMC Bioinformatics ; 14: 198, 2013 Jun 19.
Article in English | MEDLINE | ID: mdl-23777239

ABSTRACT

BACKGROUND: Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers will significantly help to classify different cancer types and improve the prediction accuracy. The regularization approach is one of the effective methods for gene selection in microarray data, which generally contain a large number of genes and have a small number of samples. In recent years, various approaches have been developed for gene selection of microarray data. Generally, they are divided into three categories: filter, wrapper and embedded methods. Regularization methods are an important embedded technique and perform both continuous shrinkage and automatic gene selection simultaneously. Recently, there is growing interest in applying the regularization techniques in gene selection. The popular regularization technique is Lasso (L1), and many L1 type regularization terms have been proposed in the recent years. Theoretically, the Lq type regularization with the lower value of q would lead to better solutions with more sparsity. Moreover, the L1/2 regularization can be taken as a representative of Lq (0

Subjects
Gene Expression Regulation , Logistic Models , Neoplasms/classification , Neoplasms/genetics , Algorithms , Genetic Markers , Humans , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/methods
9.
Apoptosis ; 17(8): 842-51, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22610480

ABSTRACT

Gamboge is a traditional Chinese medicine, and our previous study showed that gambogic acid and gambogenic acid suppress the proliferation of HCC cells. In the present study, another active component, 1,3,6,7-tetrahydroxyxanthone (TTA), was found to effectively suppress HCC cell growth. In addition, our Hoechst-PI staining and flow cytometry analyses indicated that TTA induced apoptosis in HCC cells. To identify the targets of TTA in HCC cells, two-dimensional gel electrophoresis was performed, and differentially expressed proteins were identified by MALDI-TOF MS and MS/MS analyses. In total, eighteen differentially expressed proteins were identified, of which twelve were up-regulated and six were down-regulated. Among them, the four most distinctively expressed proteins were further studied and validated by western blotting. β-tubulin and translationally controlled tumor protein were decreased, while 14-3-3σ and P16 protein expression was up-regulated. In addition, TTA suppressed tumorigenesis partially through P16-pRb signaling. Silencing 14-3-3σ reversed the suppression of cell growth and the apoptosis induced by TTA. In conclusion, TTA effectively suppressed cell growth, at least partially, through up-regulation of P16 and 14-3-3σ.


Subjects
Phytogenic Antineoplastic Agents/pharmacology , Apoptosis/drug effects , Hepatocellular Carcinoma/drug therapy , Chinese Herbal Drugs/pharmacology , Liver Neoplasms/drug therapy , Proteome/metabolism , Xanthones/pharmacology , 14-3-3 Proteins/genetics , 14-3-3 Proteins/metabolism , Tumor Biomarkers/genetics , Tumor Biomarkers/metabolism , Hepatocellular Carcinoma/metabolism , Hepatocellular Carcinoma/pathology , Tumor Cell Line , Cell Proliferation/drug effects , Cell Survival/drug effects , Cyclin-Dependent Kinase Inhibitor p16/genetics , Cyclin-Dependent Kinase Inhibitor p16/metabolism , Exonucleases/genetics , Exonucleases/metabolism , Exoribonucleases , Garcinia/chemistry , Gene Expression/drug effects , Gene Knockdown Techniques , Humans , Liver Neoplasms/metabolism , Liver Neoplasms/pathology , Proteome/genetics , Proteomics , RNA Interference , Signal Transduction
10.
Bioinformatics ; 27(4): 471-8, 2011 Feb 15.
Article in English | MEDLINE | ID: mdl-21193520

ABSTRACT

MOTIVATION: The bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) are fundamental protein-DNA interactions in transcriptional regulation. Extensive efforts have been made to better understand the protein-DNA interactions. Recent mining on exact TF-TFBS-associated sequence patterns (rules) has shown great potentials and achieved very promising results. However, exact rules cannot handle variations in real data, resulting in limited informative rules. In this article, we generalize the exact rules to approximate ones for both TFs and TFBSs, which are essential for biological variations. RESULTS: A progressive approach is proposed to address the approximation to alleviate the computational requirements. Firstly, similar TFBSs are grouped from the available TF-TFBS data (TRANSFAC database). Secondly, approximate and highly conserved binding cores are discovered from TF sequences corresponding to each TFBS group. A customized algorithm is developed for the specific objective. We discover the approximate TF-TFBS rules by associating the grouped TFBS consensuses and TF cores. The rules discovered are evaluated by matching (verifying with) the actual protein-DNA binding pairs from Protein Data Bank (PDB) 3D structures. The approximate results exhibit many more verified rules and up to 300% better verification ratios than the exact ones. The customized algorithm achieves over 73% better verification ratios than traditional methods. Approximate rules (64-79%) are shown statistically significant. Detailed variation analysis and conservation verification on NCBI records demonstrate that the approximate rules reveal both the flexible and specific protein-DNA interactions accurately. The approximate TF-TFBS rules discovered show great generalized capability of exploring more informative binding rules.


Subjects
Algorithms , DNA-Binding Proteins/genetics , DNA/genetics , Transcription Factors/genetics , Base Sequence , Binding Sites , Computational Biology/methods , DNA/metabolism , DNA-Binding Proteins/metabolism , Gene Expression Regulation , Protein Binding , Tertiary Protein Structure , Transcription Factors/metabolism
11.
Nucleic Acids Res ; 38(19): 6324-37, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20529874

ABSTRACT

Protein-DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play an essential role in transcriptional regulation. Over the past decades, significant efforts have been made to study the principles for protein-DNA bindings. However, it is considered that there are no simple one-to-one rules between amino acids and nucleotides. Many methods impose complicated features beyond sequence patterns. Protein-DNA bindings are formed from associated amino acid and nucleotide sequence pairs, which determine many functional characteristics. Therefore, it is desirable to investigate associated sequence patterns between TFs and TFBSs. With increasing computational power, availability of massive experimental databases on DNA and proteins, and mature data mining techniques, we propose a framework to discover associated TF-TFBS binding sequence patterns in the most explicit and interpretable form from TRANSFAC. The framework is based on association rule mining with Apriori algorithm. The patterns found are evaluated by quantitative measurements at several levels on TRANSFAC. With further independent verifications from literatures, Protein Data Bank and homology modeling, there are strong evidences that the patterns discovered reveal real TF-TFBS bindings across different TFs and TFBSs, which can drive for further knowledge to better understand TF-TFBS bindings.
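For readers unfamiliar with the Apriori algorithm named above, the sketch below mines frequent itemsets level by level: itemsets of size k are built only from frequent itemsets of size k-1, and any candidate with an infrequent subset is pruned before its support is counted. This is a generic textbook-style sketch, not the framework from the paper; the function name, the item labels, and the toy transactions are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}
    current = {c for c in current if support(c) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical records pairing a protein-side pattern with a DNA-side pattern.
toy_records = [
    {"tf_A", "site_X"},
    {"tf_A", "site_X"},
    {"tf_A", "site_Y"},
    {"tf_B", "site_Y"},
]
toy_frequent = apriori(toy_records, min_support=0.5)
```

An association rule such as tf_A → site_X would then be scored by its confidence, the support of the pair divided by the support of tf_A.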


Subjects
DNA-Binding Proteins/chemistry , DNA/chemistry , Data Mining/methods , Transcriptional Regulatory Elements , DNA Sequence Analysis , Transcription Factors/chemistry , Algorithms , Binding Sites , DNA/metabolism , DNA-Binding Proteins/metabolism , Genetic Databases , Protein Structural Homology , Transcription Factors/metabolism
12.
Article in English | MEDLINE | ID: mdl-33488072

ABSTRACT

BACKGROUND: The Manchester Respiratory Activities of Daily Living Questionnaire (MRADLQ) is a valid and reliable tool measuring the functional level of patients with COPD in multidimensional aspects. However, a local validation of the questionnaire is lacking in Hong Kong. OBJECTIVE: To develop a Chinese version of MRADLQ with pictorial enhancement (C-MRADLQ) and study its reliability and validity. PATIENTS AND METHODS: A total of 238 patients suffering from COPD were recruited from nine public hospitals and five Nurse and Allied Health Respiratory Clinics by convenient sampling. A total of 64 patients with normal spirometry results and no previous clinical diagnosis of COPD were invited to complete the C-MRADLQ for comparison and examination of its validity. Ten out of 302 patients were re-assessed with the C-MRADLQ after one week by the same rater for test-retest reliability. The C-MRADLQ was correlated with spirometry result, COPD classifications and groups by Global Initiative for Chronic Obstructive Lung Disease (GOLD), the modified Medical Research Council Dyspnea Scale (mMRC Dyspnea Scale), COPD Assessment Test (CAT), Chinese Version of the Shortness of Breath Questionnaire (C-SOBQ), number of admission and the ADO index. RESULTS: The C-MRADLQ shows good test-retest reliability as indicated by an intra-class correlation coefficient value of 0.975. It is significantly correlated with COPD stage, COPD group, SOBQ score, CAT score, mMRC, ADO index, spirometry results, and number of admissions. The SOBQ score, number of admissions, FEV1/FVC, and COPD group could significantly predict the total C-MRADLQ score. A total of 67.9% of participants' mMRC levels were correctly classified by using the C-MRADLQ total score. The agreement of the original and new versions of questions 20 and 21 of C-MRADLQ was 97.3% and 90.1%, respectively. 
CONCLUSION: The pictorial version of the C-MRADLQ is a validated and reliable functional assessment tool to measure functional status among patients with COPD in the Chinese population.


Subjects
Chronic Obstructive Pulmonary Disease , Activities of Daily Living , China , Dyspnea/diagnosis , Hong Kong , Humans , Chronic Obstructive Pulmonary Disease/diagnosis , Reproducibility of Results , Severity of Illness Index , Surveys and Questionnaires
13.
BMC Bioinformatics ; 10: 321, 2009 Oct 07.
Article in English | MEDLINE | ID: mdl-19811641

ABSTRACT

BACKGROUND: Identification of transcription factor binding sites (TFBSs) is a central problem in Bioinformatics on gene regulation. de novo motif discovery serves as a promising way to predict and better understand TFBSs for biological verifications. Real TFBSs of a motif may vary in their widths and their conservation degrees within a certain range. Deciding a single motif width by existing models may be biased and misleading. Additionally, multiple, possibly overlapping, candidate motifs are desired and necessary for biological verification in practice. However, current techniques either prohibit overlapping TFBSs or lack explicit control of different motifs. RESULTS: We propose a new generalized model to tackle the motif widths by considering and evaluating a width range of interest simultaneously, which should better address the width uncertainty. Moreover, a meta-convergence framework for genetic algorithms (GAs), is proposed to provide multiple overlapping optimal motifs simultaneously in an effective and flexible way. Users can easily specify the difference amongst expected motif kinds via similarity test. Incorporating Genetic Algorithm with Local Filtering (GALF) for searching, the new GALF-G (G for generalized) algorithm is proposed based on the generalized model and meta-convergence framework. CONCLUSION: GALF-G was tested extensively on over 970 synthetic, real and benchmark datasets, and is usually better than the state-of-the-art methods. The range model shows an increase in sensitivity compared with the single-width ones, while providing competitive precisions on the E. coli benchmark. Effectiveness can be maintained even using a very small population, exhibiting very competitive efficiency. In discovering multiple overlapping motifs in a real liver-specific dataset, GALF-G outperforms MEME by up to 73% in overall F-scores. GALF-G also helps to discover an additional motif which has probably not been annotated in the dataset. 
http://www.cse.cuhk.edu.hk/%7Etmchan/GALFG/


Subjects
Computational Biology/methods , Transcription Factors/chemistry , Transcription Factors/metabolism , Algorithms , Binding Sites , DNA Sequence Analysis
14.
Bioinformatics ; 24(3): 341-9, 2008 Feb 01.
Article in English | MEDLINE | ID: mdl-18065426

ABSTRACT

MOTIVATION: Identification of transcription factor binding sites (TFBSs) plays an important role in deciphering the mechanisms of gene regulation. Recently, GAME, a Genetic Algorithm (GA)-based approach with iterative post-processing, has shown superior performance in TFBS identification. However, the basic GA in GAME is not elaborately designed, and may be trapped in local optima in real problems. The feature operators are only applied in the post-processing, but the final performance heavily depends on the GA output. Hence, both effectiveness and efficiency of the overall algorithm can be improved by introducing more advanced representations and novel operators in the GA, as well as designing the post-processing in an adaptive way. RESULTS: We propose a novel framework GALF-P, consisting of Genetic Algorithm with Local Filtering (GALF) and adaptive post-processing techniques (-P), to achieve both effectiveness and efficiency for TFBS identification. GALF combines the position-led and consensus-led representations used separately in current GAs and employs a novel local filtering operator to get rid of false positives within an individual efficiently during the evolutionary process in the GA. Pre-selection is used to maintain diversity and avoid local optima. Post-processing with adaptive adding and removing is developed to handle general cases with arbitrary numbers of instances per sequence. GALF-P shows superior performance to GAME, MEME, BioProspector and BioOptimizer on synthetic datasets with difficult scenarios and real test datasets. GALF-P is also more robust and reliable when further compared with GAME, the current state-of-the-art approach. AVAILABILITY: http://www.cse.cuhk.edu.hk/~tmchan/GALFP/.


Subjects
Algorithms , Sequence Alignment/methods , DNA Sequence Analysis/methods , Transcription Factors/genetics , Base Sequence , Binding Sites , Molecular Sequence Data , Protein Binding
15.
Stud Health Technol Inform ; 245: 398-402, 2017.
Article in English | MEDLINE | ID: mdl-29295124

ABSTRACT

Clinical risk prediction of acute coronary syndrome (ACS) plays a critical role in clinical decision support, treatment management and quality of care assessment for ACS patients. Admission records contain a wealth of patient information from the early stages of hospitalization, which offers the opportunity to support ACS risk prediction in a proactive manner. However, ACS patient risks are not recorded in hospital admission records, which impedes the construction of supervised risk prediction models. In our study, we propose a novel approach for ACS risk prediction, which employs a well-known ACS risk prediction model (GRACE) as the benchmark method to stratify patient risks, and then utilizes a state-of-the-art supervised machine learning algorithm to establish our risk prediction models. The experiment was conducted with a collection of 3,643 ACS patient samples from a Chinese hospital. Our best model achieved 0.616 accuracy for risk prediction, which indicates that our learned model can achieve better performance than the benchmark GRACE model and can obtain significant improvement by mixing in patient samples whose risks were manually labeled.


Subjects
Acute Coronary Syndrome , Algorithms , Risk Assessment , Hospitalization , Humans , Prognosis , Risk Factors
16.
J Healthc Eng ; 2017: 6493016, 2017.
Article in English | MEDLINE | ID: mdl-29065631

ABSTRACT

Electronic Health Record (EHR) systems enable clinical decision support. In this study, a set of 112 abdominal computed tomography imaging examination reports, consisting of 59 cases of hepatocellular carcinoma (HCC) or liver metastases (the so-called HCC group, for simplicity) and 53 cases with no abnormality detected (NAD group), was collected from four hospitals in Hong Kong. We extracted terms related to liver cancer from the reports and mapped them to ontological features using Systematized Nomenclature of Medicine (SNOMED) Clinical Terms (CT). The primary predictor panel was formed by these ontological features. Association levels between every two features in the HCC and NAD groups were quantified using Pearson's correlation coefficient. The HCC group reveals a distinct association pattern that signifies liver cancer and provides clinical decision support for suspected cases, motivating the inclusion of new features to form the augmented predictor panel. Logistic regression analysis with a stepwise forward procedure was applied to the primary and augmented predictor sets, respectively. The model obtained with the new features attained 84.7% sensitivity and 88.4% overall accuracy in distinguishing HCC from NAD cases, both significantly better than those of the model without the new features.
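The pairwise association step described in this abstract can be illustrated with a minimal sketch. It assumes each report has already been reduced to a 0/1 vector over ontological features; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant feature is treated as uncorrelated (0.0).
        return 0.0
    return cov / sqrt(var_x * var_y)


def pairwise_associations(columns):
    """Correlation for every pair of features.

    `columns` maps a feature name to its 0/1 values across the reports
    of one group (hypothetical layout).
    """
    return {
        (f, g): pearson(columns[f], columns[g])
        for f, g in combinations(sorted(columns), 2)
    }


# Toy data: three hypothetical features over four reports.
toy_group = {
    "mass": [1, 1, 0, 0],
    "enhancement": [1, 1, 0, 0],
    "cyst": [0, 0, 1, 1],
}
associations = pairwise_associations(toy_group)
```

Running the same function on each group separately and comparing the two result tables is one way to look for a group-specific association pattern.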


Subjects
Carcinoma, Hepatocellular/physiopathology , Decision Support Systems, Clinical , Electronic Health Records , Liver Neoplasms/physiopathology , Algorithms , Hong Kong , Humans , Systematized Nomenclature of Medicine , Tomography, X-Ray Computed
17.
Article in English | MEDLINE | ID: mdl-27649220

ABSTRACT

BACKGROUND: Clinical major adverse cardiovascular event (MACE) prediction of acute coronary syndrome (ACS) is important for a number of applications including physician decision support, quality of care assessment, and efficient healthcare service delivery on ACS patients. Admission records, as typical media to contain clinical information of patients at the early stage of their hospitalizations, provide significant potential to be explored for MACE prediction in a proactive manner. METHODS: We propose a hybrid approach for MACE prediction by utilizing a large volume of admission records. Firstly, both a rule-based medical language processing method and a machine learning method (i.e., Conditional Random Fields (CRFs)) are developed to extract essential patient features from unstructured admission records. After that, state-of-the-art supervised machine learning algorithms are applied to construct MACE prediction models from data. RESULTS: We comparatively evaluate the performance of the proposed approach on a real clinical dataset consisting of 2930 ACS patient samples collected from a Chinese hospital. Our best model achieved 72% AUC in MACE prediction. In comparison of the performance between our models and two well-known ACS risk score tools, i.e., GRACE and TIMI, our learned models obtain better performances with a significant margin. CONCLUSIONS: Experimental results reveal that our approach can obtain competitive performance in MACE prediction. The comparison of classifiers indicates the proposed approach has a competitive generality with datasets extracted by different feature extraction methods. Furthermore, our MACE prediction model obtained a significant improvement by comparison with both GRACE and TIMI. It indicates that using admission records can effectively provide MACE prediction service for ACS patients at the early stage of their hospitalizations.
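The AUC used here to compare prediction models can be computed directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below is a generic illustration with hypothetical names and toy data, not the evaluation code of the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = event occurred).
    scores: predicted risk scores, higher meaning higher risk.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four patients with hypothetical risk scores.
toy_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

The pairwise form is quadratic in the number of cases, which is acceptable for a few thousand samples; a sort-based rank formula would be the usual choice for larger datasets.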


Subjects
Acute Coronary Syndrome/epidemiology , Acute Coronary Syndrome/physiopathology , Algorithms , Health Status Indicators , Patient Admission/statistics & numerical data , Aged , Female , Humans , Male , Middle Aged , Prognosis , Risk Assessment
18.
Mol Endocrinol ; 30(2): 254-71, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26745669

ABSTRACT

Male vertebrate social displays vary from physically simple to complex, with the latter involving exquisite motor command of the body and appendages. Studies of these displays have, in turn, provided substantial insight into neuromotor mechanisms. The neotropical golden-collared manakin (Manacus vitellinus) has been used previously as a model to investigate intricate motor skills because adult males of this species perform an acrobatic and androgen-dependent courtship display. To support this behavior, these birds express elevated levels of androgen receptors (AR) in their skeletal muscles. Here we use RNA sequencing to explore how testosterone (T) modulates the muscular transcriptome to support male manakin courtship displays. In addition, we explore how androgens influence gene expression in the muscles of the zebra finch (Taeniopygia guttata), a model passerine bird with a limited courtship display and minimal muscle AR. We identify androgen-dependent, muscle-specific gene regulation in both species. In addition, we identify manakin-specific effects that are linked to muscle use during the manakin display, including androgenic regulation of genes associated with muscle fiber contractility, cellular homeostasis, and energetic efficiency. Overall, our results point to numerous genes and gene networks impacted by androgens in male birds, including some that underlie optimal muscle function necessary for performing acrobatic display routines. Manakins are excellent models to explore gene regulation promoting athletic ability.


Subjects
Androgens/pharmacology , Athletes , Biomedical Research , Birds/genetics , Muscle, Skeletal/metabolism , Transcriptome/drug effects , Animals , Courtship , Gene Expression Profiling , Gene Expression Regulation/drug effects , Gene Ontology , Gene Regulatory Networks/drug effects , Humans , Male , Molecular Sequence Annotation , Muscle, Skeletal/drug effects , Principal Component Analysis , RNA, Messenger/genetics , RNA, Messenger/metabolism , Receptors, Androgen/genetics , Receptors, Androgen/metabolism , Sequence Analysis, RNA , Transcriptome/genetics
19.
Article in English | MEDLINE | ID: mdl-26357085

ABSTRACT

Understanding binding cores is of fundamental importance in deciphering Protein-DNA (TF-TFBS) binding and for the deep understanding of gene regulation. Traditionally, binding cores are identified in resolved high-resolution 3D structures. However, it is expensive, labor-intensive and time-consuming to obtain these structures. Hence, it is promising to discover binding cores computationally on a large scale. Previous studies successfully applied association rule mining to discover binding cores from TF-TFBS binding sequence data only. Despite the successful results, there are limitations such as the use of tight support and confidence thresholds, the distortion by statistical bias in counting pattern occurrences, and the lack of a unified scheme to rank TF-TFBS associated patterns. In this study, we proposed an association rule mining algorithm incorporating statistical measures and ranking to address these limitations. Experimental results demonstrated that, even when the threshold on support was lowered to one-tenth of the value used in previous studies, a satisfactory verification ratio was consistently observed under different confidence levels. Moreover, we proposed a novel ranking scheme for TF-TFBS associated patterns based on p-values and co-support values. By comparing with other discovery approaches, the effectiveness of our algorithm was demonstrated. Eighty-four binding cores with PDB support are uniquely identified.
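The support and confidence thresholds mentioned in this abstract are the standard measures of association rule mining. The sketch below shows how they are computed for a single rule; the record layout (one set of TF-side and TFBS-side pattern labels per binding record) and the labels themselves are hypothetical and only illustrate the definitions, not the authors' algorithm.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    if not records:
        raise ValueError("records must be non-empty")
    wanted = frozenset(items)
    return sum(1 for record in records if wanted <= record) / len(records)


def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent_support = support(records, antecedent)
    if antecedent_support == 0:
        # Assumption: a rule whose antecedent never occurs gets confidence 0.0.
        return 0.0
    joint = frozenset(antecedent) | frozenset(consequent)
    return support(records, joint) / antecedent_support


# Toy records: each set holds hypothetical pattern labels for one binding record.
toy_records = [
    frozenset({"tfbs:TGACG", "tf:RKR"}),
    frozenset({"tfbs:TGACG", "tf:RKR"}),
    frozenset({"tfbs:TGACG"}),
    frozenset({"tfbs:CACGTG"}),
]
rule_support = support(toy_records, {"tfbs:TGACG", "tf:RKR"})
rule_confidence = confidence(toy_records, {"tfbs:TGACG"}, {"tf:RKR"})
```

Lowering the support threshold admits rarer rules at the cost of more candidates to check, which is why the abstract pairs the lower threshold with statistical measures and ranking.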


Subjects
Binding Sites , Computational Biology/methods , DNA-Binding Proteins/chemistry , DNA/chemistry , Models, Statistical , Algorithms , DNA/metabolism , DNA-Binding Proteins/metabolism , Data Mining , Protein Binding
20.
Article in English | MEDLINE | ID: mdl-24091402

ABSTRACT

Understanding protein-DNA interactions, specifically transcription factor (TF) and transcription factor binding site (TFBS) bindings, is crucial in deciphering gene regulation. The recent associated TF-TFBS pattern discovery combines one-sided motif discovery on both the TF and the TFBS sides. Using sequences only, it identifies the short protein-DNA binding cores available only in high-resolution 3D structures. The discovered patterns lead to promising subtype and disease analysis applications. While the related studies use either association rule mining or existing TFBS annotations, none has proposed any formal unified (both-sided) model to prioritize the top verifiable associated patterns. We propose the unified scores and develop an effective pipeline for associated TF-TFBS pattern discovery. Our stringent instance-level evaluations show that the patterns with the top unified scores match with the binding cores in 3D structures considerably better than the previous works, where up to 90 percent of the top 20 scored patterns are verified. We also introduce extended verification from literature surveys, where the high unified scores correspond to even higher verification percentage. The top scored patterns are confirmed to match the known WRKY binding cores with no available 3D structures and agree well with the top binding affinities of in vivo experiments.


Subjects
Binding Sites , Computational Biology/methods , DNA/chemistry , Transcription Factors/chemistry , Algorithms , DNA/metabolism , Databases, Protein , Models, Molecular , Protein Binding , Transcription Factors/metabolism