Results 1-9 of 9
1.
Int J Comput Biol Drug Des ; 4(4): 307-15, 2011.
Article in English | MEDLINE | ID: mdl-22199032

ABSTRACT

Analysis of gene expression microarray datasets carries a high risk of over-fitting (finding spurious patterns) because the data are feature-rich but case-poor. This paper describes our ongoing efforts to develop a method that combats over-fitting and determines the strongest signal in the dataset. A GA-SVM hybrid combined with Gaussian noise (manual noise gain) is used to discover minimal-size feature sets that accurately classify the cases under cross-validation. Initial results on a colorectal cancer dataset show that the strongest signal (a modest number of candidates) can be found by a binary search.
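The abstract does not spell out how the binary search is set up. A minimal sketch, assuming that cross-validated accuracy is monotonically non-decreasing in feature-set size (an idealization), is a standard binary search for the smallest size `k` that reaches a target accuracy. The callback `cv_accuracy` is hypothetical; in the described pipeline it would stand in for a GA-SVM run at a given feature-set size.

```python
from typing import Callable, Optional

def smallest_sufficient_size(
    cv_accuracy: Callable[[int], float],  # hypothetical: feature-set size k -> CV accuracy
    target: float,
    lo: int,
    hi: int,
) -> Optional[int]:
    """Smallest k in [lo, hi] with cv_accuracy(k) >= target.

    Assumes accuracy never decreases as k grows; returns None if even
    hi features fall short of the target.
    """
    if cv_accuracy(hi) < target:
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if cv_accuracy(mid) >= target:
            hi = mid      # mid is sufficient; a smaller size might be too
        else:
            lo = mid + 1  # mid falls short; the answer is larger
    return lo
```

Each probe costs one full cross-validated GA-SVM run, which is why a logarithmic number of probes is attractive compared with scanning every size.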


Subjects
Colorectal Neoplasms/genetics, Gene Expression Regulation, Neoplastic, Oligonucleotide Array Sequence Analysis, Support Vector Machine, Algorithms, Gene Expression Profiling/methods, Humans, Normal Distribution
2.
Biomed Eng Online ; 10: 97, 2011 Nov 08.
Article in English | MEDLINE | ID: mdl-22067671

ABSTRACT

BACKGROUND: Statistical learning (SL) techniques can address non-linear relationships and small datasets but do not provide an output that has an epidemiologic interpretation. METHODS: A small set of clinical variables (CVs) for stage-1 non-small cell lung cancer patients was used to evaluate an approach for using SL methods as a preprocessing step for survival analysis. A stochastic method of training a probabilistic neural network (PNN) was used with differential evolution (DE) optimization. Survival scores were derived stochastically by combining CVs with the PNN. Patients (n = 151) were dichotomized into favorable (n = 92) and unfavorable (n = 59) survival outcome groups. These PNN-derived scores were used with logistic regression (LR) modeling to predict favorable survival outcome and were integrated into the survival analysis (i.e., Kaplan-Meier analysis and Cox regression). The hybrid modeling was compared with the respective modeling using raw CVs. The area under the receiver operating characteristic curve (Az) was used to compare model predictive capability. Odds ratios (ORs) and hazard ratios (HRs) were used to compare disease associations with 95% confidence intervals (CIs). RESULTS: The LR model with the best predictive capability gave Az = 0.703. While controlling for gender and tumor grade, the OR = 0.63 (CI: 0.43, 0.91) per standard deviation (SD) increase in age indicates that increasing age is associated with unfavorable outcome. The hybrid LR model gave Az = 0.778 by combining age and tumor grade with the PNN and controlling for gender. The PNN score and age relate to risk in opposite directions. The OR = 0.27 (CI: 0.14, 0.53) per SD increase in PNN score indicates that patients with lower scores have unfavorable outcomes.
The tumor-grade-adjusted hazard for patients above the median age compared with those below the median was HR = 1.78 (CI: 1.06, 3.02), whereas the hazard for patients below the median PNN score compared with those above the median was HR = 4.0 (CI: 2.13, 7.14). CONCLUSION: We have provided preliminary evidence that SL preprocessing may provide benefits in comparison with accepted approaches. The work will require further evaluation with varying datasets to confirm these findings.
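The results are reported as ratios per SD increase and as ratios with the reference group reversed (below versus above the median). A minimal sketch of both conversions, assuming a Wald-type 95% CI on the log scale (the abstract does not state how the CIs were computed): a coefficient `beta` with standard error `se` for a covariate in its raw units is rescaled by the sample SD, and swapping the comparison and reference groups inverts the ratio and swaps the CI bounds. The function names are illustrative, and the same arithmetic applies to logistic (OR) and Cox (HR) coefficients.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile

def ratio_per_sd(beta: float, se: float, sd: float) -> Tuple[float, float, float]:
    """Ratio exp(beta*sd) and Wald 95% CI for a one-SD increase in a covariate."""
    b, s = beta * sd, se * sd  # rescale coefficient and SE to SD units
    return math.exp(b), math.exp(b - Z_95 * s), math.exp(b + Z_95 * s)

def flip_reference(ratio: float, lo: float, hi: float) -> Tuple[float, float, float]:
    """Same comparison with the reference group swapped: invert and swap bounds."""
    return 1.0 / ratio, 1.0 / hi, 1.0 / lo
```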


Subjects
Lung Neoplasms, Statistics as Topic/methods, Aged, Carcinoma, Non-Small-Cell Lung/diagnosis, Female, Humans, Kaplan-Meier Estimate, Logistic Models, Lung Neoplasms/diagnosis, Male, Middle Aged, Neural Networks, Computer, Nonlinear Dynamics, Prognosis, Stochastic Processes
3.
BMC Bioinformatics ; 12: 37, 2011 Jan 27.
Article in English | MEDLINE | ID: mdl-21272346

ABSTRACT

BACKGROUND: When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL) techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant to epidemiologic interpretation. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with those of logistic regression (LR) modeling, representing a parametric approach. The SL technique consisted of a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. RESULTS: The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. CONCLUSIONS: The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.
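The SL technique is described only as a kernel mapping combined with a perceptron. A minimal sketch of that combination, assuming a Gaussian (RBF) kernel and the standard dual-form kernel perceptron, shows why the kernel matters: it separates the XOR pattern, which no linear boundary can. The kernel choice and `gamma` are assumptions, and the authors' modification for generating odds ratios is not shown.

```python
import math
from typing import List, Sequence

Point = Sequence[float]

def rbf(x: Point, z: Point, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_kernel_perceptron(X: List[Point], y: List[int], gamma: float = 1.0,
                            epochs: int = 100) -> List[int]:
    """Dual-form perceptron: alpha[i] counts the mistakes made on example i."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            if y[i] * score <= 0:  # misclassified or on the boundary
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # training data separated in the kernel feature space
            break
    return alpha

def predict(X: List[Point], y: List[int], alpha: List[int], x: Point,
            gamma: float = 1.0) -> int:
    score = sum(a * yi * rbf(xi, x, gamma) for a, yi, xi in zip(alpha, y, X))
    return 1 if score > 0 else -1
```

For example, with `X = [(0, 0), (1, 1), (0, 1), (1, 0)]` and labels `[-1, -1, 1, 1]`, training converges in three passes and every training point is classified correctly.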


Subjects
Artificial Intelligence, Case-Control Studies, Logistic Models, Computer Simulation, Models, Biological, Neural Networks, Computer, Odds Ratio
4.
BMC Syst Biol ; 5 Suppl 3: S13, 2011.
Article in English | MEDLINE | ID: mdl-22784619

ABSTRACT

BACKGROUND: The primary objectives of this paper are (1) to apply Statistical Learning Theory (SLT), specifically Partial Least Squares (PLS) and Kernelized PLS (K-PLS), to the universal "feature-rich/case-poor" (also known as "large p, small n" or "high-dimension, low-sample size") microarray problem by eliminating those features (or probes) that do not contribute to the "best" chromosome bio-markers for lung cancer, and (2) to quantitatively measure and verify (by an independent means) the efficacy of this PLS process. A secondary objective is to integrate these significant improvements in diagnostic and prognostic biomedical applications into the clinical research arena, that is, to devise a framework for converting SLT results into direct, useful clinical information for patient care or pharmaceutical research. We therefore propose, and preliminarily evaluate, a process whereby PLS, K-PLS, and Support Vector Machines (SVM) may be integrated with the accepted and well-understood traditional biostatistical "gold standard" Cox Proportional Hazard model and Kaplan-Meier survival analysis methods. Specifically, this combination is illustrated first with PLS and Kaplan-Meier analysis and then with PLS and Cox Hazard Ratios (CHR), and it can easily be extended to the K-PLS and SVM paradigms. Finally, these processes are contained in the Fine Feature Selection (FFS) component of our overall feature reduction/evaluation process, which consists of the following components: (1) coarse feature reduction, (2) fine feature selection, and (3) classification (as described in this paper) and prediction. RESULTS: Our results for PLS and K-PLS showed that these techniques, as part of our overall feature reduction process, performed well on noisy microarray data.
The best performance was a good Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of 0.794 for classifying recurrence before versus after 36 months and a strong AUC of 0.869 for classifying recurrence before versus after 60 months. Kaplan-Meier curves for the classification groups were clearly separated, with p-values below 4.5e-12 for both 36 and 60 months. CHRs were also good, with ratios of 2.846341 (36 months) and 3.996732 (60 months). CONCLUSIONS: SLT techniques such as PLS and K-PLS can effectively address difficult problems in analyzing biomedical data such as microarrays. The combinations with established biostatistical techniques demonstrated in this paper allow these methods to move from academic research into clinical practice.
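PLS is the core of the fine feature selection step, but the abstract gives no implementation details. A minimal sketch of standard PLS1 regression (a single response) fitted with the NIPALS algorithm, assuming NumPy, shows the mechanics: each component takes the direction of maximum covariance between the centred features and the response, then deflates both. With as many components as features it reduces to ordinary least squares; with fewer it acts as a regularized fit for "large p, small n" data. The kernelized variant (K-PLS) would operate on a centred kernel matrix instead and is not shown.

```python
import numpy as np

def pls1_fit(X: np.ndarray, y: np.ndarray, n_components: int):
    """PLS1 via NIPALS; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean       # centred residuals of X and y
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weights: direction of max covariance with y
        t = E @ w                       # latent scores
        tt = t @ t
        p = E.T @ t / tt                # X loadings
        c = (f @ t) / tt                # y loading
        E = E - np.outer(t, p)          # deflate X
        f = f - c * t                   # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # coefficients in the original X space
    return coef, x_mean, y_mean

def pls1_predict(X: np.ndarray, coef: np.ndarray, x_mean: np.ndarray,
                 y_mean: float) -> np.ndarray:
    return (X - x_mean) @ coef + y_mean
```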


Subjects
Computational Biology/methods, Oligonucleotide Array Sequence Analysis, Humans, Kaplan-Meier Estimate, Least-Squares Analysis, Lung Neoplasms/genetics, Proportional Hazards Models, Risk Assessment, Support Vector Machine
5.
BMC Genomics ; 11 Suppl 3: S15, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-21143782

ABSTRACT

BACKGROUND: Significant interest exists in establishing radiologic imaging as a valid biomarker for assessing the response of cancer to a variety of treatments. To address this problem, we have chosen to study patients with metastatic colorectal carcinoma to learn whether statistical learning theory (SLT) can improve the performance of radiologists using CT in predicting patient response to therapy compared with the more traditional RECIST (Response Evaluation Criteria in Solid Tumors) standard. RESULTS: Predictions of survival after 8 months in 38 patients with metastatic colorectal carcinoma using the Support Vector Machine (SVM) technique improved by 30% when additional information was used, compared with WHO (World Health Organization) or RECIST measurements alone. With both Logistic Regression (LR) and SVM, there was no significant difference in performance between WHO and RECIST. The SVM and LR techniques also demonstrated that one radiologist consistently outperformed another. CONCLUSIONS: This preliminary research study has demonstrated that SLT algorithms, properly used in a clinical setting, have the potential to address questions and criticisms associated with both RECIST and WHO scoring methods. We also propose that tumor heterogeneity, shape, etc., obtained from CT and/or MRI scans be added to the SLT feature vector for processing.
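The subject terms for this entry include Area Under Curve and ROC Curve. A minimal sketch of computing the AUC directly from classifier scores uses the Mann-Whitney formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half. This is a generic illustration, not the study's software; the quadratic pairwise loop is adequate for cohorts of a few dozen patients.

```python
from typing import Sequence

def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))
```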


Subjects
Carcinoma/diagnostic imaging, Carcinoma/secondary, Colorectal Neoplasms/diagnostic imaging, Tomography, X-Ray Computed, Area Under Curve, Biomarkers, Tumor, Carcinoma/drug therapy, Carcinoma/mortality, Colorectal Neoplasms/drug therapy, Colorectal Neoplasms/mortality, Colorectal Neoplasms/pathology, Humans, Logistic Models, Odds Ratio, ROC Curve, Software, Survival Analysis
6.
Int J Comput Biol Drug Des ; 3(1): 15-8, 2010.
Article in English | MEDLINE | ID: mdl-20693607

ABSTRACT

The aim is to establish radiologic imaging as a valid biomarker for assessing the response of cancer to different treatments. We study patients with metastatic colorectal carcinoma to learn whether Statistical Learning Theory (SLT) improves the performance of radiologists using Computed Tomography (CT) in predicting patient response to therapy, compared with the traditional Response Evaluation Criteria in Solid Tumours (RECIST) standard. Preliminary research demonstrated that SLT algorithms can address questions and criticisms associated with both the RECIST and World Health Organization (WHO) scoring methods. We add tumour heterogeneity, shape, etc., obtained from CT or MRI scans to the feature vector for processing.


Subjects
Colorectal Neoplasms/diagnostic imaging, Models, Statistical, Tomography, X-Ray Computed/methods, Algorithms, Colorectal Neoplasms/pathology, Colorectal Neoplasms/therapy, Humans, Magnetic Resonance Imaging/methods, Neoplasm Metastasis, Predictive Value of Tests, Treatment Outcome
7.
Int J Comput Biol Drug Des ; 2(1): 21-57, 2009.
Article in English | MEDLINE | ID: mdl-20054985

ABSTRACT

Parenchymal patterns defining the density of breast tissue are detected by advanced correlation pattern recognition in an integrated Computer-Aided Detection (CAD) and diagnosis system. Fractal signatures of density are modelled according to four clinical categories. A Support Vector Machine (SVM) in the primal formulation solves the multiclass problem using 'One-Versus-All' (OVA) and 'All-Versus-All' (AVA) decompositions, achieving 85% and 94% accuracy, respectively. Fully automated classification of breast density via a texture model derived from fractal dimension, dispersion, and lacunarity moves current qualitative methods forward to objective quantitative measures, in line with the overarching vision of substantiating the role of density in epidemiological risk models of breast cancer.
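The texture model uses fractal dimension, but the abstract does not say how it is estimated. A minimal sketch, assuming the common box-counting estimator applied to an already binarized square image (the binarization step, dispersion, and lacunarity are not shown): for each box size `s`, count the `s`-by-`s` boxes that contain any foreground pixel, then take the slope of log N(s) against log(1/s). The box sizes used here are arbitrary assumptions.

```python
import numpy as np

def box_counting_dimension(img: np.ndarray, sizes=(1, 2, 4, 8, 16)) -> float:
    """Box-counting dimension estimate for a square binary image."""
    n = img.shape[0]
    counts = []
    for s in sizes:
        m = n // s
        # Split the (cropped) image into an m x m grid of s x s boxes.
        blocks = img[: m * s, : m * s].reshape(m, s, m, s)
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))
    # Slope of log N(s) versus log(1/s) is the dimension estimate.
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return float(slope)
```

As a sanity check, a completely filled image gives a dimension of 2 and a single straight line of pixels gives 1; textured tissue patterns fall in between.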


Subjects
Artificial Intelligence, Breast Neoplasms/diagnosis, Breast Neoplasms/pathology, Breast/pathology, Diagnosis, Computer-Assisted/methods, Algorithms, Computational Biology, Female, Fractals, Fuzzy Logic, Humans, Mammography/statistics & numerical data, Models, Statistical
8.
Int J Funct Inform Personal Med ; 1(2): 111-139, 2008 Jan.
Article in English | MEDLINE | ID: mdl-26430470

ABSTRACT

The automated decision paradigms presented in this work address the occurrence of false positive (FP) biopsies in diagnostic mammography. An EP/ES stochastic hybrid and two kernelized Partial Least Squares (K-PLS) paradigms were investigated with the following studies: (1) methodology performance comparisons and (2) automated diagnostic accuracy assessments with two data sets. The findings showed that (1) the new hybrid produced comparable results more rapidly and (2) the new K-PLS paradigms train and operate essentially in real time for the data sets studied. Both advancements are essential components for eventually achieving the FP reduction goal while maintaining acceptable diagnostic sensitivities.
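The abstract names an EP/ES (presumably evolutionary programming / evolution strategies) stochastic hybrid without describing its operators. For orientation only, here is a minimal sketch of the simplest evolution strategy, a (1+1)-ES with a one-step form of Rechenberg's 1/5th success rule. It is a generic textbook ES, not the authors' hybrid; the objective `f` is any function to minimize, and the step-size factors are assumptions.

```python
import random
from typing import Callable, List, Tuple

def one_plus_one_es(f: Callable[[List[float]], float], x0: List[float],
                    sigma: float = 1.0, iters: int = 2000,
                    seed: int = 0) -> Tuple[List[float], float]:
    """Minimize f with a (1+1)-evolution strategy and a 1/5th success rule."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        child = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]  # Gaussian mutation
        fc = f(child)
        if fc <= fx:               # elitist selection: keep the better of parent and child
            x, fx = child, fc
            sigma *= 1.5           # success: widen the search
        else:
            sigma *= 1.5 ** -0.25  # failure: narrow it; the two factors balance at a 1/5 success rate
    return x, fx
```

The step size adapts itself: when too many mutations succeed the search radius grows, and when too few succeed it shrinks, which keeps the search efficient without manual tuning.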

9.
J Chem Inf Comput Sci ; 44(2): 499-507, 2004.
Article in English | MEDLINE | ID: mdl-15032529

ABSTRACT

The need for rapid and accurate detection systems is expanding, and the use of cross-reactive sensor arrays to detect chemical warfare agents, in conjunction with novel computational techniques, may prove to be a solution to this challenge. We have investigated the detection, prediction, and classification of various organophosphate (OP) nerve agent simulants using sensor arrays with a novel learning scheme known as support vector machines (SVMs). The OPs tested include parathion, malathion, dichlorvos, trichlorfon, paraoxon, and diazinon. A new data reduction software program was written in MATLAB V. 6.1 to extract steady-state and kinetic data from the sensor arrays. The program also creates training sets by mixing and randomly sorting any combination of data categories into both positive and negative cases. The resulting signals were fed into SVM software for "pairwise" and "one-vs-all" classification. Experimental results for this new paradigm show a significant increase in classification accuracy when compared with artificial neural networks (ANNs). Three kernels, the S2000, the polynomial, and the Gaussian radial basis function (RBF), were tested and compared with the ANN. The following measures of performance were considered in the pairwise classification: receiver operating characteristic (ROC) Az indices, specificities, and positive predictive values (PPVs). The increases in ROC Az values, specificities, and PPVs ranged from 5% to 25%, 108% to 204%, and 13% to 54%, respectively, in all OP pairs studied when compared with the ANN baseline. Dichlorvos, trichlorfon, and paraoxon were perfectly predicted. Positive prediction for malathion was 95%.
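The pairwise comparisons are reported as specificity and PPV gains over an ANN baseline. A minimal sketch of these measures for one binary (pairwise) task computes them from confusion-matrix counts; the percentage gains are treated as relative changes, since a specificity cannot rise by more than 100 percentage points. The function names are illustrative.

```python
from typing import Dict, Sequence

def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and PPV for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")  # undefined when the denominator is zero
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }

def relative_increase(new: float, baseline: float) -> float:
    """Percentage change of a metric relative to a baseline model (e.g., an ANN)."""
    return 100.0 * (new - baseline) / baseline
```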

...