Results 1 - 9 of 9
1.
Group Decis Negot ; 31(4): 789-818, 2022.
Article in English | MEDLINE | ID: mdl-35615756

ABSTRACT

Crowdsourcing and crowd voting systems are increasingly used for societal, industrial, and academic problems (labeling, recommendations, social choice, etc.) because they can exploit the "wisdom of the crowd" to obtain good-quality solutions and/or voter satisfaction with high cost-efficiency. However, decisions based on crowd vote aggregation do not guarantee high-quality results, owing to the quality of crowd voter data. Moreover, such decisions often fail to satisfy the majority of voters because of data heterogeneity (multimodal or uniform vote distributions) and/or outliers, which cause traditional aggregation procedures (e.g., central tendency measures) to propose decisions with low voter satisfaction. In this research, we propose a system for integrating crowd and expert knowledge in a crowdsourcing setting with limited resources. The system addresses the problem of sparse voting data by using machine learning models (matrix factorization and regression) to estimate crowd and expert votes/grades. The problem of vote aggregation under multimodal or uniform vote distributions is addressed by including expert votes and aggregating crowd and expert votes with optimization and bargaining models (Kalai-Smorodinsky and Nash) commonly used in game theory. Experimental evaluation on real-world and artificial problems showed that bargaining-based aggregation outperforms traditional methods in terms of the cumulative satisfaction of experts and the crowd. Additionally, the machine learning models showed satisfactory predictive performance and enabled cost reduction in the vote collection process.
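The paper itself includes no code; as a rough sketch of the bargaining-based aggregation idea, the snippet below picks the grade maximizing the Nash product between the crowd and the experts. The utility function (negative mean absolute distance to the group's votes) and the disagreement point (each group's worst utility over the candidate set) are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def group_utility(votes, candidate):
    # Illustrative utility: higher when the candidate grade is close
    # to the group's votes (negative mean absolute deviation).
    return -np.mean(np.abs(np.asarray(votes) - candidate))

def nash_bargaining_grade(crowd_votes, expert_votes, candidates):
    """Pick the candidate grade maximizing the Nash product
    (u_crowd - d_crowd) * (u_expert - d_expert)."""
    # Disagreement point: the worst utility each group could face
    # over the candidate set (an assumption for illustration).
    d_crowd = min(group_utility(crowd_votes, c) for c in candidates)
    d_expert = min(group_utility(expert_votes, c) for c in candidates)
    best, best_product = None, -np.inf
    for c in candidates:
        gain_crowd = group_utility(crowd_votes, c) - d_crowd
        gain_expert = group_utility(expert_votes, c) - d_expert
        if gain_crowd * gain_expert > best_product:
            best, best_product = c, gain_crowd * gain_expert
    return best

# Bimodal crowd votes: a plain mean would land between the modes and
# satisfy almost nobody; bargaining trades off both groups instead.
crowd = [1, 1, 2, 9, 9, 10, 10]
experts = [7, 8, 8]
print(nash_bargaining_grade(crowd, experts, candidates=range(1, 11)))
```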

2.
J Chromatogr A ; 1623: 461146, 2020 Jul 19.
Article in English | MEDLINE | ID: mdl-32505269

ABSTRACT

In micellar liquid chromatography (MLC), adding a surfactant to the mobile phase in excess alters its solubilising capacity and changes the stationary phase's properties. As a consequence, predicting analyte retention in MLC mode becomes a challenging task. Mixed Quantitative Structure-Retention Relationship (QSRR) modelling is a powerful tool for estimating analyte retention. This study compares 48 successfully developed mixed QSRR models with respect to their ability to predict the retention of aripiprazole and its five impurities from molecular structures and from factors that describe the Brij-acetonitrile system. Model development was based on automatically combining six attribute (feature) selection methods with eight predictive algorithms and optimizing hyper-parameters. The feature selection methods included Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), ReliefF, Multiple Linear Regression (MLR), Mutual Info and F-Regression. The investigated predictive algorithms comprised Linear Regression (LR), Ridge Regression, Lasso Regression, Artificial Neural Networks (ANN), Support Vector Regression (SVR), Random Forest (RF), Gradient Boosted Trees (GBT) and k-Nearest Neighbours (k-NN). A sufficient amount of data for model building (78 cases in total) was obtained by conducting 13 experiments for each of the 6 analytes and collecting the target responses. Different experimental settings were established by varying the concentration of Brij L23, the pH of the aqueous phase and the acetonitrile content in the mobile phase according to a Box-Behnken design. In addition to the chromatographic parameters, the pool of independent variables was expanded with 27 molecular descriptors from all major groups (physicochemical, quantum chemical, topological and spatial structural descriptors). The best model was chosen by considering the Root Mean Square Error (RMSE) and the cross-validation (CV) correlation coefficient (Q2). Interestingly, the comparative analysis indicated that changing the set of input variables had a minor impact on the performance of the final models. On the other hand, different regression algorithms showed great diversity in their ability to learn the patterns in the data, so testing many regression algorithms is necessary to find the most suitable technique for model building. In this specific case, GBT-based models demonstrated the best ability to predict the retention factor in MLC mode. Steric factors and dipole-dipole interactions proved relevant to the observed retention behaviour. This study, although of smaller scale, is a promising starting point for comprehensive MLC retention prediction.
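As a hedged sketch of the model-building scheme described above (feature selection methods crossed with regression algorithms, tuned by cross-validation), a minimal scikit-learn pipeline might look as follows. The random placeholder data, the grid values and the scoring choice are assumptions for illustration, not the study's actual configuration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Placeholder data: 78 cases x 30 features (3 chromatographic factors
# plus 27 molecular descriptors), retention factor as the target.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(78, 30)), rng.normal(size=78)

pipe = Pipeline([("select", PCA()), ("model", GradientBoostingRegressor())])
# Swap the feature-selection step across candidates, as in the paper's
# automatic combination of selection methods and learners.
grid = [
    {"select": [PCA(n_components=10)]},
    {"select": [SelectKBest(mutual_info_regression, k=10)]},
]
search = GridSearchCV(pipe, grid, cv=KFold(5, shuffle=True, random_state=0),
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)
print(search.best_params_, -search.best_score_)  # RMSE of the best combo
```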


Subjects
Algorithms, Liquid Chromatography/methods, Micelles, Quantitative Structure-Activity Relationship, Antipsychotic Agents/chemistry, Automation, Databases as Topic, Linear Models, Reproducibility of Results, Solvents/chemistry
3.
Sci Rep ; 8(1): 10563, 2018 Jul 12.
Article in English | MEDLINE | ID: mdl-30002402

ABSTRACT

Intrinsically disordered proteins (IDPs) are characterized by the lack of a fixed tertiary structure and are involved in the regulation of key biological processes via binding to multiple protein partners. IDPs are malleable, adapting to structurally different partners, and this flexibility stems from features encoded in the primary structure. The assumption that universal sequence information would facilitate coverage of the sparse zones of the human interactome motivated us to explore the possibility of predicting protein-protein interactions (PPIs) that involve IDPs based on sequence characteristics. We developed a method that relies on features of the interacting and non-interacting protein pairs and utilizes machine learning to classify and predict IDP PPIs. Considering both sequence determinants specific to conformational organization and the multiplicity of IDP interactions in the training phase ensured a reliable approach that is superior to current state-of-the-art methods. By applying a strict evaluation procedure, we confirm that our method predicts interactions of the IDP of interest even at the proteome scale. This service is provided as a web tool to expedite the discovery of new interactions and IDP functions with enhanced efficiency.
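A rough sketch of the pipeline described (sequence-derived features for protein pairs fed to a supervised classifier) is given below; the amino-acid composition features and the random forest learner are illustrative stand-ins for the authors' actual feature set and model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    # Fraction of each amino acid in the sequence (20 features).
    return np.array([seq.count(a) / len(seq) for a in AMINO_ACIDS])

def pair_features(seq_a, seq_b):
    # Concatenate per-protein features to describe the pair.
    return np.concatenate([composition(seq_a), composition(seq_b)])

def train_ppi_classifier(pairs, labels):
    # pairs: list of (sequence_a, sequence_b); labels: 1 = interacting.
    X = np.vstack([pair_features(a, b) for a, b in pairs])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return clf

# Toy usage with made-up sequences and labels.
clf = train_ppi_classifier(
    [("MKVLAA", "GDSEER"), ("PPGFSP", "MKVLAA")], labels=[1, 0])
```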


Subjects
Intrinsically Disordered Proteins/metabolism, Protein Interaction Mapping/methods, Proteome/metabolism, Amino Acid Sequence/physiology, Computational Biology, Datasets as Topic, Humans, MCF-7 Cells, Machine Learning, Models, Molecular, Molecular Sequence Annotation, Protein Binding/physiology, Protein Interaction Maps/physiology
5.
Artif Intell Med ; 72: 12-21, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27664505

ABSTRACT

OBJECTIVES: Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, the high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification challenge the development of accurate predictive models. Predictive models require a certain level of interpretability to be applicable in real settings and to create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population by integrating a data-driven model (sparse logistic regression) with domain knowledge based on the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. MATERIALS AND METHODS: The analysis was conducted on >66,000 pediatric hospital discharge records from the California State Inpatient Databases, Healthcare Cost and Utilization Project, between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy into a data-driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression, yielding models that are easier to interpret because they rely on fewer high-level diagnoses, with comparable prediction accuracy. RESULTS: The results revealed that the Tree-Lasso model was as competitive in terms of accuracy (measured by the area under the receiver operating characteristic curve, AUC) as traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided models that are more interpretable in terms of high-level diagnoses. Additionally, the model interpretations are in accordance with existing medical understanding of pediatric readmission. The best-performing models reach similar AUC values of 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectively. However, the information loss of the Lasso models is 0.35 bits higher than that of the Tree-Lasso model. CONCLUSIONS: We propose a method for building predictive models applicable to the detection of readmission risk based on electronic health records. Integrating domain knowledge (in the form of the ICD-9-CM taxonomy) with a data-driven, sparse predictive algorithm (Tree-Lasso logistic regression) increased the interpretability of the resulting model. The models are interpreted for the readmission prediction problem in the general pediatric population in California, as well as for several important subpopulations, and the interpretations comply with existing medical understanding of pediatric readmission. Finally, a quantitative assessment of model interpretability is given that goes beyond simple counts of selected low-level features.
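Scikit-learn has no built-in Tree-Lasso, so the sketch below shows only the paper's baseline, L1-penalized (Lasso) logistic regression, on a hypothetical ICD-coded design matrix; the Tree-Lasso variant would additionally group the penalties along the ICD-9-CM tree.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder design matrix: rows = discharges, columns = binary
# indicators of ICD-9-CM diagnosis codes; y = readmission flag.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 300)).astype(float)
y = rng.integers(0, 2, size=1000)

# L1 penalty drives most coefficients to exactly zero, giving the
# sparse, feature-selecting behaviour the paper compares against.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
print("selected features:", int(np.sum(clf.coef_ != 0)))
```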


Subjects
Logistic Models, Patient Readmission, ROC Curve, Child, Electronic Health Records, Humans, Pediatrics/statistics & numerical data, Risk
6.
J Med Internet Res ; 18(7): e185, 2016 Jul 06.
Article in English | MEDLINE | ID: mdl-27383622

ABSTRACT

Despite the accelerating pace of scientific discovery, the current clinical research enterprise does not sufficiently address pressing clinical questions. Given the constraints on clinical trials, for a majority of clinical questions the only relevant data available to aid decision making are based on observation and experience. Our purpose here is threefold. First, we describe the classic context of medical research guided by Popper's scientific epistemology of "falsificationism." Second, we discuss challenges and shortcomings of randomized controlled trials and present the potential of observational studies based on big data. Third, we cover several obstacles related to the use of observational (retrospective) data in clinical studies. We conclude that randomized controlled trials are not at risk of extinction, but innovations in statistics, machine learning, and big data analytics may generate a completely new ecosystem for exploration and validation.


Subjects
Biomedical Research/methods, Biomedical Research/standards, Data Mining/methods, Observational Studies as Topic/methods, Randomized Controlled Trials as Topic/methods, Decision Making, Humans, Intelligence
7.
Medicine (Baltimore) ; 95(28): e4188, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27428217

ABSTRACT

Platelet function can be quantitatively assessed by specific assays such as light-transmission aggregometry, multiple-electrode aggregometry measuring the response to adenosine diphosphate (ADP), arachidonic acid, collagen, and thrombin-receptor activating peptide, and by viscoelastic tests such as rotational thromboelastometry (ROTEM). Extracting meaningful statistical and clinical information from the high-dimensional data spaces of temporal multivariate clinical data, represented as multivariate time series, is a complex task, and building insightful visualizations for multivariate time series demands adequate use of normalization techniques. In this article, various methods for data normalization (z-transformation, range transformation, proportion transformation, and interquartile range) are presented and visualized, and the most suitable approach for platelet function data series is discussed. Normalization was calculated per assay (test) across all time points and per time point across all tests. Interquartile range, range transformation, and z-transformation preserved the correlation calculated by the Spearman correlation test when normalizing per assay (test) across all time points. When normalizing per time point across all tests, no correlation could be read from the charts, as was also the case when all data were normalized as a single dataset.
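As a minimal sketch of the two normalization axes discussed (per assay across all time points versus per time point across all tests), assuming a matrix with one row per time point and one column per assay; the z- and range transformations below are the standard formulas, not the article's code.

```python
import numpy as np

def z_transform(x):
    # Standard score: zero mean, unit sample standard deviation.
    return (x - x.mean()) / x.std(ddof=1)

def range_transform(x):
    # Min-max scaling into [0, 1].
    return (x - x.min()) / (x.max() - x.min())

# data: rows = time points, columns = assays (tests); values invented.
data = np.array([[55.0, 40.0, 61.0],
                 [48.0, 35.0, 58.0],
                 [60.0, 43.0, 66.0]])

# Per assay (test) across all time points: normalize each column.
per_assay = np.apply_along_axis(z_transform, 0, data)
# Per time point across all tests: normalize each row.
per_timepoint = np.apply_along_axis(z_transform, 1, data)
print(per_assay, per_timepoint, sep="\n")
```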


Subjects
Data Interpretation, Statistical, Platelet Function Tests/methods, Coronary Artery Bypass, Humans, Longitudinal Studies, Predictive Value of Tests, Preoperative Care
8.
PLoS One ; 11(1): e0145791, 2016.
Article in English | MEDLINE | ID: mdl-26731286

ABSTRACT

With the accumulation of large amounts of health-related data, predictive analytics could stimulate the transformation of reactive medicine towards Predictive, Preventive and Personalized Medicine (PPPM), ultimately affecting both the cost and quality of care. However, the high dimensionality and high complexity of the data involved prevent data-driven methods from being easily translated into clinically relevant models. Additionally, applying cutting-edge predictive methods and data manipulation requires substantial programming skills, limiting their direct exploitation by medical domain experts. This leaves a gap between potential and actual data usage. In this study, we address this problem by focusing on open, visual environments suited for use by the medical community, and we review code-free applications of big data technologies. As a showcase, a framework was developed for the meaningful use of data from critical care patients by integrating the MIMIC-II database into a data mining environment (RapidMiner) supporting scalable predictive analytics with visual tools (RapidMiner's Radoop extension). Guided by the CRoss-Industry Standard Process for Data Mining (CRISP-DM), the ETL (Extract, Transform, Load) process was initiated by retrieving data from the MIMIC-II tables of interest. As a use case, the correlation of platelet count and ICU survival was quantitatively assessed. Using visual tools for ETL on Hadoop and predictive modeling in RapidMiner, we developed robust processes for the automatic building, parameter optimization and evaluation of various predictive models under different feature selection schemes. Because these processes can be easily adopted in other projects, this environment is attractive for scalable predictive analytics in health research.
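Outside the visual environment, the showcase use case (quantifying the association between platelet count and ICU survival) could be checked with a few lines of Python; the extract below and its column names are hypothetical, not the MIMIC-II schema.

```python
import pandas as pd
from scipy.stats import pointbiserialr

# Hypothetical extract from the MIMIC-II tables of interest:
# one row per ICU stay, with platelet count and a survival flag.
df = pd.DataFrame({
    "platelet_count": [210, 95, 310, 55, 180, 240, 70, 150],
    "survived":       [1,   0,  1,   0,  1,   1,   0,  1],
})

# Point-biserial correlation: continuous variable vs binary outcome.
r, p = pointbiserialr(df["survived"], df["platelet_count"])
print(f"r = {r:.2f}, p = {p:.3f}")
```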


Subjects
Critical Care/statistics & numerical data, Critical Illness/therapy, Data Mining/statistics & numerical data, Information Storage and Retrieval/statistics & numerical data, Algorithms, Data Mining/methods, Databases, Factual/statistics & numerical data, Humans, Information Storage and Retrieval/methods, Intensive Care Units/statistics & numerical data, Models, Theoretical, Programming Languages, Reproducibility of Results
9.
ScientificWorldJournal ; 2014: 859279, 2014.
Article in English | MEDLINE | ID: mdl-24892101

ABSTRACT

The rapid growth and storage of biomedical data have enabled many opportunities for predictive modeling and improvement of healthcare processes. On the other hand, analyzing such large amounts of data is a difficult and computationally intensive task for most existing data mining algorithms. We address this problem by proposing a cloud-based system that integrates a metalearning framework for ranking and selecting the best predictive algorithms for the data at hand with open-source big data technologies for the analysis of biomedical data.
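A minimal sketch of the ranking-and-selection idea, assuming the simplest possible meta-strategy of ranking candidate learners by cross-validated score; the actual system's metalearning features and cloud deployment are beyond this snippet.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

# Stand-in for "data at hand"; any labeled dataset would do.
X, y = make_classification(n_samples=500, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
}

# Rank candidate algorithms by mean cross-validated AUC.
scores = {name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
          for name, model in candidates.items()}
for name, auc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: AUC = {auc:.3f}")
```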


Subjects
Delivery of Health Care/organization & administration, Information Storage and Retrieval, Models, Theoretical, Learning