1.
PLoS Comput Biol ; 17(9): e1009439, 2021 09.
Article in English | MEDLINE | ID: mdl-34550974

ABSTRACT

Recent neuroscience studies demonstrate that a deeper understanding of brain function requires a deeper understanding of behavior. Detailed behavioral measurements are now often collected using video cameras, resulting in an increased need for computer vision algorithms that extract useful information from video data. Here we introduce a new video analysis tool that combines the output of supervised pose estimation algorithms (e.g. DeepLabCut) with unsupervised dimensionality reduction methods to produce interpretable, low-dimensional representations of behavioral videos that extract more information than pose estimates alone. We demonstrate this tool by extracting interpretable behavioral features from videos of three different head-fixed mouse preparations, as well as a freely moving mouse in an open field arena, and show how these interpretable features can facilitate downstream behavioral and neural analyses. We also show how the behavioral features produced by our model improve the precision and interpretation of these downstream analyses compared to using the outputs of either fully supervised or fully unsupervised methods alone.


Subjects
Algorithms , Artificial Intelligence/statistics & numerical data , Behavior, Animal , Video Recording , Animals , Computational Biology , Computer Simulation , Markov Chains , Mice , Models, Statistical , Neural Networks, Computer , Supervised Machine Learning/statistics & numerical data , Unsupervised Machine Learning/statistics & numerical data , Video Recording/statistics & numerical data
2.
PLoS Genet ; 14(4): e1007341, 2018 04.
Article in English | MEDLINE | ID: mdl-29684059

ABSTRACT

Hybridization and gene flow between species appear to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required to uncover which genomic regions have recently acquired haplotypes via introgression from a sister population. We developed a novel machine learning framework, called FILET (Finding Introgressed Loci via Extra-Trees), capable of revealing genomic introgression with far greater power than competing methods. FILET works by combining information from a number of population genetic summary statistics, including several new statistics that we introduce, that capture patterns of variation across two populations. We show that FILET is able to identify loci that have experienced gene flow between related species with high accuracy, and in most situations can correctly infer which population was the donor and which was the recipient. Here we describe a data set of outbred diploid Drosophila sechellia genomes, and combine them with data from D. simulans to examine recent introgression between these species using FILET. Although we find that these populations may have split more recently than previously appreciated, FILET confirms that there has indeed been appreciable recent introgression (some of which might have been adaptive) between these species, and reveals that this gene flow is primarily in the direction of D. simulans to D. sechellia.


Subjects
Drosophila simulans/genetics , Drosophila/genetics , Genome, Insect , Supervised Machine Learning , Animals , Computer Simulation , Drosophila/classification , Drosophila simulans/classification , Evolution, Molecular , Gene Flow , Genetic Speciation , Genetic Variation , Genetics, Population , Haplotypes , Hybridization, Genetic , Models, Genetic , Software , Species Specificity , Supervised Machine Learning/statistics & numerical data
3.
Ann Diagn Pathol ; 47: 151518, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32531442

ABSTRACT

Accurate detection and quantification of hepatic fibrosis remain essential for assessing the severity of non-alcoholic fatty liver disease (NAFLD) and its response to therapy in clinical practice and research studies. Our aim was to develop an integrated artificial intelligence-based automated tool to detect and quantify hepatic fibrosis and assess its architectural pattern in NAFLD liver biopsies. Digital images of the trichrome-stained slides of liver biopsies from patients with NAFLD and different severity of fibrosis were used. Two expert liver pathologists semi-quantitatively assessed the severity of fibrosis in these biopsies and, using a web applet, provided a total of 987 annotations of different fibrosis types for developing, training and testing supervised machine learning models to detect fibrosis. The collagen proportionate area (CPA) was measured and correlated with each of the pathologists' semi-quantitative fibrosis scores. Models were created and tested to detect each of six potential fibrosis patterns. There was good to excellent correlation between CPA and the pathologist score of fibrosis stage. The coefficient of determination (R²) of automated CPA with the pathologist stages ranged from 0.60 to 0.86. There was considerable overlap in the calculated CPA across different fibrosis stages. For identification of fibrosis patterns, the models' areas under the receiver operating characteristic curve were 78.6% for detection of periportal fibrosis, 83.3% for pericellular fibrosis, 86.4% for portal fibrosis and >90% for detection of normal fibrosis, bridging fibrosis, and presence of nodule/cirrhosis. In conclusion, an integrated automated tool could accurately quantify hepatic fibrosis and determine its architectural patterns in NAFLD liver biopsies.


Subjects
Artificial Intelligence/statistics & numerical data , Collagen/analysis , Liver Cirrhosis/pathology , Non-alcoholic Fatty Liver Disease/pathology , Automation/methods , Azo Compounds/metabolism , Biopsy , Clinical Trials as Topic , Collagen/metabolism , Eosine Yellowish-(YS)/metabolism , Fibrosis/classification , Fibrosis/pathology , Humans , Image Processing, Computer-Assisted/methods , Liver/pathology , Methyl Green/metabolism , Organ Dysfunction Scores , Pathologists/statistics & numerical data , Portal Vein/physiopathology , Practice Patterns, Physicians'/standards , Severity of Illness Index , Supervised Machine Learning/statistics & numerical data
4.
Molecules ; 25(10)2020 May 25.
Article in English | MEDLINE | ID: mdl-32466318

ABSTRACT

Over the last decade, essential oils have attracted growing scientific interest, with a constant rate of increase of more than 7%, as witnessed by almost 5000 articles. In the most prominent studies, essential oils are investigated as antibacterial agents, alone or in combination with known drugs. Fewer studies have examined essential oils as potential anticancer and antiviral natural remedies. In line with the authors' previous reports, the investigation of an in-house library of extracted essential oils as potential blockers of HSV-1 infection is reported herein. A subset of essential oils was experimentally tested in an in vitro model of HSV-1 infection, and the determined IC50 and CC50 values were used in conjunction with the results obtained by gas chromatography/mass spectrometry chemical analysis to derive machine learning based classification models trained with the partial least squares discriminant analysis algorithm. The internally validated models were then applied to untested essential oils to assess their predictive ability in selecting samples that are both active and of low toxicity. Five essential oils were selected from a list of 52 and readily assayed for IC50 and CC50 determination. Interestingly, four of the five selected samples, compared with the potencies of the training set, proved to be highly active and endowed with low toxicity. In particular, sample CJM1 from Calamintha nepeta was the most potent tested essential oil, with the highest selectivity index (IC50 = 0.063 mg/mL, SI > 47.5). In conclusion, it was herein demonstrated how multidisciplinary applications involving machine learning could represent a valuable tool in predicting the bioactivity of complex mixtures and, in the near future, could enable the design of blended essential oils possibly endowed with higher potency and lower toxicity.
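
The abstract reports a selectivity index (SI) without defining it. Assuming the conventional definition SI = CC50 / IC50 (an assumption, since the abstract does not spell it out), the reported figures imply a lower bound on CC50. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def selectivity_index(cc50: float, ic50: float) -> float:
    """Ratio of cytotoxic to inhibitory concentration (both in the same units).

    Assumes the conventional definition SI = CC50 / IC50.
    """
    if ic50 <= 0:
        raise ValueError("ic50 must be positive")
    return cc50 / ic50


# Values reported in the abstract for sample CJM1.
ic50_mg_ml = 0.063
si_lower_bound = 47.5

# SI > 47.5 with IC50 = 0.063 mg/mL implies CC50 > 47.5 * 0.063 mg/mL.
cc50_lower_bound = si_lower_bound * ic50_mg_ml
print(f"CC50 > {cc50_lower_bound:.2f} mg/mL")
```

Under that assumed definition, a larger SI means the concentration that harms host cells sits further above the concentration that inhibits the virus.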


Subjects
Antiviral Agents/pharmacology , Herpesvirus 1, Human/drug effects , Lamiales/chemistry , Oils, Volatile/pharmacology , Plant Oils/pharmacology , Supervised Machine Learning/statistics & numerical data , Animals , Antiviral Agents/isolation & purification , Chlorocebus aethiops , Gas Chromatography-Mass Spectrometry , Herpesvirus 1, Human/growth & development , Humans , Microbial Sensitivity Tests , Oils, Volatile/isolation & purification , Plant Oils/isolation & purification , Structure-Activity Relationship , Vero Cells
5.
Sensors (Basel) ; 19(23)2019 Nov 25.
Article in English | MEDLINE | ID: mdl-31775243

ABSTRACT

This study presents incremental learning based methods to personalize human activity recognition models. Initially, a user-independent model is used in the recognition process. When a new user starts to use the human activity recognition application, personal streaming data can be gathered. Of course, this data does not have labels. However, there are three different ways to obtain labels for it: non-supervised, semi-supervised, and supervised. The non-supervised approach relies purely on predicted labels, the supervised approach uses only human intelligence to label the data, and the proposed method for semi-supervised learning is a combination of these two: it uses artificial intelligence (AI) in most cases to label the data, but in uncertain cases it relies on human intelligence. After labels are obtained, the personalization process continues by using the streaming data and these labels to update the incremental learning based model, which in this case is Learn++. Learn++ is an ensemble method that can use any classifier as a base classifier, and this study compares three base classifiers: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and classification and regression tree (CART). Moreover, three datasets are used in the experiment to show how well the presented method generalizes on different datasets. The results show that personalized models are much more accurate than user-independent models. On average, the recognition rates are: 87.0% using the user-independent model, 89.1% using the non-supervised personalization approach, 94.0% using the semi-supervised personalization approach, and 96.5% using the supervised personalization approach. This means that by relying on predicted labels with high confidence, and asking the user to label only uncertain observations (6.6% of the observations when using LDA, 7.7% when using QDA, and 18.3% using CART), error rates almost as low as those of the supervised approach, in which labeling is fully based on human intelligence, can be achieved.
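
The semi-supervised labeling rule described above (trust confident predictions, ask the user otherwise) can be sketched as follows. This is an illustration only, not the authors' implementation: the function name, the callables, and the 0.9 confidence threshold are all assumptions, and the update of the Learn++ ensemble is not shown.

```python
from typing import Any, Callable, Dict, List, Tuple


def label_stream(
    observations: List[Any],
    predict_proba: Callable[[Any], Dict[str, float]],
    ask_user: Callable[[Any], str],
    threshold: float = 0.9,  # hypothetical confidence cut-off
) -> Tuple[List[str], int]:
    """Label each observation with the model when confident, else ask the user.

    Returns the labels and the number of observations the user had to label.
    """
    labels: List[str] = []
    n_queries = 0
    for obs in observations:
        probs = predict_proba(obs)
        best_label = max(probs, key=probs.get)
        if probs[best_label] >= threshold:
            labels.append(best_label)      # confident: keep the predicted label
        else:
            labels.append(ask_user(obs))   # uncertain: fall back to the human
            n_queries += 1
    return labels, n_queries


# Toy stand-ins for a classifier and a user (hypothetical data).
toy_probs = {
    "a": {"walk": 0.95, "run": 0.05},
    "b": {"walk": 0.55, "run": 0.45},
    "c": {"walk": 0.08, "run": 0.92},
}
toy_truth = {"a": "walk", "b": "run", "c": "run"}

labels, n_queries = label_stream(
    list(toy_probs), toy_probs.__getitem__, toy_truth.__getitem__
)
print(labels, n_queries)
```

In this toy run only observation "b" falls below the threshold, so the user is asked once. Raising the threshold trades more user queries for fewer wrong automatic labels.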


Subjects
Human Activities/statistics & numerical data , Supervised Machine Learning/statistics & numerical data , Algorithms , Artificial Intelligence/statistics & numerical data , Discriminant Analysis , Humans
6.
Brief Bioinform ; 16(2): 325-37, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24723570

ABSTRACT

A number of supervised machine learning models have recently been introduced for the prediction of drug-target interactions based on chemical structure and genomic sequence information. Although these models could offer improved means for many network pharmacology applications, such as repositioning of drugs for new therapeutic uses, the prediction models are often being constructed and evaluated under overly simplified settings that do not reflect the real-life problem in practical applications. Using quantitative drug-target bioactivity assays for kinase inhibitors, as well as a popular benchmarking data set of binary drug-target interactions for enzyme, ion channel, nuclear receptor and G protein-coupled receptor targets, we illustrate here the effects of four factors that may lead to dramatic differences in the prediction results: (i) problem formulation (standard binary classification or more realistic regression formulation), (ii) evaluation data set (drug and target families in the application use case), (iii) evaluation procedure (simple or nested cross-validation) and (iv) experimental setting (whether training and test sets share common drugs and targets, only drugs or targets or neither). Each of these factors should be taken into consideration to avoid reporting overoptimistic drug-target interaction prediction results. We also suggest guidelines on how to make the supervised drug-target interaction prediction studies more realistic in terms of such model formulations and evaluation setups that better address the inherent complexity of the prediction task in the practical applications, as well as novel benchmarking data sets that capture the continuous nature of the drug-target interactions for kinase inhibitors.
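
Factor (iv) above, whether a test pair shares its drug, its target, both, or neither with the training set, can be checked mechanically. The sketch below is one way to do so under assumed names (`Pair`, `experimental_setting`, and the toy identifiers are hypothetical and not taken from the paper).

```python
from typing import Iterable, Tuple

Pair = Tuple[str, str]  # (drug identifier, target identifier)


def experimental_setting(test_pair: Pair, train_pairs: Iterable[Pair]) -> str:
    """Classify a test pair by what it shares with the training pairs."""
    train_pairs = list(train_pairs)
    train_drugs = {drug for drug, _ in train_pairs}
    train_targets = {target for _, target in train_pairs}

    drug, target = test_pair
    drug_seen = drug in train_drugs
    target_seen = target in train_targets

    if drug_seen and target_seen:
        return "known drug, known target"
    if target_seen:
        return "new drug, known target"
    if drug_seen:
        return "known drug, new target"
    return "new drug, new target"


train = [("d1", "t1"), ("d2", "t2")]
for pair in [("d1", "t2"), ("d3", "t1"), ("d1", "t3"), ("d3", "t3")]:
    print(pair, "->", experimental_setting(pair, train))
```

Reporting performance separately for each of the four groups makes it visible when a model only does well on pairs whose drug and target both appeared during training.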


Subjects
Drug Discovery/statistics & numerical data , Computational Biology , Databases, Pharmaceutical/statistics & numerical data , Humans , Models, Biological , Models, Statistical , Quantitative Structure-Activity Relationship , Supervised Machine Learning/statistics & numerical data
7.
J Proteome Res ; 15(8): 2455-65, 2016 08 05.
Article in English | MEDLINE | ID: mdl-27312948

ABSTRACT

Ovarian cancer is the deadliest gynecologic malignancy in the United States, with most patients diagnosed in the advanced stage of the disease. Platinum-based antineoplastic therapeutics are indispensable to treating advanced ovarian serous carcinoma. However, patients have heterogeneous responses to platinum drugs, and it is difficult to predict these interindividual differences before administering medication. In this study, we investigated the tumor proteomic profiles and clinical characteristics of 130 ovarian serous carcinoma patients analyzed by the Clinical Proteomic Tumor Analysis Consortium (CPTAC), predicted the platinum drug response using supervised machine learning methods, and evaluated our prediction models through leave-one-out cross-validation. Our data-driven feature selection approach indicated that tumor proteomics profiles contain information for predicting binarized platinum response (P < 0.0001). We further built a least absolute shrinkage and selection operator (LASSO)-Cox proportional hazards model that stratified patients into early relapse and late relapse groups (P = 0.00013). The top proteomic features indicative of platinum response were involved in ATP synthesis pathways and Ran GTPase binding. Overall, we demonstrated that proteomic profiles of ovarian serous carcinoma patients predicted platinum drug responses and provided insights into the biological processes influencing the efficacy of platinum-based therapeutics. Our analytical approach is also extensible to predicting response to other antineoplastic agents or treatment modalities for both ovarian and other cancers.
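
Leave-one-out cross-validation, the evaluation procedure named above, holds out each sample in turn, fits on the rest, and scores the held-out prediction. The sketch below shows the loop in generic form. The 1-nearest-neighbour toy model, the data, and all names are assumptions for illustration and have nothing to do with the study's proteomic models.

```python
from typing import Callable, List, Sequence, Tuple


def leave_one_out_accuracy(
    X: List[float],
    y: List[str],
    fit: Callable[[Sequence[float], Sequence[str]], object],
    predict: Callable[[object, float], str],
) -> float:
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for i in range(len(X)):
        # Train on every sample except the i-th one.
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy 1-nearest-neighbour "model" on scalar features (hypothetical).
def fit_1nn(X: Sequence[float], y: Sequence[str]) -> List[Tuple[float, str]]:
    return list(zip(X, y))


def predict_1nn(model: List[Tuple[float, str]], x: float) -> str:
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


X_toy = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
y_toy = ["A", "A", "A", "B", "B", "B"]
print(leave_one_out_accuracy(X_toy, y_toy, fit_1nn, predict_1nn))
```

Any data-driven step such as feature selection would normally sit inside `fit`, so that the held-out sample never influences it. Whether the study did this is not stated in the abstract.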


Subjects
Neoplasm Proteins/analysis , Ovarian Neoplasms/drug therapy , Platinum Compounds/therapeutic use , Precision Medicine/methods , Aged , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Cisplatin , Cystadenocarcinoma, Serous/chemistry , Cystadenocarcinoma, Serous/drug therapy , Data Interpretation, Statistical , Female , Humans , Middle Aged , Platinum Compounds/pharmacology , Predictive Value of Tests , Proteomics , Supervised Machine Learning/statistics & numerical data , Treatment Outcome
8.
Int J Neural Syst ; 32(9): 2250043, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35912583

ABSTRACT

A practical problem in supervised deep learning for medical image segmentation is the lack of labeled data, which are expensive and time-consuming to acquire. In contrast, there is a considerable amount of unlabeled data available in the clinic. To make better use of the unlabeled data and improve the generalization on limited labeled data, in this paper, a novel semi-supervised segmentation method via multi-task curriculum learning is presented. Here, curriculum learning means that when training the network, simpler knowledge is preferentially learned to assist the learning of more difficult knowledge. Concretely, our framework consists of a main segmentation task and two auxiliary tasks, i.e. the feature regression task and target detection task. The two auxiliary tasks predict some relatively simpler image-level attributes and bounding boxes as the pseudo labels for the main segmentation task, enforcing the pixel-level segmentation result to match the distribution of these pseudo labels. In addition, to solve the problem of class imbalance in the images, a bounding-box-based attention (BBA) module is embedded, enabling the segmentation network to focus more on the target region than on the background. Furthermore, to alleviate the adverse effects caused by the possible deviation of pseudo labels, error tolerance mechanisms are also adopted in the auxiliary tasks, including inequality constraint and bounding-box amplification. Our method is validated on ACDC2017 and PROMISE12 datasets. Experimental results demonstrate that compared with the full supervision method and state-of-the-art semi-supervised methods, our method yields a much better segmentation performance on a small labeled dataset. Code is available at https://github.com/DeepMedLab/MTCL.


Subjects
Curriculum , Supervised Machine Learning , Data Curation/methods , Data Curation/standards , Datasets as Topic/standards , Datasets as Topic/supply & distribution , Image Processing, Computer-Assisted/methods , Supervised Machine Learning/classification , Supervised Machine Learning/statistics & numerical data , Supervised Machine Learning/trends
9.
Addict Behav ; 101: 106132, 2020 02.
Article in English | MEDLINE | ID: mdl-31704370

ABSTRACT

Multiplayer Online Battle Arena (MOBA) has become one of the most popular genres of online video games played by gamers worldwide. Previous studies have shown that excessive engagement in games can lead to Internet Gaming Disorder (IGD). Internet Gaming Disorder has been associated with psychological disorders like impulsivity, anxiety and Attention Deficit Hyperactivity Disorder (ADHD). In this study, we propose an approach to use the game and player statistics along with a self-esteem measure of a PlayerUnknown's Battlegrounds (PUBG, a MOBA game) player to predict whether he/she suffers from IGD and psychological disorders, namely ADHD and Generalized Anxiety Disorder (GAD). We extract the game and player statistics of PUBG players from Asian countries and then run several state-of-the-art supervised machine learning models to predict the occurrence of IGD, ADHD, and GAD. Initial experiments and results show that we are able to predict IGD, ADHD, and GAD with an accuracy of 93.18%, 81.81% and 84.9%, respectively. Game statistics of PUBG players show a strong positive correlation with IGD and ADHD, indicating detrimental effects of MOBA games.


Subjects
Internet Addiction Disorder/diagnosis , Internet Addiction Disorder/epidemiology , Self Concept , Supervised Machine Learning/statistics & numerical data , Surveys and Questionnaires/statistics & numerical data , Adolescent , Adult , Asia/epidemiology , Female , Humans , Internet Addiction Disorder/psychology , Male , Reproducibility of Results , Young Adult
10.
PLoS One ; 14(9): e0220624, 2019.
Article in English | MEDLINE | ID: mdl-31498787

ABSTRACT

Due to the fast speed of data generation and collection from advanced equipment, the amount of data can exceed the limit of available memory space and make it difficult to achieve high learning accuracy. Several methods based on the discard-after-learn concept have been proposed. Some methods were designed to cope with a single incoming datum, while others were designed for a chunk of incoming data. Although the results of these approaches are rather impressive, most of them are based on temporally adding more neurons to learn new incoming data without any neuron merging process, which can increase the computational time and space complexities. Only online versatile elliptic basis function (VEBF) introduced neuron merging to reduce the space-time complexity of learning only a single incoming datum. This paper proposes a method for further enhancing the capability of the discard-after-learn concept for a streaming data-chunk environment in terms of low computational time and neural space complexities. A set of recursive functions for computing the relevant parameters of a new neuron, based on statistical confidence interval, was introduced. The newly proposed method, named streaming chunk incremental learning (SCIL), increases the plasticity and the adaptability of the network structure according to the distribution of incoming data and their classes. When compared with the other methods in an incremental-like manner, based on 11 benchmarked data sets of 150 to 581,012 samples with attributes ranging from 4 to 1,558 formed as streaming data, the proposed SCIL gave better accuracy and time in most data sets.
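
The discard-after-learn idea rests on recursive updates: summary parameters are refreshed from each incoming chunk, after which the chunk can be thrown away. The abstract does not give SCIL's recursive functions, so the sketch below swaps in a standard, generic technique, the pairwise merge of running count, mean, and sum of squared deviations, purely to show the pattern. The function name and data are hypothetical.

```python
from typing import Sequence, Tuple

Stats = Tuple[int, float, float]  # (count, mean, sum of squared deviations)


def merge_chunk(stats: Stats, chunk: Sequence[float]) -> Stats:
    """Fold a chunk into the running statistics without storing old data."""
    n_a, mean_a, m2_a = stats
    n_b = len(chunk)
    if n_b == 0:
        return stats  # nothing to learn from an empty chunk

    # Statistics of the incoming chunk alone.
    mean_b = sum(chunk) / n_b
    m2_b = sum((x - mean_b) ** 2 for x in chunk)

    # Combine the two summaries; the raw chunk is no longer needed afterwards.
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


stats: Stats = (0, 0.0, 0.0)
for chunk in ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]):
    stats = merge_chunk(stats, chunk)  # learn from the chunk, then discard it

count, mean, m2 = stats
print(count, mean, m2 / (count - 1))  # count, mean, sample variance
```

Memory use stays constant regardless of how many chunks arrive, which is the property the discard-after-learn setting needs. A confidence interval of the kind the abstract mentions could be derived from such a mean and variance, though how SCIL does so is not stated here.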


Subjects
Neural Networks, Computer , Supervised Machine Learning/statistics & numerical data , Cluster Analysis , Datasets as Topic , Humans
11.
PLoS One ; 14(7): e0219247, 2019.
Article in English | MEDLINE | ID: mdl-31295300

ABSTRACT

Simulator imperfection, often known as model error, is ubiquitous in practical data assimilation problems. Despite the enormous efforts dedicated to addressing this problem, properly handling simulator imperfection in data assimilation remains a challenging task. In this work, we propose an approach to dealing with simulator imperfection from the point of view of functional approximation that can be implemented through a certain machine learning method, such as the kernel-based learning adopted in the current work. To this end, we start by considering a class of supervised learning problems, and then identify similarities between supervised learning and variational data assimilation. These similarities form the basis for us to develop an ensemble-based learning framework to tackle supervised learning problems, while achieving various advantages of ensemble-based methods over the variational ones. After establishing the ensemble-based learning framework, we proceed to investigate the integration of ensemble-based learning into an ensemble-based data assimilation framework to handle simulator imperfection. In the course of our investigations, we also develop a strategy to tackle the issue of multi-modality in supervised-learning problems, and transfer this strategy to data assimilation problems to help improve assimilation performance. For demonstration, we apply the ensemble-based learning framework and the integrated, ensemble-based data assimilation framework to a supervised learning problem and a data assimilation problem with an imperfect forward simulator, respectively. The experimental results indicate that both frameworks achieve good performance in relevant case studies, and that functional approximation through machine learning may serve as a viable way to account for simulator imperfection in data assimilation problems.


Subjects
Databases, Factual/statistics & numerical data , Machine Learning/statistics & numerical data , Supervised Machine Learning/statistics & numerical data , Algorithms , Humans
12.
PLoS One ; 14(1): e0210267, 2019.
Article in English | MEDLINE | ID: mdl-30650109

ABSTRACT

There is a critical need for fast, inexpensive, objective, and accurate screening tools for childhood psychopathology. Perhaps most compelling is the case of internalizing disorders, like anxiety and depression, where unobservable symptoms cause children to go unassessed, suffering in silence because they never exhibit the disruptive behaviors that would lead to a referral for diagnostic assessment. If left untreated, these disorders are associated with long-term negative outcomes including substance abuse and increased risk for suicide. This paper presents a new approach for identifying children with internalizing disorders using an instrumented 90-second mood induction task. Participant motion during the task is monitored using a commercially available wearable sensor. We show that machine learning can be used to differentiate children with an internalizing diagnosis from controls with 81% accuracy (67% sensitivity, 88% specificity). We provide a detailed description of the modeling methodology used to arrive at these results and explore further the predictive ability of each temporal phase of the mood induction task. Kinematical measures most discriminative of internalizing diagnosis are analyzed in detail, showing affected children exhibit significantly more avoidance of ambiguous threat. Performance of the proposed approach is compared to clinical thresholds on parent-reported child symptoms, which differentiate children with an internalizing diagnosis from controls with slightly lower accuracy (.68-.75 vs. .81), slightly higher specificity (.88-1.00 vs. .88), and lower sensitivity (.00-.42 vs. .67) than the proposed, instrumented method. These results point toward the future use of this approach for screening children for internalizing disorders so that interventions can be deployed when they have the highest chance for long-term success.
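
The comparison above turns on three quantities computed from a 2x2 confusion matrix. The sketch below computes them from made-up counts (hypothetical, not the study's data) and shows how a screen that flags nobody can still reach a respectable accuracy when controls outnumber cases, which is the pattern behind the .00 sensitivity at the lower end of the reported range.

```python
from typing import Dict


def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # share of true cases that are flagged
        "specificity": tn / (tn + fp),  # share of controls correctly cleared
    }


# Hypothetical counts: 5 cases and 15 controls, with a screen that flags nobody.
flags_nobody = screening_metrics(tp=0, fn=5, tn=15, fp=0)
print(flags_nobody)

# Hypothetical counts for a screen that flags some cases and some controls.
balanced = screening_metrics(tp=6, fn=2, tn=9, fp=3)
print(balanced)
```

In the first toy case accuracy is 0.75 even though no affected child is detected, so accuracy alone says little about a screening tool when the classes are imbalanced.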


Subjects
Anxiety/diagnosis , Depression/diagnosis , Supervised Machine Learning , Wearable Electronic Devices , Affect , Child , Child, Preschool , Female , Humans , Logistic Models , Male , Models, Statistical , Psychology, Child , Psychopathology , ROC Curve , Sensitivity and Specificity , Signal Processing, Computer-Assisted , Supervised Machine Learning/statistics & numerical data , Wearable Electronic Devices/statistics & numerical data
13.
Pac Symp Biocomput ; 24: 136-147, 2019.
Article in English | MEDLINE | ID: mdl-30864317

ABSTRACT

Cancer is a complex collection of diseases that are to some degree unique to each patient. Precision oncology aims to identify the best drug treatment regime using molecular data on tumor samples. While omics-level data is becoming more widely available for tumor specimens, the datasets upon which computational learning methods can be trained vary in coverage from sample to sample and from data type to data type. Methods that can 'connect the dots' to leverage more of the information provided by these studies could offer major advantages for maximizing predictive potential. We introduce a multi-view machine learning strategy called PLATYPUS that builds 'views' from multiple data sources that are all used as features for predicting patient outcomes. We show that a learning strategy that finds agreement across the views on unlabeled data increases the performance of the learning methods over any single view. We illustrate the power of the approach by deriving signatures for drug sensitivity in a large cancer cell line database. Code and additional information are available from the PLATYPUS website https://sysbiowiki.soe.ucsc.edu/platypus.


Subjects
Drug Resistance, Neoplasm , Machine Learning , Neoplasms/drug therapy , Antineoplastic Agents/therapeutic use , Cell Line, Tumor , Computational Biology/methods , Databases, Factual , Drug Resistance, Neoplasm/genetics , Humans , Information Storage and Retrieval , Machine Learning/statistics & numerical data , Neoplasms/genetics , Patient-Specific Modeling , Pharmacogenomic Variants , Precision Medicine , Software , Supervised Machine Learning/statistics & numerical data
14.
Pac Symp Biocomput ; 23: 192-203, 2018.
Article in English | MEDLINE | ID: mdl-29218881

ABSTRACT

As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems.


Subjects
Algorithms , Computational Biology/methods , Supervised Machine Learning/statistics & numerical data , Databases, Factual/statistics & numerical data , Humans , Linear Models , Nonlinear Dynamics
15.
Pac Symp Biocomput ; 23: 204-215, 2018.
Article in English | MEDLINE | ID: mdl-29218882

ABSTRACT

Machine Learning (ML) methods are now influencing major decisions about patient care, new medical methods, and drug development, and their use and importance are rapidly increasing in all areas. However, these ML methods are inherently complex and often difficult to understand and explain, resulting in barriers to their adoption and validation. Our work (RFEX) focuses on enhancing Random Forest (RF) classifier explainability by developing easy-to-interpret explainability summary reports from trained RF classifiers as a way to improve the explainability for (often non-expert) users. RFEX is implemented and extensively tested on Stanford FEATURE data, where RF is tasked with predicting functional sites in 3D molecules based on their electrochemical signatures (features). In developing the RFEX method we apply a user-centered approach driven by explainability questions and requirements collected through discussions with interested practitioners. We performed formal usability testing with 13 expert and non-expert users to verify RFEX usefulness. Analysis of the RFEX explainability report and user feedback indicates its usefulness in significantly increasing explainability and user confidence in RF classification on FEATURE data. Notably, RFEX summary reports easily reveal that one needs very few (from 2 to 6, depending on the model) top-ranked features to achieve 90% or better of the accuracy obtained when all 480 features are used.


Subjects
Supervised Machine Learning/statistics & numerical data , User-Computer Interface , Algorithms , Classification/methods , Computational Biology/methods , Databases, Factual/statistics & numerical data , Humans , Models, Statistical
16.
IEEE Trans Pattern Anal Mach Intell ; 40(8): 1845-1859, 2018 08.
Article in English | MEDLINE | ID: mdl-28809674

ABSTRACT

We propose a deep convolutional neural network (CNN) for face detection that leverages facial-attribute-based supervision. We observe a phenomenon that part detectors emerge within a CNN trained to classify attributes from uncropped face images, without any explicit part supervision. The observation motivates a new method for finding faces through scoring facial part responses by their spatial structure and arrangement. The scoring mechanism is data-driven, and carefully formulated considering challenging cases where faces are only partially visible. This consideration allows our network to detect faces under severe occlusion and unconstrained pose variations. Our method achieves promising performance on popular benchmarks including FDDB, PASCAL Faces, AFW, and WIDER FACE.


Subjects
Deep Learning , Face , Neural Networks, Computer , Pattern Recognition, Automated/methods , Artificial Intelligence/statistics & numerical data , Databases, Factual , Deep Learning/statistics & numerical data , Face/anatomy & histology , Humans , Pattern Recognition, Automated/statistics & numerical data , Supervised Machine Learning/statistics & numerical data
17.
IEEE Trans Pattern Anal Mach Intell ; 40(8): 1829-1844, 2018 08.
Article in English | MEDLINE | ID: mdl-28841549

ABSTRACT

We address the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models a video as a sequence of automatically mined, discriminative sub-events (e.g., onset and offset phases for "smile", running and jumping for "highjump"). The proposed model is inspired by recent work on Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to approximately model the ordinal aspect of videos. We obtain consistent improvements over relevant competitive baselines on four challenging, publicly available video-based facial analysis datasets for prediction of expression, clinical pain, and intent in dyadic conversations, and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind it.


Subjects
Algorithms , Artificial Intelligence , Pattern Recognition, Automated/methods , Supervised Machine Learning , Artificial Intelligence/statistics & numerical data , Computer Graphics , Databases, Factual , Facial Expression , Humans , Models, Statistical , Movement , Pattern Recognition, Automated/statistics & numerical data , Running , Stochastic Processes , Supervised Machine Learning/statistics & numerical data , Video Recording
18.
Pac Symp Biocomput ; 23: 123-132, 2018.
Article in English | MEDLINE | ID: mdl-29218875

ABSTRACT

Electronic Health Records (EHRs) contain a wealth of patient data useful to biomedical researchers. At present, both data extraction and analysis methods are frequently designed to work with a single snapshot of a patient's record. Health care providers often perform and record actions in small batches over time. By extracting these care events, a sequence can be formed that provides a trajectory of a patient's interactions with the health care system. These care events also offer a basic heuristic for the level of attention a patient receives from health care providers. We show that it is possible to learn meaningful embeddings from these care events using two deep learning techniques: unsupervised autoencoders and long short-term memory networks. We compare these methods to traditional machine learning methods, which require a point-in-time snapshot to be extracted from an EHR.
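The idea of turning batches of recorded actions into a sequence of care events can be sketched as follows. This is only an illustration, not the authors' extraction procedure: the abstract does not say how batches are delimited, so the one-hour gap threshold, the function name `group_care_events`, and the timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def group_care_events(timestamps, max_gap=timedelta(hours=1)):
    """Group action timestamps into batches ("care events").

    Assumption for this sketch: consecutive actions separated by at most
    max_gap belong to the same care event; a larger gap starts a new one.
    """
    batches = []
    for ts in sorted(timestamps):
        if batches and ts - batches[-1][-1] <= max_gap:
            batches[-1].append(ts)   # close enough: same care event
        else:
            batches.append([ts])     # gap too large: new care event
    return batches


# Hypothetical recorded actions for one patient.
actions = [
    datetime(2018, 1, 1, 8, 0),
    datetime(2018, 1, 1, 8, 20),
    datetime(2018, 1, 1, 13, 5),
    datetime(2018, 1, 2, 9, 30),
    datetime(2018, 1, 2, 9, 45),
]
events = group_care_events(actions)
sizes = [len(batch) for batch in events]
print(sizes)
```

The resulting ordered list of batches is the kind of trajectory that a sequence model such as an LSTM could consume, and the number of batches per unit time gives a rough proxy for the attention heuristic mentioned above.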


Subjects
Critical Care/statistics & numerical data , Machine Learning/statistics & numerical data , Computational Biology/methods , Databases, Factual/statistics & numerical data , Electronic Health Records/statistics & numerical data , Female , Humans , Male , Supervised Machine Learning/statistics & numerical data , Unsupervised Machine Learning/statistics & numerical data
19.
Comput Methods Programs Biomed ; 165: 235-250, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30337078

ABSTRACT

BACKGROUND AND OBJECTIVE: Accurate segmentation of the intra-retinal tissue layers in Optical Coherence Tomography (OCT) images plays an important role in the diagnosis and treatment of ocular diseases such as Age-Related Macular Degeneration (AMD) and Diabetic Macular Edema (DME). Existing energy-minimization-based methods employ multiple handcrafted cost terms and often fail in the presence of pathologies. In this work, we eliminate the need to handcraft the energy by learning it from training images in an end-to-end manner. Our method can be easily adapted to pathologies by re-training it on an appropriate dataset. METHODS: We propose a Conditional Random Field (CRF) framework for the joint multi-layer segmentation of OCT B-scans. The appearance of each retinal layer and boundary is modeled by two convolutional filter banks, and the shape priors are modeled using Gaussian distributions. The total CRF energy is linearly parameterized to allow joint, end-to-end training using the Structured Support Vector Machine formulation. RESULTS: The proposed method outperformed three benchmark algorithms on four public datasets. The NORMAL-1 and NORMAL-2 datasets contain healthy OCT B-scans, while the AMD-1 and DME-1 datasets contain B-scans of AMD and DME cases, respectively. The proposed method achieved an average unsigned boundary localization error (U-BLE) of 1.52 pixels on NORMAL-1, 1.11 pixels on NORMAL-2, and 2.04 pixels on the combined NORMAL-1 and DME-1 dataset across the eight layer boundaries, outperforming the three benchmark methods in each case. The Dice coefficient was 0.87 on NORMAL-1, 0.89 on NORMAL-2, and 0.84 on the combined NORMAL-1 and DME-1 dataset across the seven retinal layers. On the combined NORMAL-1 and AMD-1 dataset, we achieved an average U-BLE of 1.86 pixels on the ILM, inner RPE, and outer RPE boundaries, and a Dice of 0.98 for the ILM-RPEin region and 0.81 for the RPE layer.
CONCLUSION: We have proposed a supervised CRF-based method to jointly segment multiple tissue layers in OCT images. It can aid ophthalmologists in the quantitative analysis of structural changes in the retinal tissue layers, both in clinical practice and in large-scale clinical studies.
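For readers unfamiliar with the two reported metrics, the sketch below shows how a Dice coefficient and an unsigned boundary error are commonly computed. It is an illustration under stated assumptions rather than the paper's evaluation code: the abstract does not define U-BLE precisely, so it is taken here to be the mean absolute row difference between predicted and reference boundaries, one row position per image column. The function names and the toy pixel values are hypothetical.

```python
def dice_coefficient(pred_pixels, truth_pixels):
    """Dice = 2 * |A & B| / (|A| + |B|) for two sets of (row, col) pixels."""
    pred, truth = set(pred_pixels), set(truth_pixels)
    if not pred and not truth:
        return 1.0  # two empty regions agree perfectly by convention
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def unsigned_boundary_error(pred_rows, truth_rows):
    """Mean absolute difference, in pixels, between two boundaries.

    Assumption: each boundary is given as one row index per image column.
    """
    if len(pred_rows) != len(truth_rows):
        raise ValueError("boundaries must cover the same columns")
    total = sum(abs(p - t) for p, t in zip(pred_rows, truth_rows))
    return total / len(pred_rows)


# Toy layer masks: 4 predicted pixels, 4 reference pixels, 3 shared.
pred_layer = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth_layer = {(0, 1), (1, 0), (1, 1), (2, 1)}
dice = dice_coefficient(pred_layer, truth_layer)

# Toy boundary positions over four columns.
error = unsigned_boundary_error([10, 12, 11, 13], [11, 12, 13, 13])
print(dice, error)
```

Dice ranges from 0 (no overlap) to 1 (identical regions), while the boundary error is in pixels, so lower is better.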


Subjects
Diagnostic Techniques, Ophthalmological/statistics & numerical data , Retina/diagnostic imaging , Tomography, Optical Coherence/statistics & numerical data , Algorithms , Databases, Factual , Diabetic Retinopathy/diagnostic imaging , Humans , Image Interpretation, Computer-Assisted/methods , Image Interpretation, Computer-Assisted/statistics & numerical data , Macular Degeneration/diagnostic imaging , Supervised Machine Learning/statistics & numerical data
20.
Pac Symp Biocomput ; 23: 460-471, 2018.
Article in English | MEDLINE | ID: mdl-29218905

ABSTRACT

With the maturation of metabolomics science and the proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for its application in clinical metabolic profiling remain to be evaluated, particularly regarding the ability of AutoML to identify and adjust for common clinical confounders. In this study, we present a focused case study of AutoML considerations for using the Tree-based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML with default parameters showed a potential lack of sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT, an association not identified in parallel clinical metabolic profiling endeavors. While warranting independent replication, our tandem rank-accuracy measure suggests homocysteine to be the metabolite feature with the largest effect, and a corresponding priority for further translational clinical research.
Residual training and adjustment for a potential confounding effect of BMI only slightly modified the suggested association. Increased homocysteine is thought to be associated with vitamin B12 deficiency; evaluation for potential clinical relevance is suggested. While considerations for clinical metabolic profiling are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
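Residual adjustment of a feature for a confounder can be sketched in a few lines. The example below is a minimal illustration, assuming a simple linear relationship between one metabolite feature and BMI; it is not the authors' exact procedure and does not use the TPOT API. The function name `residualize` and all numeric values are made up for the example.

```python
def residualize(feature, covariate):
    """Return the part of `feature` not linearly explained by `covariate`.

    Fits feature = intercept + slope * covariate by ordinary least squares
    and returns the residuals, which can replace the raw feature values
    before model training.
    """
    n = len(feature)
    mean_x = sum(covariate) / n
    mean_y = sum(feature) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, feature))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, feature)]


# Made-up values: a metabolite level that partly tracks BMI.
bmi = [20.0, 25.0, 30.0, 35.0]
metabolite = [1.1, 1.4, 2.1, 2.4]
adjusted = residualize(metabolite, bmi)

# After adjustment the residuals carry no linear BMI signal.
mean_bmi = sum(bmi) / len(bmi)
leftover = sum((x - mean_bmi) * r for x, r in zip(bmi, adjusted))
print(adjusted, leftover)
```

Training on the adjusted values rather than the raw ones means that any association the model finds cannot be attributed to a linear effect of the confounder, which is the intent of the residual approach described in the abstract.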


Subjects
Homocysteine/blood , Hypoglycemic Agents/adverse effects , Metabolome , Metformin/adverse effects , Supervised Machine Learning/statistics & numerical data , Bias , Body Mass Index , Case-Control Studies , Computational Biology/methods , Diabetes Mellitus, Type 2/blood , Diabetes Mellitus, Type 2/drug therapy , Humans , Metabolomics/statistics & numerical data , Risk Factors , Translational Research, Biomedical