Results 1 - 20 of 38
1.
Entropy (Basel) ; 23(10), 2021 Oct 12.
Article in English | MEDLINE | ID: mdl-34682054

ABSTRACT

We present new PAC-Bayesian generalisation bounds for learning problems with unbounded loss functions. This extends the relevance and applicability of the PAC-Bayes learning framework, where most of the existing literature focuses on supervised learning problems with a bounded loss function (typically assumed to take values in the interval [0;1]). In order to relax this classical assumption, we propose to allow the range of the loss to depend on each predictor. This relaxation is captured by our new notion of HYPothesis-dependent rangE (HYPE). Based on this, we derive a novel PAC-Bayesian generalisation bound for unbounded loss functions, and we instantiate it on a linear regression problem. To make our theory usable by the largest audience possible, we include discussions on actual computation, practicality and limitations of our assumptions.

2.
Am J Hematol ; 95(8): 883-891, 2020 08.
Article in English | MEDLINE | ID: mdl-32282969

ABSTRACT

Over 200 million malaria cases globally lead to half a million deaths annually. Accurate malaria diagnosis remains a challenge. Automated image-processing approaches to analyze Thick Blood Films (TBF) could provide scalable solutions for urban healthcare providers in the holoendemic malaria sub-Saharan region. Although several approaches have been attempted to identify malaria parasites in TBF, none have achieved negative and positive predictive performance suitable for clinical use in the west sub-Saharan region. While malaria parasite object detection remains an intermediary step in achieving automatic patient diagnosis, training state-of-the-art deep-learning object detectors requires the human-expert labor-intensive process of labeling a large dataset of digitized TBF. To overcome these challenges and to achieve a clinically usable system, we show a novel approach. It leverages routine clinical-microscopy labels from our quality-controlled malaria clinics to train a Deep Malaria Convolutional Neural Network classifier (DeepMCNN) for automated malaria diagnosis. Our system also provides total Malaria Parasite (MP) and White Blood Cell (WBC) counts, allowing parasitemia estimation in MP/µL, as recommended by the WHO. Prospective validation of the DeepMCNN achieves sensitivity/specificity of 0.92/0.90 against expert-level malaria diagnosis. Our approach achieves PPV/NPV of 0.92/0.90, which is clinically usable in our holoendemic settings in the densely populated metropolis of Ibadan. It is located within the most populous African country (Nigeria), which has one of the largest burdens of Plasmodium falciparum malaria. Our openly available method is of importance for strategies aiming to scale malaria diagnosis in urban regions where daily assessment of thousands of specimens is required.


Subjects
Malaria, Falciparum/blood , Malaria/diagnosis , Neural Networks, Computer , Humans , Malaria/blood
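
The quantities reported in the abstract above (sensitivity, specificity, PPV, NPV and parasitemia in MP/µL) follow from simple counts. The sketch below is illustrative only and is not the DeepMCNN code: the function names are hypothetical, the confusion-matrix counts in the usage note are invented, and the default of 8000 WBC/µL is an assumed conventional value that the abstract does not state.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all healthy
        "ppv": tp / (tp + fp),          # true positives among positive calls
        "npv": tn / (tn + fn),          # true negatives among negative calls
    }


def parasitemia_per_ul(mp_count, wbc_count, assumed_wbc_per_ul=8000):
    """Estimate parasites per microlitre from thick-film counts.

    assumed_wbc_per_ul is an assumption (a commonly used convention),
    not a value taken from the abstract.
    """
    if wbc_count <= 0:
        raise ValueError("wbc_count must be positive")
    return mp_count / wbc_count * assumed_wbc_per_ul
```

For example, with hypothetical counts `diagnostic_metrics(tp=92, fp=10, tn=90, fn=8)` the sensitivity is 0.92 and the specificity is 0.90, while PPV and NPV additionally depend on how many positive and negative cases are in the sample.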
3.
Neuroimage ; 195: 215-231, 2019 07 15.
Article in English | MEDLINE | ID: mdl-30894334

ABSTRACT

Combining neuroimaging and clinical information for diagnosis, for example behavioral tasks and genetic characteristics, is potentially beneficial but presents challenges in terms of finding the best data representation for the different sources of information. Their simple combination usually does not provide an improvement compared with using the best source alone. In this paper, we proposed a framework based on a recent multiple kernel learning algorithm called EasyMKL and we investigated the benefits of this approach for diagnosing two different mental health diseases. We considered the well-known Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, tackling the classification of Alzheimer's Disease (AD) patients versus healthy controls, and a second dataset, tackling the classification of a heterogeneous group of depressed patients versus healthy controls. We used EasyMKL to combine a very large number of basic kernels alongside a feature selection methodology, pursuing an optimal and sparse solution to facilitate interpretability. Our results show that the proposed approach, called EasyMKLFS, outperforms baselines (e.g. SVM and SimpleMKL), state-of-the-art random forests (RF) and feature selection (FS) methods.


Subjects
Algorithms , Alzheimer Disease/diagnosis , Depression/diagnosis , Machine Learning , Neuroimaging/methods , Humans , Image Interpretation, Computer-Assisted/methods
4.
Bioinformatics ; 33(7): 951-955, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28073756

ABSTRACT

Motivation: Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund's adjuvant (CFA) or CFA alone. Results: The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test reaching >90% in some cases. Summary: The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize Complete Freund's Adjuvant. Availability and implementation: The sequence data is available at www.ncbi.nlm.nih.gov/sra/?term=SRP075893 . The Decombinator package is available at github.com/innate2adaptive/Decombinator . The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html . Contact: b.chain@ucl.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Complementarity Determining Regions/metabolism , Support Vector Machine , Amino Acids/metabolism , Animals , Bayes Theorem , CD4-Positive T-Lymphocytes/immunology , Complementarity Determining Regions/chemistry , Databases, Genetic , Humans , Mice, Inbred C57BL , Receptors, Antigen, T-Cell, alpha-beta/chemistry
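
The first stage described in the abstract above (splitting CDR3β sequences into overlapping motifs, ranking motifs by how differently they occur in two classes, and building feature vectors from the top motifs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, the smoothed log-frequency ratio is a stand-in for the article's one-dimensional Bayesian classifier score (whose exact form is not given in the abstract), and the support vector machine stage is omitted.

```python
from collections import Counter
import math


def kmers(seq, k=3):
    """Split a sequence into overlapping contiguous stretches of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def motif_counts(repertoire, k=3):
    """Count every k-mer motif across all sequences of one repertoire."""
    counts = Counter()
    for cdr3 in repertoire:
        counts.update(kmers(cdr3, k))
    return counts


def log_ratio_scores(counts_a, counts_b, pseudocount=1.0):
    """Score motifs by a smoothed log frequency ratio between two classes.

    This is an assumed stand-in for the article's Bayesian score.
    Positive values mean the motif is relatively more frequent in class A.
    """
    motifs = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + pseudocount * len(motifs)
    total_b = sum(counts_b.values()) + pseudocount * len(motifs)
    return {
        m: math.log((counts_a[m] + pseudocount) / total_a)
        - math.log((counts_b[m] + pseudocount) / total_b)
        for m in motifs
    }


def feature_vector(repertoire, motifs, k=3):
    """Relative frequency of each selected motif in one repertoire."""
    counts = motif_counts(repertoire, k)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[m] / total for m in motifs]
```

In use, one would rank the motifs by the absolute value of their score, keep the top few, and pass `feature_vector(repertoire, top_motifs)` for each sample to a classifier of choice.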
5.
Neuroimage ; 105: 493-506, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25463459

ABSTRACT

Pattern recognition applied to whole-brain neuroimaging data, such as functional Magnetic Resonance Imaging (fMRI), has proved successful at discriminating psychiatric patients from healthy participants. However, predictive patterns obtained from whole-brain voxel-based features are difficult to interpret in terms of the underlying neurobiology. Many psychiatric disorders, such as depression and schizophrenia, are thought to be brain connectivity disorders. Therefore, pattern recognition based on network models might provide deeper insights and potentially more powerful predictions than whole-brain voxel-based approaches. Here, we build a novel sparse network-based discriminative modeling framework, based on Gaussian graphical models and L1-norm regularized linear Support Vector Machines (SVM). In addition, the proposed framework is optimized in terms of both predictive power and reproducibility/stability of the patterns. Our approach aims to provide better pattern interpretation than voxel-based whole-brain approaches by yielding stable brain connectivity patterns that underlie discriminative changes in brain function between the groups. We illustrate our technique by classifying patients with major depressive disorder (MDD) and healthy participants, in two (event- and block-related) fMRI datasets acquired while participants performed a gender discrimination and emotional task, respectively, during the visualization of emotional valent faces.


Subjects
Brain Mapping/methods , Brain/physiopathology , Depressive Disorder, Major/diagnosis , Pattern Recognition, Automated/methods , Adult , Female , Humans , Magnetic Resonance Imaging/methods , Male , Middle Aged , Neural Networks, Computer
6.
Bioinformatics ; 30(22): 3181-8, 2014 Nov 15.
Article in English | MEDLINE | ID: mdl-25095879

ABSTRACT

MOTIVATION: The clonal theory of adaptive immunity proposes that immunological responses are encoded by increases in the frequency of lymphocytes carrying antigen-specific receptors. In this study, we measure the frequency of different T-cell receptors (TcR) in CD4+ T cell populations of mice immunized with a complex antigen, killed Mycobacterium tuberculosis, using high throughput parallel sequencing of the TcRβ chain. Our initial hypothesis that immunization would induce repertoire convergence proved to be incorrect, and therefore an alternative approach was developed that allows accurate stratification of TcR repertoires and provides novel insights into the nature of CD4+ T-cell receptor recognition. RESULTS: To track the changes induced by immunization within this heterogeneous repertoire, the sequence data were classified by counting the frequency of different clusters of short (3 or 4) continuous stretches of amino acids within the antigen binding complementarity determining region 3 (CDR3) repertoire of different mice. Both unsupervised (hierarchical clustering) and supervised (support vector machine) analyses of these different distributions of sequence clusters differentiated between immunized and unimmunized mice with 100% efficiency. The CD4+ TcR repertoires of mice 5 and 14 days postimmunization were clearly different from that of unimmunized mice but were not distinguishable from each other. However, the repertoires of mice 60 days postimmunization were distinct both from naive mice and the day 5/14 animals. Our results reinforce the remarkable diversity of the TcR repertoire, resulting in many diverse private TcRs contributing to the T-cell response even in genetically identical mice responding to the same antigen. However, specific motifs defined by short stretches of amino acids within the CDR3 region may determine TcR specificity and define a new approach to TcR sequence classification. AVAILABILITY AND IMPLEMENTATION: The analysis was implemented in R and Python, and source code can be found in Supplementary Data. CONTACT: b.chain@ucl.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
CD4-Positive T-Lymphocytes/immunology , Complementarity Determining Regions/chemistry , Receptors, Antigen, T-Cell/chemistry , Amino Acid Sequence , Animals , Cluster Analysis , Immunization , Mice , Mycobacterium tuberculosis/immunology , Receptors, Antigen, T-Cell/immunology , Receptors, Antigen, T-Cell, alpha-beta/chemistry , Sequence Analysis, Protein , Support Vector Machine
7.
Bioinformatics ; 29(5): 542-50, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23303508

ABSTRACT

SUMMARY: High-throughput sequencing provides an opportunity to analyse the repertoire of antigen-specific receptors with an unprecedented breadth and depth. However, the quantity of raw data produced by this technology requires efficient ways to categorize and store the output for subsequent analysis. To this end, we have defined a simple five-item identifier that uniquely and unambiguously defines each TcR sequence. We then describe a novel application of a finite-state automaton to map Illumina short-read sequence data for individual TcRs to their respective identifier. An extension of the standard algorithm is also described, which allows for the presence of single-base pair mismatches arising from sequencing error. The software package, named Decombinator, is tested first on a set of artificial in silico sequences and then on a set of published human TcR-β sequences. Decombinator assigned sequences at a rate more than two orders of magnitude faster than that achieved by classical pairwise alignment algorithms, and with a high degree of accuracy (>88%), even after introducing up to 1% error rates in the in silico sequences. Analysis of the published sequence dataset highlighted the strong V and J usage bias observed in the human peripheral blood repertoire, which seems to be unconnected to antigen exposure. The analysis also highlighted the enormous size of the available repertoire and the challenge of obtaining a comprehensive description for it. The Decombinator package will be a valuable tool for further in-depth analysis of the T-cell repertoire. AVAILABILITY AND IMPLEMENTATION: The Decombinator package is implemented in Python (v2.6) and is freely available at https://github.com/uclinfectionimmunity/Decombinator along with full documentation and examples of typical usage.


Subjects
High-Throughput Nucleotide Sequencing/methods , Receptors, Antigen, T-Cell, alpha-beta/genetics , Sequence Analysis, DNA/methods , Software , Algorithms , Humans , Receptors, Antigen, T-Cell, alpha-beta/chemistry
8.
PLoS Comput Biol ; 9(4): e1003018, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23637585

ABSTRACT

Biomarker discovery aims to find small subsets of relevant variables in 'omics data that correlate with the clinical syndromes of interest. Despite the fact that clinical phenotypes are usually characterized by a complex set of clinical parameters, current computational approaches assume univariate targets, e.g. diagnostic classes, against which associations are sought. We propose an approach based on asymmetrical sparse canonical correlation analysis (SCCA) that finds multivariate correlations between the 'omics measurements and the complex clinical phenotypes. We correlated plasma proteomics data to multivariate overlapping complex clinical phenotypes from tuberculosis and malaria datasets. We discovered relevant 'omic biomarkers that have a high correlation to profiles of clinical measurements and are remarkably sparse, containing 1.5-3% of all 'omic variables. We show that using clinical view projections we obtain remarkable improvements in diagnostic class prediction, up to 11% in tuberculosis and up to 5% in malaria. Our approach finds proteomic-biomarkers that correlate with complex combinations of clinical-biomarkers. Using the clinical-biomarkers improves the accuracy of diagnostic class prediction while not requiring the measurement of plasma proteomic profiles of each subject. Our approach makes it feasible to use 'omics data to build accurate diagnostic algorithms that can be deployed to community health centres lacking the expensive 'omics measurement capabilities.


Subjects
Biomarkers/metabolism , Malaria/metabolism , Tuberculosis/metabolism , Algorithms , Biomarkers/blood , Computational Biology/methods , Genomics , Humans , Phenotype , Proteomics/methods , Reproducibility of Results
9.
Sci Rep ; 12(1): 16286, 2022 09 29.
Article in English | MEDLINE | ID: mdl-36175579

ABSTRACT

Post-hazard rapid response has emerged as a promising pathway towards resilient critical infrastructure systems (CISs). Nevertheless, it is challenging to devise the optimal plan for those rapid responses, given the enormous search space and the difficulty of assessing the spatiotemporal status of CISs. We now present a new approach to post-shock rapid responses of road networks (RNs), based upon lookahead searches supported by machine learning. Following this approach, we examined the resilience-oriented rapid response of a real-world RN across Luchon, France, under destructive earthquake scenarios. Our results show that the introduction of one-step lookahead searches can effectively offset the lack of adaptivity due to the deficient heuristic of rapid responses. Furthermore, the performance of rapid responses following such a strategy is far surpassed when a series of deep neural networks, trained solely through machine learning without human intervention, are employed to replace the heuristic and guide the searches.


Subjects
Artificial Intelligence , Earthquakes , Heuristics , Humans , Machine Learning , Neural Networks, Computer
10.
Sci Rep ; 12(1): 7692, 2022 05 11.
Article in English | MEDLINE | ID: mdl-35545647

ABSTRACT

How do we best constrain social interactions to decrease transmission of communicable diseases? Indiscriminate suppression is unsustainable long term and presupposes that all interactions carry equal importance. Instead, transmission within a social network has been shown to be determined by its topology. In this paper, we deploy simulations to understand and quantify the impact on disease transmission of a set of topological network features, building a dataset of 9000 interaction graphs using generators of different types of synthetic social networks. Independently of the topology of the network, we hold constant the total volume of social interactions in our simulations, to show how even with the same social contact some network structures are more or less resilient to the spread. We find a suitable intervention to be specific suppression of unfamiliar and casual interactions that contribute to the network's global efficiency. That is, pathogen spread is significantly reduced by limiting specific kinds of contact rather than their global number. Our numerical studies might inspire further investigation in connection to public health, as an integrative framework to craft and evaluate social interventions in communicable diseases with different social graphs or as a highlight of network metrics that should be captured in social studies.


Subjects
Communicable Diseases , Humans
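
The abstract above singles out the network's global efficiency. Under the usual definition for an unweighted graph (the average of 1/d(i, j) over all ordered pairs of distinct nodes, with unreachable pairs contributing 0), it can be computed with a breadth-first search from every node. The sketch below is an assumption-laden illustration rather than the authors' simulation code: the adjacency-dictionary representation and the function names are hypothetical.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj):
    """Average inverse shortest-path length over ordered pairs of distinct nodes.

    adj maps each node to an iterable of its neighbours (undirected graph,
    every node present as a key). Unreachable pairs add nothing to the sum.
    """
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for u in adj:
        dist = shortest_path_lengths(adj, u)
        total += sum(1.0 / d for v, d in dist.items() if v != u)
    return total / (n * (n - 1))
```

A fully connected triangle has efficiency 1, whereas removing one of its edges (leaving a three-node path) lowers it to 5/6, which illustrates how cutting a single shortcut link reduces the metric without changing the number of people involved.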
11.
Article in English | MEDLINE | ID: mdl-35952973

ABSTRACT

Canonical correlation analysis (CCA) and partial least squares (PLS) are powerful multivariate methods for capturing associations across 2 modalities of data (e.g., brain and behavior). However, when the sample size is similar to or smaller than the number of variables in the data, standard CCA and PLS models may overfit, i.e., find spurious associations that generalize poorly to new data. Dimensionality reduction and regularized extensions of CCA and PLS have been proposed to address this problem, yet most studies using these approaches have some limitations. This work gives a theoretical and practical introduction into the most common CCA/PLS models and their regularized variants. We examine the limitations of standard CCA and PLS when the sample size is similar to or smaller than the number of variables. We discuss how dimensionality reduction and regularization techniques address this problem and explain their main advantages and disadvantages. We highlight crucial aspects of the CCA/PLS analysis framework, including optimizing the hyperparameters of the model and testing the identified associations for statistical significance. We apply the described CCA/PLS models to simulated data and real data from the Human Connectome Project and Alzheimer's Disease Neuroimaging Initiative (both of n > 500). We use both low- and high-dimensionality versions of these data (i.e., ratios between sample size and variables in the range of ∼1-10 and ∼0.1-0.01, respectively) to demonstrate the impact of data dimensionality on the models. Finally, we summarize the key lessons of the tutorial.


Subjects
Canonical Correlation Analysis , Connectome , Humans , Least-Squares Analysis , Algorithms , Brain
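
For orientation, the two methods compared in the abstract above are usually stated as the following optimization problems. This is a sketch in generic textbook notation, not a formula quoted from the article: $X \in \mathbb{R}^{n \times p}$ and $Y \in \mathbb{R}^{n \times q}$ are assumed to be column-centered data matrices with $n$ samples, and $w_x$, $w_y$ are the weight vectors being sought.

```latex
\text{CCA:}\quad
\max_{w_x,\, w_y}\;
\frac{w_x^\top X^\top Y\, w_y}
     {\sqrt{w_x^\top X^\top X\, w_x}\;\sqrt{w_y^\top Y^\top Y\, w_y}}
\qquad\qquad
\text{PLS:}\quad
\max_{w_x,\, w_y}\; w_x^\top X^\top Y\, w_y
\quad \text{s.t.}\quad \lVert w_x \rVert_2 = \lVert w_y \rVert_2 = 1
```

CCA maximizes correlation and PLS maximizes covariance. When $n$ is smaller than $p$ or $q$, the matrix $X^\top X$ or $Y^\top Y$ is singular, so the CCA problem is ill-posed and a training correlation of 1 can typically be reached by chance, which is the overfitting regime the abstract refers to.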
12.
Neuroimage ; 54(3): 2267-77, 2011 Feb 01.
Article in English | MEDLINE | ID: mdl-20970510

ABSTRACT

During auditory perception, we are required to abstract information from complex temporal sequences such as those in music and speech. Here, we investigated how higher-order statistics modulate the neural responses to sound sequences, hypothesizing that these modulations are associated with higher levels of the peri-Sylvian auditory hierarchy. We devised second-order Markov sequences of pure tones with uniform first-order transition probabilities. Participants learned to discriminate these sequences from random ones. Magnetoencephalography was used to identify evoked fields in which second-order transition probabilities were encoded. We show that improbable tones evoked heightened neural responses after 200 ms post-tone onset during exposure at the learning stage or around 150 ms during the subsequent test stage, originating near the right temporoparietal junction. These signal changes reflected higher-order statistical learning, which can contribute to the perception of natural sounds with hierarchical structures. We propose that our results reflect hierarchical predictive representations, which can contribute to the experiences of speech and music.


Subjects
Auditory Perception/physiology , Acoustic Stimulation , Auditory Cortex/physiology , Discrimination Learning/physiology , Evoked Potentials, Auditory/physiology , Forecasting , Humans , Learning/physiology , Magnetoencephalography , Markov Chains , Parietal Lobe/physiology , Temporal Lobe/physiology
13.
Neuroimage ; 58(3): 793-804, 2011 Oct 01.
Article in English | MEDLINE | ID: mdl-21723950

ABSTRACT

Pattern recognition approaches, such as the Support Vector Machine (SVM), have been successfully used to classify groups of individuals based on their patterns of brain activity or structure. However, these approaches focus on finding group differences and are not applicable to situations where one is interested in assessing deviations from a specific class or population. In the present work we propose an application of the one-class SVM (OC-SVM) to investigate if patterns of fMRI response to sad facial expressions in depressed patients would be classified as outliers in relation to patterns of healthy control subjects. We defined features based on whole brain voxels and anatomical regions. In both cases we found a significant correlation between the OC-SVM predictions and the patients' Hamilton Rating Scale for Depression (HRSD), i.e. the more depressed the patients were the more of an outlier they were. In addition, the OC-SVM split the patient groups into two subgroups whose membership was associated with future response to treatment. When applied to region-based features, the OC-SVM classified 52% of patients as outliers. However, among the patients classified as outliers 70% did not respond to treatment, and among those classified as non-outliers 89% responded to treatment. In addition, 89% of the healthy controls were classified as non-outliers.


Subjects
Depression/diagnosis , Image Interpretation, Computer-Assisted/methods , Support Vector Machine , Adult , Depression/classification , Emotions/physiology , Facial Expression , Female , Humans , Magnetic Resonance Imaging , Male , Middle Aged
14.
Front Physiol ; 12: 730908, 2021.
Article in English | MEDLINE | ID: mdl-34566692

ABSTRACT

The physical interaction between the T cell receptor (TCR) and its cognate antigen causes T cells to activate and participate in the immune response. Understanding this physical interaction is important in predicting TCR binding to a target epitope, as well as potential cross-reactivity. Here, we propose a way of collecting informative features of the binding interface from homology models of T cell receptor-peptide-major histocompatibility complex (TCR-pMHC) complexes. The information collected from these structures is sufficient to discriminate binding from non-binding TCR-pMHC pairs in multiple independent datasets. The classifier is limited by the number of crystal structures available for the homology modelling and by the size of the training set. However, the classifier shows comparable performance to sequence-based classifiers requiring much larger training sets.

16.
Patterns (N Y) ; 2(12): 100381, 2021 Dec 10.
Article in English | MEDLINE | ID: mdl-34950903

ABSTRACT

Individuals from a diverse range of backgrounds are increasingly engaging in research and development in the field of artificial intelligence (AI). The main activities, although still nascent, are coalescing around three core activities: innovation, policy, and capacity building. Within agriculture, which is the focus of this paper, AI is working with converging technologies, particularly data optimization, to add value along the entire agricultural value chain, including procurement, farm automation, and market access. Our key takeaway is that, despite the promising opportunities for development, there are actual and potential challenges that African countries need to consider in deciding whether to scale up or down the application of AI in agriculture. Input from African innovators, policymakers, and academics is essential to ensure that AI solutions are aligned with African needs and priorities. This paper proposes questions that can be used to form a road map to inform research and development in this area.

17.
Wellcome Open Res ; 5: 122, 2020.
Article in English | MEDLINE | ID: mdl-32566761

ABSTRACT

Changing behaviour is necessary to address many of the threats facing human populations. However, identifying behaviour change interventions likely to be effective in particular contexts as a basis for improving them presents a major challenge. The Human Behaviour-Change Project harnesses the power of artificial intelligence and behavioural science to organise global evidence about behaviour change to predict outcomes in common and unknown behaviour change scenarios.

18.
Biol Psychiatry ; 87(4): 368-376, 2020 02 15.
Article in English | MEDLINE | ID: mdl-32040421

ABSTRACT

BACKGROUND: In 2009, the National Institute of Mental Health launched the Research Domain Criteria, an attempt to move beyond diagnostic categories and ground psychiatry within neurobiological constructs that combine different levels of measures (e.g., brain imaging and behavior). Statistical methods that can integrate such multimodal data, however, are often vulnerable to overfitting, poor generalization, and difficulties in interpreting the results. METHODS: We propose an innovative machine learning framework combining multiple holdouts and a stability criterion with regularized multivariate techniques, such as sparse partial least squares and kernel canonical correlation analysis, for identifying hidden dimensions of cross-modality relationships. To illustrate the approach, we investigated structural brain-behavior associations in an extensively phenotyped developmental sample of 345 participants (312 healthy and 33 with clinical depression). The brain data consisted of whole-brain voxel-based gray matter volumes, and the behavioral data included item-level self-report questionnaires and IQ and demographic measures. RESULTS: Both sparse partial least squares and kernel canonical correlation analysis captured two hidden dimensions of brain-behavior relationships: one related to age and drinking and the other one related to depression. The applied machine learning framework indicates that these results are stable and generalize well to new data. Indeed, the identified brain-behavior associations are in agreement with previous findings in the literature concerning age, alcohol use, and depression-related changes in brain volume. CONCLUSIONS: Multivariate techniques (such as sparse partial least squares and kernel canonical correlation analysis) embedded in our novel framework are promising tools to link behavior and/or symptoms to neurobiology and thus have great potential to contribute to a biologically grounded definition of psychiatric disorders.


Subjects
Brain , Gray Matter , Brain/diagnostic imaging , Humans , Machine Learning , Mood Disorders , National Institute of Mental Health (U.S.) , United States
19.
Sci Rep ; 10(1): 15918, 2020 09 28.
Article in English | MEDLINE | ID: mdl-32985514

ABSTRACT

Over 200 million malaria cases globally lead to half a million deaths annually. The development of malaria prevalence prediction systems to support malaria care pathways has been hindered by lack of data, a tendency towards universal "monolithic" models (one-size-fits-all-regions) and a focus on long lead time predictions. Current systems do not provide short-term local predictions at an accuracy suitable for deployment in clinical practice. Here we show a data-driven approach that reliably produces one-month-ahead prevalence prediction within a densely populated all-year-round malaria metropolis of over 3.5 million inhabitants situated in Nigeria, which has one of the largest global burdens of P. falciparum malaria. We estimate one-month-ahead prevalence in a unique 22-year prospective regional dataset of > 9 × 10^4 participants attending our healthcare services. Our system agrees with both magnitude and direction of the prediction on validation data, achieving MAE ≤ 6 × 10^-2, MSE ≤ 7 × 10^-3, PCC (median 0.63, IQR 0.3) and with more than 80% of estimates within a (+0.1 to -0.05) error-tolerance range which is clinically relevant for decision-support in our holoendemic setting. Our data-driven approach could help healthcare systems harness their own data to support local malaria care pathways.


Subjects
Malaria/epidemiology , Urban Population , Africa South of the Sahara/epidemiology , Africa, Western/epidemiology , Humans , Models, Theoretical , Prevalence , Prospective Studies
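
The validation measures named in the abstract above (MAE, MSE, the Pearson correlation coefficient and the share of estimates inside an asymmetric error-tolerance range) can be computed as in the sketch below. It is illustrative only: the function names are hypothetical, the numbers in the usage note are invented, and defining the error as predicted minus observed prevalence is an assumption, since the abstract does not state the sign convention.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient (PCC) between observed and predicted."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def fraction_within(y_true, y_pred, lower=-0.05, upper=0.1):
    """Share of predictions whose error lies in [lower, upper].

    Error is taken as predicted minus observed (an assumed convention).
    """
    errors = [p - t for t, p in zip(y_true, y_pred)]
    return sum(lower <= e <= upper for e in errors) / len(errors)
```

With invented prevalences `y_true = [0.2, 0.4, 0.6]` and `y_pred = [0.25, 0.38, 0.8]`, two of the three errors fall inside the tolerance range, so `fraction_within` returns 2/3.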
20.
IEEE Trans Pattern Anal Mach Intell ; 31(8): 1347-61, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19542571

ABSTRACT

The presence of irrelevant features in training data is a significant obstacle for many machine learning tasks. One approach to this problem is to extract appropriate features and, often, one selects a feature extraction method based on the inference algorithm. Here, we formalize a general framework for feature extraction, based on Partial Least Squares, in which one can select a user-defined criterion to compute projection directions. The framework draws together a number of existing results and provides additional insights into several popular feature extraction methods. Two new sparse kernel feature extraction methods are derived under the framework, called Sparse Maximal Alignment (SMA) and Sparse Maximal Covariance (SMC), respectively. Key advantages of these approaches include simple implementation and a training time which scales linearly in the number of examples. Furthermore, one can project a new test example using only k kernel evaluations, where k is the output dimensionality. Computational results on several real-world data sets show that SMA and SMC extract features which are as predictive as those found using other popular feature extraction methods. Additionally, on large text retrieval and face detection data sets, they produce features which match the performance of the original ones in conjunction with a Support Vector Machine.


Subjects
Artificial Intelligence , Least-Squares Analysis , Pattern Recognition, Automated/methods , Algorithms , Databases, Factual , Face , Humans , Principal Component Analysis , ROC Curve