ABSTRACT
BACKGROUND: It is unclear whether data-driven machine learning models trained on large epidemiological cohorts can improve the prediction of comorbidities in people living with human immunodeficiency virus (HIV). METHODS: In this proof-of-concept study, we included people living with HIV in the prospective Swiss HIV Cohort Study with a first estimated glomerular filtration rate (eGFR) >60 mL/minute/1.73 m² after 1 January 2002. Our primary outcome was chronic kidney disease (CKD), defined as a confirmed decrease in eGFR to ≤60 mL/minute/1.73 m² on measurements more than 3 months apart. We split the cohort data into a training set (80%), a validation set (10%), and a test set (10%), stratified by CKD status and follow-up length. RESULTS: Of 12,761 eligible individuals (median baseline eGFR, 103 mL/minute/1.73 m²), 1,192 (9%) developed CKD after a median of 8 years. We used 64 static and 502 time-changing variables. Across prediction horizons and algorithms, and in contrast to expert-based standard models, most machine learning models achieved state-of-the-art predictive performance, with areas under the receiver operating characteristic curve and the precision-recall curve ranging from 0.926 to 0.996 and from 0.631 to 0.956, respectively. CONCLUSIONS: In people living with HIV, we observed state-of-the-art performance in forecasting individual CKD onset with different machine learning algorithms.
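The 80%/10%/10% split stratified by CKD status and follow-up length can be illustrated with a minimal sketch. This is not the study's code: the function name `stratified_split`, the synthetic toy cohort, and the follow-up threshold of 8 years used to form the strata are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/validation/test sets, stratum by stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)

    train, val, test = [], [], []
    for group in strata.values():
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_val = round(fractions[1] * len(group))
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # remainder goes to the test set
    return train, val, test


# Synthetic toy cohort (not SHCS data): 10% CKD, follow-up of 0 to 15 years.
records = [
    {"id": i, "ckd": i % 10 == 0, "followup_years": i % 16}
    for i in range(1000)
]


def stratum(rec):
    # Hypothetical stratum: CKD status crossed with a coarse follow-up bin.
    return (rec["ckd"], rec["followup_years"] >= 8)


train, val, test = stratified_split(records, stratum)
```

Splitting within each stratum keeps the proportion of CKD cases roughly equal across the three sets, which matters when only about 9% of individuals reach the outcome.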
Subjects
HIV Infections/complications , Machine Learning , Renal Insufficiency, Chronic/diagnosis , Adult , Cohort Studies , Female , Glomerular Filtration Rate , HIV Infections/drug therapy , HIV Infections/epidemiology , Health Knowledge, Attitudes, Practice , Humans , Male , Middle Aged , Predictive Value of Tests , Prospective Studies , Renal Insufficiency, Chronic/complications , Renal Insufficiency, Chronic/epidemiology , Risk Factors , Switzerland/epidemiology
ABSTRACT
HIV patients are treated by administration of combinations of antiretroviral drugs. The very large number of such combinations makes the manual search for an effective therapy practically impossible, especially in advanced stages of the disease. Therapy selection can be supported by statistical methods that predict the outcomes of candidate therapies. However, these methods are based on clinical data sets that have highly unbalanced therapy representation. This paper presents a novel approach that considers each drug belonging to a target combination therapy as a separate task in a multi-task hierarchical Bayes setting. The drug-specific models take into account information on all therapies containing the drug, not just the target therapy. In this way, we can circumvent the problem of data sparseness pertaining to some target therapies. The computational validation shows that compared to the most commonly used approach that provides therapy information in the form of input features, our model has significantly higher predictive power for therapies with very few training samples and is at least as powerful for abundant therapies.
Subjects
Antiretroviral Therapy, Highly Active , HIV Infections/drug therapy , Models, Statistical , Algorithms , Bayes Theorem , Computer Simulation , Humans , Prognosis , Reproducibility of Results , Treatment Outcome
ABSTRACT
Infections with the human immunodeficiency virus type 1 (HIV-1) are treated with combinations of drugs. Unfortunately, HIV responds to the treatment by developing resistance mutations. Consequently, the genome of the viral target proteins is sequenced and inspected for resistance mutations as part of routine diagnostic procedures for ensuring an effective treatment. For predicting response to a combination therapy, currently available computer-based methods rely on the genotype of the virus and the composition of the regimen as input. However, no available tool takes full advantage of the knowledge about the order of and the response to previously prescribed regimens. The resulting high-dimensional feature space makes existing methods difficult to apply in a straightforward fashion. The machine learning system proposed in this work, sequence boosting, is tailored to exploiting such high-dimensional information, i.e., the extraction of longitudinal features, by utilizing recent advances in data mining and boosting. When applied to predicting the latest treatment outcome for 3,759 treatment-experienced patients from the EuResist integrated database, sequence boosting achieved superior performance compared to SVMs with RBF kernels. Moreover, sequence boosting allows easy access to the discriminative treatment information. Analysis of feature importance values provided by our model confirmed known facts regarding HIV treatment. For instance, application of potent and recently licensed drugs was beneficial for patients, and, conversely, the patient group that was subject to NRTI mono-therapies in the past had poor treatment prospects today. Furthermore, our model revealed novel biological insights. More precisely, the combination of previously used drugs with their in vivo response is more informative than the information on previously used drugs alone. Using this information improves the performance of systems for predicting therapy outcome.
Subjects
Anti-HIV Agents/therapeutic use , Artificial Intelligence , Data Mining/methods , Drug Resistance, Viral/genetics , HIV Infections/drug therapy , HIV-1/genetics , Computer Simulation , Data Interpretation, Statistical , Databases, Factual , Drug Therapy, Combination , Humans , Mutation , Treatment Outcome
ABSTRACT
MOTIVATION: As there exists no cure or vaccine for infection with human immunodeficiency virus (HIV), the standard approach to treating HIV patients is to repeatedly administer different combinations of several antiretroviral drugs. Because of the large number of possible drug combinations, manually finding a successful regimen becomes practically impossible. This presents a major challenge for HIV treatment. The application of machine learning methods for predicting virological responses to potential therapies is a possible approach to solving this problem. However, due to evolving trends in treating HIV patients, the available clinical datasets have a highly unbalanced therapy representation, which might negatively affect the usefulness of derived statistical models. RESULTS: This article presents an approach that tackles the problem of predicting virological response to combination therapies by learning a separate logistic regression model for each therapy. The models are fitted by using not only the data from the target therapy but also the information from similar therapies. For this purpose, we introduce and evaluate two different measures of therapy similarity. The models are also able to incorporate phenotypic knowledge on the therapy outcomes through a Gaussian prior. With our approach we balance the uneven therapy representation in the datasets and produce higher-quality models for therapies with very few training samples. According to the results of the computational experiments, our therapy similarity model performs significantly better than training separate models for each therapy by using solely their examples. Furthermore, the model's performance is as good as that of an approach that encodes therapy information in the input feature space, with the advantage of delivering better results for therapies with very few training samples. AVAILABILITY: Code of the efficient logistic regression is available from http://www.mpi-inf.mpg.de/%7Ejasmina/fastLogistic.zip.
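The idea of fitting a therapy-specific logistic regression on samples from similar therapies, regularized by a Gaussian prior, can be sketched as follows. This is a minimal illustration, not the authors' released code: the abstract does not specify its two similarity measures, so Jaccard overlap of drug sets is an assumed stand-in, and the function names, the toy feature values, and the plain gradient-ascent optimizer are all hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard similarity between two drug sets (an illustrative choice)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logreg(X, y, sample_weights, prior_mean,
                        prior_var=1.0, lr=0.1, steps=2000):
    """MAP estimate by gradient ascent on the similarity-weighted
    log-likelihood plus an isotropic Gaussian log-prior."""
    w = list(prior_mean)
    for _ in range(steps):
        # Gradient of the Gaussian log-prior centred at prior_mean.
        grad = [-(wj - mj) / prior_var for wj, mj in zip(w, prior_mean)]
        # Each sample contributes in proportion to its therapy similarity.
        for xi, yi, si in zip(X, y, sample_weights):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += si * (yi - p) * xj
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy data: two samples from the target therapy, two from a similar one.
target = {"AZT", "3TC", "EFV"}
therapies = [{"AZT", "3TC", "EFV"}, {"AZT", "3TC", "NVP"},
             {"AZT", "3TC", "NVP"}, {"AZT", "3TC", "EFV"}]
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]  # bias + one toy feature
y = [0, 0, 1, 1]
weights = [jaccard(target, t) for t in therapies]  # [1.0, 0.5, 0.5, 1.0]
w = fit_weighted_logreg(X, y, weights, prior_mean=[0.0, 0.0])
```

In this sketch, samples from dissimilar therapies receive small weights and therefore barely move the coefficients away from the prior mean, which is one plausible way a rare therapy could borrow strength from related ones.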
Subjects
Artificial Intelligence , Drug Therapy, Combination , HIV Infections/drug therapy , Logistic Models , Anti-HIV Agents/therapeutic use , Genotype , HIV/genetics , HIV Infections/virology , Humans , Treatment Outcome
ABSTRACT
The HIV-1 reservoir is the major hurdle to curing HIV-1. However, the impact of the viral genome on the HIV-1 reservoir, i.e. its heritability, remains unknown. We investigate the heritability of the HIV-1 reservoir size and its long-term decay by analyzing the distribution of those traits on viral phylogenies from both partial-pol and viral near full-length genome sequences. We use a unique nationwide cohort of 610 well-characterized HIV-1 subtype-B infected individuals on suppressive ART for a median of 5.4 years. We find that a moderate but significant fraction of the HIV-1 reservoir size 1.5 years after the initiation of ART is explained by genetic factors. At the same time, we find more tentative evidence for the heritability of the long-term HIV-1 reservoir decay. Our findings indicate that viral genetic factors contribute to the HIV-1 reservoir size and hence the infecting HIV-1 strain may affect individual patients' hurdle towards a cure.
Subjects
Anti-Retroviral Agents/pharmacology , HIV-1/drug effects , HIV-1/genetics , Adult , CD4-Positive T-Lymphocytes/virology , Cohort Studies , DNA, Viral/genetics , Female , Genome, Viral , HIV Infections/virology , Humans , Male , Time Factors , Viral Load
ABSTRACT
BACKGROUND: The primary hurdle for the eradication of HIV-1 is the establishment of a latent viral reservoir early after primary infection. Here, we investigated the potential influence of human genetic variation on the HIV-1 reservoir size and its decay rate during suppressive antiretroviral treatment. SETTING: Genome-wide association study and exome sequencing study to look for host genetic determinants of HIV-1 reservoir measurements in patients enrolled in the Swiss HIV Cohort Study, a nation-wide prospective observational study. METHODS: We measured total HIV-1 DNA in peripheral blood mononuclear cells from study participants, as a proxy for the reservoir size at 3 time points over a median of 5.4 years, and searched for associations between human genetic variation and 2 phenotypic readouts: the reservoir size at the first time point and its decay rate over the study period. We assessed the contribution of common genetic variants using genome-wide genotyping data from 797 patients with European ancestry enrolled in the Swiss HIV Cohort Study and searched for a potential impact of rare variants and exonic copy number variants using exome sequencing data generated in a subset of 194 study participants. RESULTS: Genome-wide and exome-wide analyses did not reveal any significant association with the size of the HIV-1 reservoir or its decay rate on suppressive antiretroviral treatment. CONCLUSIONS: Our results point to a limited influence of human genetics on the size of the HIV-1 reservoir and its long-term dynamics in successfully treated individuals.
Subjects
Anti-HIV Agents/therapeutic use , Genetic Variation , Genome, Human , Genomics/methods , HIV Infections/genetics , HIV-1 , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , HIV Infections/drug therapy , Humans , Time Factors
ABSTRACT
In genetics, many evolutionary pathways can be modeled by the ordered accumulation of permanent changes. Mixture models of mutagenetic trees have been used to describe disease progression in cancer and in HIV. In cancer, progression is modeled by the accumulation of chromosomal gains and losses in tumor cells; in HIV, the accumulation of drug resistance-associated mutations in the viral genome is known to be associated with disease progression. From such evolutionary models, genetic progression scores can be derived that assign measures for the disease state to single patients. Rtreemix is an R package for estimating mixture models of evolutionary pathways from observed cross-sectional data and for estimating associated genetic progression scores. The package also provides extended functionality for estimating confidence intervals for estimated model parameters and for evaluating the stability of the estimated evolutionary mixture models.
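The notion of ordered accumulation of permanent changes can be made concrete with a small sketch of a single mutagenetic tree. It assumes the common formulation in which an event can occur only after its parent event, with a conditional probability attached to each edge. The Python code below does not reproduce the Rtreemix API (Rtreemix is an R package); the function name, the event labels A, B, C, and the edge probabilities are hypothetical.

```python
from itertools import combinations

ROOT = "root"


def pattern_probability(parent, edge_prob, pattern):
    """Probability of observing exactly the events in `pattern`.

    parent:    dict mapping each event to its parent event (or ROOT)
    edge_prob: dict mapping each event to P(event | parent has occurred)
    """
    present = set(pattern) | {ROOT}
    prob = 1.0
    for event, par in parent.items():
        if event in present:
            if par not in present:
                return 0.0  # event observed without its predecessor
            prob *= edge_prob[event]
        elif par in present:
            prob *= 1.0 - edge_prob[event]
        # If neither the event nor its parent occurred, the factor is 1.
    return prob


# Toy tree: root -> A -> B and root -> C.
parent = {"A": ROOT, "B": "A", "C": ROOT}
edge_prob = {"A": 0.8, "B": 0.5, "C": 0.25}

# Enumerate all event patterns; their probabilities should sum to 1.
events = sorted(parent)
all_patterns = [set(c) for r in range(len(events) + 1)
                for c in combinations(events, r)]
total = sum(pattern_probability(parent, edge_prob, p) for p in all_patterns)
```

A mixture model in this sense would combine several such trees as a weighted sum of their pattern probabilities; estimating the trees and weights from cross-sectional data is the part that a dedicated package handles.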
Subjects
Evolution, Molecular , HIV Infections/virology , HIV/genetics , Neoplasms/genetics , Software , Algorithms , Animals , Computer Simulation , Disease Progression , Drug Resistance, Viral/genetics , HIV/drug effects , Humans , Models, Genetic , Mutation
ABSTRACT
The HIV-1 reservoir is the major hurdle to a cure. We here evaluate viral and host characteristics associated with reservoir size and long-term dynamics in 1,057 individuals on suppressive antiretroviral therapy for a median of 5.4 years. At the population level, the reservoir decreases with diminishing differences over time, but increases in 26.6% of individuals. Viral blips and low-level viremia are significantly associated with slower reservoir decay. Initiation of ART within the first year of infection, pretreatment viral load, and ethnicity affect reservoir size, but less so long-term dynamics. Viral blips and low-level viremia are thus relevant for reservoir and cure studies.
Subjects
Anti-HIV Agents/therapeutic use , Disease Reservoirs , HIV Infections/diagnosis , HIV Infections/drug therapy , HIV Infections/virology , HIV-1/isolation & purification , Adult , Female , HIV Infections/blood , HIV-1/genetics , Humans , Longitudinal Studies , Male , Middle Aged , Models, Biological , RNA, Viral/blood , Viral Load , Viremia , Virus Latency/drug effects
ABSTRACT
BACKGROUND: Mixture models of mutagenetic trees are evolutionary models that capture several pathways of ordered accumulation of genetic events observed in different subsets of patients. They were used to model HIV progression by accumulation of resistance mutations in the viral genome under drug pressure and cancer progression by accumulation of chromosomal aberrations in tumor cells. From the mixture models a genetic progression score (GPS) can be derived that estimates the genetic status of single patients according to the corresponding progression along the tree models. GPS values were shown to have predictive power for estimating drug resistance in HIV or the survival time in cancer. Still, the reliability of the exact values of such complex markers derived from graphical models can be questioned. RESULTS: In a simulation study, we analyzed various aspects of the stability of estimated mutagenetic tree mixture models. It turned out that the induced probability distributions and the tree topologies are recovered with high precision by an EM-like learning algorithm. However, GPS values of single patients can be reliably estimated only for models with just one major model component. CONCLUSION: It is encouraging that the estimation process of mutagenetic tree mixture models can be performed with high confidence regarding induced probability distributions and the general shape of the tree topologies. For a model with only one major disease progression process, even genetic progression scores for single patients can be reliably estimated. However, for models with more than one relevant component, alternative measures should be introduced for estimating the stage of disease progression.
Subjects
DNA Mutational Analysis/methods , DNA/genetics , Genetic Predisposition to Disease/genetics , Genomic Instability/genetics , Models, Genetic , Sequence Analysis, DNA/methods , Base Sequence , Data Interpretation, Statistical , Disease Progression , Humans , Models, Statistical , Molecular Sequence Data
ABSTRACT
We present a mixture-of-experts approach for HIV therapy selection. The heterogeneity in patient data makes it difficult for one particular model to succeed at providing suitable therapy predictions for all patients. An appropriate means for addressing this heterogeneity is through combining kernel and model-based techniques. These methods capture different kinds of information: kernel-based methods are able to identify clusters of similar patients, and work well when modelling the viral response for these groups. In contrast, model-based methods capture the sequential process of decision making, and are able to find simpler, yet accurate patterns in response for patients outside these groups. We take advantage of this information by proposing a mixture-of-experts model that automatically selects between the methods in order to assign the most appropriate therapy choice to an individual. Overall, we verify that therapy combinations proposed using this approach significantly outperform previous methods.