ABSTRACT
BACKGROUND: Accurate prediction of healthcare costs is important for optimally managing health costs. However, methods leveraging the medical richness from data such as health insurance claims or electronic health records are missing. METHODS: Here, we developed a deep neural network to predict future cost from health insurance claims records. We applied the deep network and a ridge regression model to a sample of 1.4 million German insurants to predict total one-year health care costs. Both methods were compared to existing models with various performance measures and were also used to predict patients with a change in costs and to identify relevant codes for this prediction. RESULTS: We showed that the neural network outperformed the ridge regression as well as all considered models for cost prediction. Further, the neural network was superior to ridge regression in predicting patients with cost change and identified more specific codes. CONCLUSION: In summary, we showed that our deep neural network can leverage the full complexity of the patient records and outperforms standard approaches. We suggest that the better performance is due to the ability to incorporate complex interactions in the model and that the model might also be used for predicting other health phenotypes.
Subjects
Deep Learning; Health Care Costs; Germany; Humans; Insurance Claim Review; Neural Networks, Computer; Population Health
ABSTRACT
Plants use light as a source of energy and information to detect diurnal rhythms and seasonal changes. Sensing changing light conditions is critical to adjust plant metabolism and to initiate developmental transitions. Here, we analyzed transcriptome-wide alterations in gene expression and alternative splicing (AS) of etiolated seedlings undergoing photomorphogenesis upon exposure to blue, red, or white light. Our analysis revealed massive transcriptome reprogramming as reflected by differential expression of ~20% of all genes and changes in several hundred AS events. For more than 60% of all regulated AS events, light promoted the production of a presumably protein-coding variant at the expense of an mRNA with nonsense-mediated decay-triggering features. Accordingly, AS of the putative splicing factor REDUCED RED-LIGHT RESPONSES IN CRY1CRY2 BACKGROUND1, previously identified as a red light signaling component, was shifted to the functional variant under light. Downstream analyses of candidate AS events pointed at a role of photoreceptor signaling only in monochromatic but not in white light. Furthermore, we demonstrated similar AS changes upon light exposure and exogenous sugar supply, with a critical involvement of kinase signaling. We propose that AS is an integration point of signaling pathways that sense and transmit information regarding the energy availability in plants.
Subjects
Alternative Splicing/physiology; Arabidopsis Proteins/metabolism; Arabidopsis/genetics; Transcriptome/genetics; Alternative Splicing/genetics; Arabidopsis/physiology; Arabidopsis Proteins/genetics; Gene Expression Regulation, Plant/genetics; Gene Expression Regulation, Plant/physiology; Signal Transduction/genetics; Signal Transduction/physiology
ABSTRACT
PAR-CLIP (photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation) facilitates the identification and mapping of protein/RNA interactions. So far, it has been limited to select cell lines as it requires efficient 4SU uptake. To increase transcriptome complexity and thus identify additional RNA-protein interaction sites, we fused HEK 293 T-Rex cells (HEK293-Y) that express the RNA-binding protein YBX1 with PC12 cells expressing eGFP (PC12-eGFP). The resulting hybrids enable PAR-CLIP on a neuronally expanded transcriptome (Fusion-CLIP) and serve as a proof of principle. The fusion cells express both parental marker genes, YBX1 and eGFP, and the expanded transcriptome contains human and rat transcripts. PAR-CLIP of fused cells versus the parental HEK293-Y identified 768 novel RNA targets of YBX1. We were able to trace the origin of the majority of the short PAR-CLIP reads as they differentially mapped to the human and rat genome. Furthermore, Fusion-CLIP expanded the CAUC RNA binding motif of YBX1 to UCUUUNNCAUC. The fusion of HEK293-Y and PC12-eGFP cells resulted in cells with a diverse genome expressing human and rat transcripts that enabled the identification of novel YBX1 substrates. The technique allows the expansion of the HEK 293 transcriptome and makes PAR-CLIP available to fusion cells of diverse origin.
Subjects
Cell Fusion/methods; Green Fluorescent Proteins/chemistry; Green Fluorescent Proteins/metabolism; RNA, Messenger/metabolism; Y-Box-Binding Protein 1/chemistry; Y-Box-Binding Protein 1/metabolism; Amino Acid Motifs; Animals; Binding Sites; Cross-Linking Reagents; Gene Expression Profiling/methods; HEK293 Cells; Humans; Immunoprecipitation; PC12 Cells; Protein Binding; Rats
ABSTRACT
Alternative mRNA splicing is a fundamental process to increase the versatility of the genome. In humans, cardiac mRNA splicing is involved in the pathophysiology of heart failure. Mutations in the splicing factor RNA binding motif protein 20 (RBM20) cause severe forms of cardiomyopathy. To identify novel cardiomyopathy-associated splicing factors, RNA-seq and tissue-enrichment analyses were performed, which identified up-regulated expression of Sam68-Like mammalian protein 2 (SLM2) in the left ventricle of dilated cardiomyopathy (DCM) patients. In the human heart, SLM2 binds to important transcripts of sarcomere constituents, such as those encoding myosin light chain 2 (MYL2), troponin I3 (TNNI3), troponin T2 (TNNT2), tropomyosin 1/2 (TPM1/2), and titin (TTN). Mechanistically, SLM2 mediates intron retention, prevents exon exclusion, and thereby mediates alternative splicing of the mRNA regions encoding the variable proline-, glutamate-, valine-, and lysine-rich (PEVK) domain and another part of the I-band region of titin. In summary, SLM2 is a novel cardiac splicing regulator with essential functions for maintaining cardiomyocyte integrity by binding to and processing the mRNAs of essential cardiac constituents such as titin.
Subjects
Cardiomyopathy, Dilated; Heart Failure; Cardiomyopathy, Dilated/genetics; Cardiomyopathy, Dilated/metabolism; Connectin/genetics; Connectin/metabolism; Glutamates; Heart Failure/genetics; Humans; Lysine; Proline; RNA Splicing Factors; RNA, Messenger/metabolism; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism; Tropomyosin/metabolism; Troponin I/metabolism; Troponin T/metabolism; Valine
ABSTRACT
Early stages of embryogenesis depend on subcellular localization and transport of maternal mRNA. However, systematic analysis of these processes is hindered by a lack of spatio-temporal information in single-cell RNA sequencing. Here, we combine spatially resolved transcriptomics and single-cell RNA labeling to perform a spatio-temporal analysis of the transcriptome during early zebrafish development. We measure spatial localization of mRNA molecules within the one-cell stage embryo, which allows us to identify a class of mRNAs that are specifically localized at an extraembryonic position, the vegetal pole. Furthermore, we establish a method for high-throughput single-cell RNA labeling in early zebrafish embryos, which enables us to follow the fate of individual maternal transcripts until gastrulation. This approach reveals that many localized transcripts are specifically transported to the primordial germ cells. Finally, we acquire spatial transcriptomes of two Xenopus species and compare evolutionary conservation of localized genes as well as enriched sequence motifs.
Subjects
Cell Tracking/methods; Embryo, Nonmammalian/metabolism; RNA, Messenger/genetics; Transcriptome/genetics; Zebrafish/genetics; Animals; Embryo, Nonmammalian/cytology; Embryo, Nonmammalian/embryology; Female; Gene Expression Regulation, Developmental; Oocytes/cytology; Oocytes/metabolism; RNA, Messenger/metabolism; Single-Cell Analysis/methods; Spatio-Temporal Analysis; Species Specificity; Xenopus/embryology; Xenopus/genetics; Xenopus laevis/embryology; Xenopus laevis/genetics; Zebrafish/embryology
ABSTRACT
OBJECTIVE: We propose a data-driven method to detect temporal patterns of disease progression in high-dimensional claims data based on gradient boosting with stability selection. MATERIALS AND METHODS: We identified patients with chronic obstructive pulmonary disease in a German health insurance claims database with 6.5 million individuals and divided them into a group of patients with the highest disease severity and a group of control patients with lower severity. We then used gradient boosting with stability selection to determine variables correlating with a chronic obstructive pulmonary disease diagnosis of highest severity and subsequently model the temporal progression of the disease using the selected variables. RESULTS: We identified a network of 20 diagnoses (e.g. respiratory failure), medications (e.g. anticholinergic drugs) and procedures associated with a subsequent chronic obstructive pulmonary disease diagnosis of highest severity. Furthermore, the network successfully captured temporal patterns, such as disease progressions from lower to higher severity grades. DISCUSSION: The temporal trajectories identified by our data-driven approach are compatible with existing knowledge about chronic obstructive pulmonary disease showing that the method can reliably select relevant variables in a high-dimensional context. CONCLUSION: We provide a generalizable approach for the automatic detection of disease trajectories in claims data. This could help to diagnose diseases early, identify unknown risk factors and optimize treatment plans.
Subjects
Pulmonary Disease, Chronic Obstructive; Databases, Factual; Humans; Insurance, Health; Risk Factors; Severity of Illness Index
ABSTRACT
CLIP-seq methods allow the generation of genome-wide maps of RNA-binding protein-RNA interaction sites. However, due to differences between CLIP-seq assays, existing computational approaches to analyze the data can only be applied to a subset of assays. Here, we present a probabilistic model called omniCLIP that can detect regulatory elements in RNAs from data of all CLIP-seq assays. omniCLIP jointly models data across replicates and can integrate background information. Therefore, omniCLIP greatly simplifies the data analysis, increases the reliability of results and paves the way for integrative studies based on data from different assays.