Results 1 - 20 of 87
1.
Lancet Digit Health ; 6(6): e396-e406, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38789140

ABSTRACT

BACKGROUND: Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Although a range of cancer risk models exists, the utility on a population level for risk stratification across cancer types has not been fully explored. We aimed to close this gap by evaluating pan-cancer risk models built on electronic health records across the Danish population with validation in the UK Biobank. METHODS: In this retrospective modelling and validation study, data for model development and internal validation were derived from the following Danish health registries: the Central Person Registry, the Danish National Patient Registry, the death registry, the cancer registry, and full-text medical records from secondary care records in the capital region. The development data included adults aged 16-86 years without previous malignant cancers in the time period from Jan 1, 1995, to Dec 31, 2014. The internal validation period was from Jan 1, 2015, to April 10, 2018, and the data included all adults without a previous indication of cancer aged 16-75 years on Dec 31, 2014. The external validation cohort from the UK Biobank included all adults without a previous indication of cancer aged 50-75 years. We used time-dependent Bayesian Cox hazard models built on the combined medical history of Danish individuals. A set of 1392 covariates from available clinical disease trajectories, text-mined basic health factors, and family histories were used to train predictive models of 20 major cancer types. The models were validated on cancer incidence between 2015 and 2018 across Denmark and on individuals in the UK Biobank. The primary outcomes were discrimination and calibration performance. FINDINGS: From the Danish registries, we included 6 732 553 individuals covering 60 million hospital visits, 90 million diagnoses, and a total of 193 million life-years between Jan 1, 1978, and April 10, 2018. 
Danish registry data covering the period from Jan 1, 2015, to April 10, 2018, were used to internally validate risk models, containing a total of 4 248 491 individuals who remained at risk of a primary malignant cancer diagnosis and 67 401 cancer cases recorded. For the external validation, we evaluated the same time period in the UK Biobank covering 377 004 individuals with 11 486 cancer cases. The predictive performance of the models on Danish data showed good discrimination (concordance index 0·81 [SD 0·08], ranging from 0·66 [95% CI 0·65-0·67] for cervix uteri cancer to 0·91 [0·90-0·92] for liver cancer). Performance was similar on the UK Biobank in a direct transfer when controlling for shifts in the age distribution (concordance index 0·66 [SD 0·08], ranging from 0·55 [95% CI 0·44-0·66] for cervix uteri cancer to 0·78 [0·77-0·79] for lung cancer). Cancer risks were associated, in addition to heritable components, with a broad range of preceding diagnoses and health factors. The best overall performance was seen for cancers of the digestive system (oesophageal, stomach, colorectal, liver, and pancreatic) but also thyroid, kidney, and uterine cancers. INTERPRETATION: Data available in national electronic health databases can be used to approximate cancer risk factors and enable risk predictions in most cancer types. Model predictions generalise between the Danish and UK health-care systems. With the emergence of multi-cancer early detection tests, electronic health record-based risk models could supplement screening efforts. FUNDING: Novo Nordisk Foundation and the Danish Innovation Foundation.


Subject(s)
Electronic Health Records , Neoplasms , Humans , Middle Aged , Aged , Adult , Denmark/epidemiology , Female , Retrospective Studies , Male , Neoplasms/epidemiology , Adolescent , Risk Assessment/methods , Young Adult , Aged, 80 and over , United Kingdom/epidemiology , Registries , Bayes Theorem , Proportional Hazards Models , Risk Factors
2.
J Eur Acad Dermatol Venereol ; 36(12): 2504-2511, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35735049

ABSTRACT

BACKGROUND: Research on hyperhidrosis comorbidities has documented the co-occurrence of diseases but has not provided information about temporal disease associations. OBJECTIVE: To investigate the temporal disease trajectories of individuals with hospital-diagnosed hyperhidrosis. METHODS: This is a hospital-based nationwide cohort study including all patients with a hospital contact in Denmark between 1994 and 2018. International Classification of Diseases version-10 diagnoses assigned to inpatients, outpatients and emergency department patients were collected from the Danish National Patient Register. The main outcome was the temporal disease associations occurring in individuals with hyperhidrosis, which was assessed by identifying morbidities significantly associated with hyperhidrosis and then examining whether there was a significant order of these diagnoses using binomial tests. RESULTS: Overall, 7 191 519 patients were included. Of these, 8758 (0.12%) patients had localized hyperhidrosis (5674 female sex [64.8%]; median age at first diagnosis 26.9 [interquartile range 21.3-36.1]) and 1102 (0.015%) generalized hyperhidrosis (606 female sex [59.9%]; median age at first diagnosis 40.9 [interquartile range 26.4-60.7]). The disease trajectories comprised pain complaints, stress, epilepsy, respiratory and psychiatric diseases. The most diagnosed morbidities for localized hyperhidrosis were abdominal pain (relative risk [RR] = 121.75; 95% Confidence Interval [CI] 121.14-122.35; P < 0.001), soft tissue disorders (RR = 151.19; 95% CI 149.58-152.80; P < 0.001) and dorsalgia (RR = 160.15; 95% CI 158.92-161.38; P < 0.001). The most diagnosed morbidities for generalized hyperhidrosis were dorsalgia (RR = 306.59; 95% CI 302.17-311.02; P < 0.001), angina pectoris (RR = 411.69; 95% CI 402.23-421.16; P < 0.001) and depression (RR = 207.92; 95% CI 202.21-213.62; P < 0.001). All these morbidities were diagnosed before hyperhidrosis. 
CONCLUSIONS: This paper ascertains which hospital-diagnosed morbidities precede hospital-diagnosed hyperhidrosis. As hyperhidrosis is mainly treated in the primary health care sector, the trajectories suggest that these morbidities may lead to a worse disease course of hyperhidrosis that necessitates treatment in hospitals. Treating these morbidities may improve the disease course of hyperhidrosis.


Subject(s)
Hyperhidrosis , Inpatients , Humans , Female , Cohort Studies , Comorbidity , Hyperhidrosis/epidemiology , Hospitals , Denmark/epidemiology
3.
Sci Rep ; 10(1): 13975, 2020 Aug 18.
Article in English | MEDLINE | ID: mdl-32811969

ABSTRACT

Rheumatoid arthritis (RA) is a chronic inflammatory disease with a fluctuating course of progression. Despite substantial improvement in treatments in recent years, treatment response is still not guaranteed. The aim of this study was to identify variation in Disease Activity Score 28 (DAS28) of RA patients in response to Tocilizumab, and to investigate both molecular and clinical factors influencing response. Clinical and biochemical data for 485 RA patients receiving Tocilizumab in combination with methotrexate were extracted from the LITHE phase III clinical study (NCT00106535), and a post-hoc analysis was conducted. Latent class mixed models were used to identify statistically distinct trajectories of DAS28 after the initiation of treatment. Biomarker measurements were then analysed cross-sectionally and temporally, to characterise patients by serological biomarkers and clinical factors. We identified three distinct trajectories of drug response: class 1 (n = 85, 17.5%), class 2 (n = 338, 69.7%) and class 3 (n = 62, 12.8%). All groups started with high DAS28 on average (DAS28 > 5.1). Class 1 showed the least reduction in DAS28, with significantly more patients seeking escape therapy (p < 0.001). Class 3 showed significantly higher rates of improvement in DAS28, with 58.1% achieving ACR response levels compared to 2.4% in class 1 (p < 0.0001). Biomarkers of inflammation, MMP-3, CRP, C1M, showed greater reduction in class 3 compared to the other classes. Identification of more homogeneous patient sub-populations of drug response may allow for more targeted therapeutic treatment regimens and a better understanding of disease aetiology.


Subject(s)
Antibodies, Monoclonal, Humanized/therapeutic use , Arthritis, Rheumatoid/drug therapy , Receptors, Interleukin-6/immunology , Adult , Aged , Antirheumatic Agents/therapeutic use , Biomarkers, Pharmacological/blood , Blood Sedimentation , Disease Progression , Drug Therapy, Combination/methods , Female , Humans , Male , Methotrexate/therapeutic use , Middle Aged , Receptors, Interleukin-6/metabolism , Remission Induction , Severity of Illness Index , Treatment Outcome
4.
Clin Microbiol Infect ; 25(10): 1277-1285, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31059795

ABSTRACT

OBJECTIVES: Sample preparation for high-throughput sequencing (HTS) includes treatment with various laboratory components, potentially carrying viral nucleic acids, the extent of which has not been thoroughly investigated. Our aim was to systematically examine a diverse repertoire of laboratory components used to prepare samples for HTS in order to identify contaminating viral sequences. METHODS: A total of 322 samples of mainly human origin were analysed using eight protocols, applying a wide variety of laboratory components. Several samples (60% of human specimens) were processed using different protocols. In total, 712 sequencing libraries were investigated for viral sequence contamination. RESULTS: Among sequences showing similarity to viruses, 493 were significantly associated with the use of laboratory components. Each of these viral sequences appeared sporadically, only being identified in a subset of the samples treated with the linked laboratory component, and some were not identified in the non-template control samples. Remarkably, more than 65% of all viral sequences identified were within viral clusters linked to the use of laboratory components. CONCLUSIONS: We show that a high prevalence of contaminating viral sequences can be expected in HTS-based virome data and provide an extensive list of novel contaminating viral sequences that can be used for evaluation of viral findings in future virome and metagenome studies. Moreover, we show that detection can be problematic due to stochastic appearance and limited non-template controls. Although the exact origin of these viral sequences requires further research, our results support laboratory-component-linked viral sequence contamination of both biological and synthetic origin.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Specimen Handling/methods , Viruses/isolation & purification , Humans , Viruses/genetics
5.
Leukemia ; 29(2): 297-303, 2015 Feb.
Article in English | MEDLINE | ID: mdl-24990611

ABSTRACT

Childhood acute lymphoblastic leukemia survival approaches 90%. New strategies are needed to identify the 10-15% who evade cure. We applied targeted, sequencing-based genotyping of 25 000 to 34 000 preselected potentially clinically relevant single-nucleotide polymorphisms (SNPs) to identify host genome profiles associated with relapse risk in 352 patients from the Nordic ALL92/2000 protocols and 426 patients from the German Berlin-Frankfurt-Münster (BFM) ALL2000 protocol. Patients were enrolled between 1992 and 2008 (median follow-up: 7.6 years). Eleven cross-validated SNPs were significantly associated with risk of relapse across protocols. SNP and biologic pathway level analyses associated relapse risk with leukemia aggressiveness, glucocorticosteroid pharmacology/response and drug transport/metabolism pathways. Classification and regression tree analysis identified three distinct risk groups defined by end of induction residual leukemia, white blood cell count and variants in myeloperoxidase (MPO), estrogen receptor 1 (ESR1), lamin B1 (LMNB1) and matrix metalloproteinase-7 (MMP7) genes, ATP-binding cassette transporters and glucocorticosteroid transcription regulation pathways. Relapse rates ranged from 4% (95% confidence interval (CI): 1.6-6.3%) for the best group (72% of patients) to 76% (95% CI: 41-90%) for the worst group (5% of patients, P<0.001). Validation of these findings and similar approaches to identify SNPs associated with toxicities may allow future individualized relapse and toxicity risk-based treatment adaptation.


Subject(s)
Neoplasm Recurrence, Local/diagnosis , Neoplasm Recurrence, Local/genetics , Polymorphism, Genetic , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Adolescent , Child , Child, Preschool , Denmark , Female , Genome, Human , Genomics , Genotype , Germany , Humans , Infant , Male , Neoplasm, Residual/genetics , Polymorphism, Single Nucleotide , Risk Factors , Treatment Outcome
6.
J Biomed Inform ; 47: 160-70, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24513869

ABSTRACT

We describe a new method for identification of confident associations within large clinical data sets. The method is a hybrid of two existing methods: Self-Organizing Maps and Association Mining. We utilize Self-Organizing Maps as the initial step to reduce the search space, and then apply Association Mining in order to find association rules. We demonstrate that this procedure has a number of advantages compared to traditional Association Mining: it allows for handling numerical variables without a priori binning and is able to generate variable groups which act as "hotspots" for statistically significant associations. We showcase the method on infertility-related data from Danish military conscripts. The clinical data we analyzed contained both categorical questionnaire data and continuous variables generated from biological measurements, including missing values. From this data set, we successfully generated a number of interesting association rules, which relate an observation with a specific consequence and the p-value for that finding. Additionally, we demonstrate that the method can be used on non-clinical data containing chemical-disease associations in order to find associations between different phenotypes, such as prostate cancer and breast cancer.


Subject(s)
Biological Specimen Banks , Data Mining/methods , Information Storage and Retrieval , Algorithms , Breast Neoplasms/epidemiology , Denmark , Female , Humans , Infertility, Male/epidemiology , Male , Phenotype , Prostatic Neoplasms/epidemiology , Surveys and Questionnaires , Toxicogenetics
8.
Int J Androl ; 35(3): 294-302, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22519522

ABSTRACT

During the past four decades, there has been an increase in the incidence rate of male reproductive disorders in some, but not all, Western countries. The observed increase in the prevalence of male reproductive disorders is suspected to be ascribable to environmental factors as the increase has been too rapid to be explained by genetics alone. To study the association between complex chemical exposures of humans and congenital cryptorchidism, the most common malformation of the male genitalia, we measured 121 environmental chemicals with suspected or known endocrine disrupting properties in 130 breast milk samples from Danish and Finnish mothers. Half the newborns were healthy controls, whereas the other half were boys with congenital cryptorchidism. The measured chemicals included polychlorinated biphenyls (PCBs), polybrominated diphenyl-ethers, dioxins (OCDD/PCDFs), phthalates, polybrominated biphenyls and organochlorine pesticides. Computational analysis of the data was performed using logistic regression and three multivariate machine learning classifiers. Furthermore, we performed systems biology analysis to explore the chemical influence on a molecular level. After correction for multiple testing, exposure to nine chemicals was significantly different between the cases and controls in the Danish cohort, but not in the Finnish cohort. The multivariate analysis indicated that Danish samples exhibited a stronger correlation between chemical exposure patterns in breast milk and cryptorchidism than Finnish samples. Moreover, PCBs were indicated as having a protective effect within the Danish cohort, which was supported by molecular data recovered through systems biology. Our results lend further support to the hypothesis that the mixture of environmental chemicals may contribute to observed adverse trends in male reproductive health.


Subject(s)
Cryptorchidism/epidemiology , Milk, Human/chemistry , Artificial Intelligence , Denmark/epidemiology , Dioxins/analysis , Environmental Pollutants/analysis , Female , Finland/epidemiology , Halogenated Diphenyl Ethers/analysis , Humans , Logistic Models , Male , Polychlorinated Biphenyls/analysis , Systems Biology
9.
Int J Androl ; 34(4 Pt 2): e122-32, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21696394

ABSTRACT

To search for disease-related copy number variations (CNVs) in families with a high frequency of germ cell tumours (GCT), we analysed 16 individuals from four families by array comparative genomic hybridization (aCGH) and applied an integrative systems biology algorithm that prioritizes risk-associated genes among loci targeted by CNVs. The top-ranked candidate, RLN1, encoding a Relaxin-H1 peptide, although only detected in one of the families, was selected for further investigations. Validation of the CNV at the RLN1 locus was performed as an association study using qPCR with 106 sporadic testicular GCT patients and 200 healthy controls. Observed CNV frequencies of 1.9% among cases and 1.5% amongst controls were not significantly different and this was further confirmed by CNV data extracted from a genome-wide analysis of 189 cases and 380 controls, where similar frequencies of 2.2% were observed in both groups (p=1). Immunohistochemistry for Relaxin-H1 (RLN1), Relaxin-H2 (RLN2) and their cognate receptor, RXFP1, detected one, and in some cases both, of the relaxins in Leydig cells, Sertoli cells and a subset of neoplastic germ cells, whereas the receptor was present in Leydig cells and spermatids. Collectively, the findings show that a heterozygous loss at the RLN1 locus is not a genetic factor mediating high population-wide risk for testicular germ cell tumour, but do not exclude a contribution of this aberration in some cases of cancer. The preliminary expression data suggest a possible role of the relaxin peptides in spermatogenesis and warrant further studies.


Subject(s)
DNA Copy Number Variations , Neoplasms, Germ Cell and Embryonal/genetics , Relaxin/genetics , Sequence Deletion , Testicular Neoplasms/genetics , Adolescent , Adult , Base Sequence , Comparative Genomic Hybridization , Family , Genetic Variation , Genome-Wide Association Study , Humans , Male , Middle Aged , Polymerase Chain Reaction , Receptors, G-Protein-Coupled/genetics , Receptors, Peptide/genetics
10.
Leukemia ; 25(6): 1001-6, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21415851

ABSTRACT

Genetic variants, including single-nucleotide polymorphisms (SNPs), are key determinants of interindividual differences in treatment efficacy and toxicity in childhood acute lymphoblastic leukemia (ALL). Although up to 13 chemotherapeutic agents are used in the treatment of this cancer, it remains a model disease for exploring the impact of genetic variation due to well-characterized cytogenetics, drug response pathways and precise monitoring of minimal residual disease. Here, we have selected clinically relevant genes and SNPs through literature screening, and on the basis of associations with key pathways, protein-protein interactions or downstream partners that have a role in drug disposition and treatment efficacy in childhood ALL. This allows exploration of pathways, where one of several genetic variants may lead to similar clinical phenotypes through related molecular mechanisms. We have designed a cost-effective, high-throughput capture assay of ∼25,000 clinically relevant SNPs, and demonstrated that multiple samples can be tagged and pooled before genome capture in targeted enrichment with a sufficient sequencing depth for genotyping. This multiplexed, targeted sequencing method allows exploration of the impact of pharmacogenetics on efficacy and toxicity in childhood ALL treatment, which will be of importance for personalized chemotherapy.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Adolescent , Child, Preschool , Cost-Benefit Analysis , Genotype , High-Throughput Nucleotide Sequencing/economics , Humans , Infant , Infant, Newborn , Pharmacogenetics , Phenotype , Precursor Cell Lymphoblastic Leukemia-Lymphoma/epidemiology , Treatment Outcome
11.
Int J Androl ; 33(2): 270-8, 2010 Apr.
Article in English | MEDLINE | ID: mdl-19780864

ABSTRACT

Recent reports have confirmed a worldwide increasing trend of testicular cancer incidence, and a conspicuously high prevalence of this disease and other male reproductive disorders, including cryptorchidism and hypospadias, in Denmark. In contrast, Finland, a similarly industrialized Nordic country, exhibits much lower incidences of these disorders. The reasons behind the observed trends are unexplained, but environmental endocrine disrupting chemicals (EDCs) that affect foetal testis development are probably involved. Levels of persistent chemicals in breast milk can be considered a proxy for exposure of the foetus to such agents. Therefore, we undertook a comprehensive ecological study of 121 EDCs, including the persistent compounds dioxins, polychlorinated biphenyls (PCBs), pesticides and flame retardants, and non-persistent phthalates, in 68 breast milk samples from Denmark and Finland to compare exposure of mothers to this environmental mixture of EDCs. Using sophisticated, bioinformatic tools in our analysis, we reveal, for the first time, distinct country-specific chemical signatures of EDCs with Danes having generally higher exposure than Finns to persistent bioaccumulative chemicals, whereas there was no country-specific pattern with regard to the non-persistent phthalates. Importantly, EDC levels, including some dioxins, PCBs and some pesticides (hexachlorobenzene and dieldrin) were significantly higher in Denmark than in Finland. As these classes of EDCs have been implicated in testicular cancer or in adversely affecting development of the foetal testis in humans and animals, our findings reinforce the view that environmental exposure to EDCs may explain some of the temporal and between-country differences in incidence of male reproductive disorders.


Subject(s)
Dioxins/analysis , Endocrine Disruptors/analysis , Environmental Exposure , Environmental Pollutants/analysis , Hydrocarbons, Chlorinated/analysis , Maternal Exposure , Milk, Human/chemistry , Polychlorinated Biphenyls/analysis , Denmark , Dieldrin/analysis , Dioxins/toxicity , Environmental Pollutants/toxicity , Female , Finland , Flame Retardants/analysis , Hexachlorobenzene/analysis , Humans , Hydrocarbons, Chlorinated/toxicity , Male , Pesticides/analysis , Testicular Neoplasms/chemically induced , Testis/drug effects , Testis/embryology
12.
Clin Pharmacol Ther ; 86(2): 183-9, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19369935

ABSTRACT

A critical task in pharmacogenomics is identifying genes that may be important modulators of drug response. High-throughput experimental methods are often plagued by false positives and do not take advantage of existing knowledge. Candidate gene lists can usefully summarize existing knowledge, but they are expensive to generate manually and may therefore have incomplete coverage. We have developed a method that ranks 12,460 genes in the human genome on the basis of their potential relevance to a specific query drug and its putative indications. Our method uses known gene-drug interactions, networks of gene-gene interactions, and available measures of drug-drug similarity. It ranks genes by building a local network of known interactions and assessing the similarity of the query drug (by both structure and indication) with drugs that interact with gene products in the local network. In a comprehensive benchmark, our method achieves an overall area under the curve of 0.82. To showcase our method, we found novel gene candidates for warfarin, gefitinib, carboplatin, and gemcitabine, and we provide the molecular hypotheses for these predictions.


Subject(s)
Gene Expression Regulation/drug effects , Genes/drug effects , Genome, Human/genetics , Pharmacogenetics , Anticoagulants/pharmacology , Antineoplastic Agents/pharmacology , Area Under Curve , Carboplatin/pharmacology , Deoxycytidine/analogs & derivatives , Deoxycytidine/pharmacology , Gefitinib , Humans , Oligonucleotide Array Sequence Analysis , Quinazolines/pharmacology , Warfarin/pharmacology , Gemcitabine
13.
Diabetes Obes Metab ; 11 Suppl 1: 60-6, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19143816

ABSTRACT

AIM: To develop novel methods for identifying new genes that contribute to the risk of developing type 1 diabetes within the Major Histocompatibility Complex (MHC) region on chromosome 6, independently of the known linkage disequilibrium (LD) between human leucocyte antigen (HLA)-DRB1, -DQA1, -DQB1 genes. METHODS: We have developed a novel method that combines single nucleotide polymorphism (SNP) genotyping data with protein-protein interaction (ppi) networks to identify disease-associated network modules enriched for proteins encoded from the MHC region. Approximately 2500 SNPs located in the 4 Mb MHC region were analysed in 1000 affected offspring trios generated by the Type 1 Diabetes Genetics Consortium (T1DGC). The most associated SNP in each gene was chosen and genes were mapped to ppi networks for identification of interaction partners. The association testing and resulting interacting protein modules were statistically evaluated using permutation. RESULTS: A total of 151 genes could be mapped to nodes within the protein interaction network and their interaction partners were identified. Five protein interaction modules reached statistical significance using this approach. The identified proteins are well known in the pathogenesis of T1D, but the modules also contain additional candidates that have been implicated in beta-cell development and diabetic complications. CONCLUSIONS: The extensive LD within the MHC region makes it important to develop new methods for analysing genotyping data for identification of additional risk genes for T1D. Combining genetic data with knowledge about functional pathways provides new insight into mechanisms underlying T1D.


Subject(s)
Diabetes Mellitus, Type 1/genetics , Genetic Predisposition to Disease/genetics , HLA Antigens/genetics , Major Histocompatibility Complex/genetics , Polymorphism, Single Nucleotide/genetics , Apolipoproteins/genetics , Apolipoproteins M , CD4 Antigens/genetics , Calcium-Binding Proteins , Chromosomes, Human, Pair 6/genetics , DNA-Binding Proteins/genetics , Genotype , HMGB1 Protein/genetics , Humans , Lipocalins , Microfilament Proteins , Protein Interaction Mapping , Receptor for Advanced Glycation End Products , Receptors, Immunologic/genetics
14.
Tissue Antigens ; 63(5): 395-400, 2004 May.
Article in English | MEDLINE | ID: mdl-15104671

ABSTRACT

An effective Severe Acute Respiratory Syndrome (SARS) vaccine is likely to include components that can induce specific cytotoxic T-lymphocyte (CTL) responses. The specificities of such responses are governed by human leukocyte antigen (HLA)-restricted presentation of SARS-derived peptide epitopes. Exact knowledge of how the immune system handles protein antigens would allow for the identification of such linear sequences directly from genomic/proteomic sequence information (Lauemoller et al., Rev Immunogenet 2001: 2: 477-91). The latter was recently established when a causative coronavirus (SARS-CoV) was isolated and full-length sequenced (Marra et al., Science 2003: 300: 1399-404). Here, we have combined advanced bioinformatics and high-throughput immunology to perform an HLA supertype-, genome-wide scan for SARS-specific CTL epitopes. The scan includes all nine human HLA supertypes in total covering >99% of all individuals of all major human populations (Sette & Sidney, Immunogenetics 1999: 50: 201-12). For each HLA supertype, we have selected the 15 top candidates for test in biochemical binding assays. At this time (approximately 6 months after the genome was established), we have tested the majority of the HLA supertypes and identified almost 100 potential vaccine candidates. These should be further validated in SARS survivors and used for vaccine formulation. We suggest that immunobioinformatics may become a fast and valuable tool in rational vaccine design.


Subject(s)
HLA Antigens/immunology , Severe Acute Respiratory Syndrome/therapy , Severe acute respiratory syndrome-related coronavirus/immunology , Viral Vaccines/immunology , Antigen Presentation , Computational Biology , Epitopes, T-Lymphocyte/immunology , Genome, Viral , HLA-A Antigens/immunology , HLA-A3 Antigen/immunology , Humans , Neural Networks, Computer , Peptides/immunology , Protein Binding , Severe acute respiratory syndrome-related coronavirus/genetics , Severe acute respiratory syndrome-related coronavirus/isolation & purification , Severe Acute Respiratory Syndrome/immunology
15.
Nucleic Acids Res ; 32(3): 1131-42, 2004.
Article in English | MEDLINE | ID: mdl-14960723

ABSTRACT

Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5' untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to 'pure' UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by 'coding' noise, thus enhancing significantly the prediction of 5' UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3' ends of non-coding exons and 5' non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2-3-fold better compared with NetGene2 and GenScan in 5' UTRs. We also tested the 5' UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR.


Subject(s)
5' Untranslated Regions/chemistry , Neural Networks, Computer , RNA Precursors/chemistry , RNA Splice Sites , Sequence Analysis, RNA/methods , 5' Untranslated Regions/metabolism , Exons , Humans , Introns , Molecular Sequence Data , Nucleotides/analysis , Protein Biosynthesis , RNA Precursors/metabolism
16.
Tissue Antigens ; 62(5): 378-84, 2003 Nov.
Article in English | MEDLINE | ID: mdl-14617044

ABSTRACT

We have generated Artificial Neural Networks (ANN) capable of performing sensitive, quantitative predictions of peptide binding to the MHC class I molecule, HLA-A*0204. We have shown that such quantitative ANN are superior to conventional classification ANN that have been trained to predict binding vs non-binding peptides. Furthermore, quantitative ANN allowed a straightforward application of a 'Query by Committee' (QBC) principle whereby particularly information-rich peptides could be identified and subsequently tested experimentally. Iterative training based on QBC-selected peptides considerably increased the sensitivity without compromising the efficiency of the prediction. This suggests a general, rational and unbiased approach to the development of high quality predictions of epitopes restricted to this and other HLA molecules. Due to their quantitative nature, such predictions will cover a wide range of MHC-binding affinities of immunological interest, and they can be readily integrated with predictions of other events involved in generating immunogenic epitopes. These predictions have the capacity to perform rapid proteome-wide searches for epitopes. Finally, it is an example of an iterative feedback loop whereby advanced computational bioinformatics optimizes experimental strategy, and vice versa.


Subject(s)
HLA-A Antigens/immunology , Neural Networks, Computer , Peptides/metabolism , HLA-A Antigens/metabolism , Humans , Protein Binding , Proteome/metabolism
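
The 'Query by Committee' principle mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the committee members here are hypothetical toy scoring functions standing in for trained quantitative networks, and disagreement is measured as the variance of the members' predictions, which is one common choice and an assumption on our part.

```python
from statistics import pvariance


def committee_disagreement(predictions):
    """Population variance of the committee's predictions for one peptide."""
    return pvariance(predictions)


def select_informative(candidates, committee, n_select):
    """Return the n_select candidates on which the committee disagrees most.

    candidates: iterable of peptide strings
    committee:  list of callables mapping a peptide to a numeric prediction
                (hypothetical stand-ins for trained quantitative predictors)
    """
    scored = []
    for peptide in candidates:
        preds = [member(peptide) for member in committee]
        scored.append((committee_disagreement(preds), peptide))
    # Highest disagreement first; sort on the score only.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [peptide for _, peptide in scored[:n_select]]


# Toy committee (assumption: purely illustrative scoring rules).
toy_committee = [
    lambda p: p.count("L") / len(p),
    lambda p: p.count("V") / len(p),
    lambda p: 0.5,
]
toy_candidates = ["LLLLLLLLL", "AAAAAAAAA", "LVLVLVLVL"]
```

In an iterative setting, the selected peptides would be measured experimentally, added to the training data, and the committee retrained before the next selection round.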
17.
Bioinformatics ; 19(5): 635-42, 2003 Mar 22.
Article in English | MEDLINE | ID: mdl-12651722

ABSTRACT

MOTIVATION: The human genome project has led to the discovery of many human protein coding genes which were previously unknown. As a large fraction of these are functionally uncharacterized, it is of interest to develop methods for predicting their molecular function from sequence. RESULTS: We have developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories: transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors can all be predicted. Although the method relies on protein sequences as the sole input, it does not rely on sequence similarity, but instead on sequence-derived protein features such as predicted post-translational modifications (PTMs), protein sorting signals and physical/chemical properties calculated from the amino acid composition. This allows for prediction of the function for orphan proteins where no homologs can be found. Using this method we propose two novel receptors in the human genome, and further demonstrate chromosomal clustering of related proteins.


Subject(s)
Algorithms , Database Management Systems , Neural Networks, Computer , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Databases, Protein , Gene Expression Profiling/methods , Humans , Information Storage and Retrieval/methods , Pattern Recognition, Automated , Proteins/classification , Proteins/genetics , Sequence Homology , Structure-Activity Relationship
19.
J Mol Biol ; 319(5): 1257-65, 2002 Jun 21.
Article in English | MEDLINE | ID: mdl-12079362

ABSTRACT

We have developed an entirely sequence-based method that identifies and integrates relevant features that can be used to assign proteins of unknown function to functional classes, and enzyme categories for enzymes. We show that strategies for the elucidation of protein function may benefit from a number of functional attributes that are more directly related to the linear sequence of amino acids, and hence easier to predict, than protein structure. These attributes include features associated with post-translational modifications and protein sorting, but also much simpler aspects such as the length, isoelectric point and composition of the polypeptide chain.


Subject(s)
Computational Biology/methods , Protein Processing, Post-Translational , Protein Sorting Signals , Proteins/chemistry , Proteins/classification , Databases, Protein , Enzymes/chemistry , Enzymes/classification , Enzymes/metabolism , Genome, Human , Glycosylation , Humans , Isoelectric Point , Linguistics , Neural Networks, Computer , Phosphorylation , Physical Chromosome Mapping , Protein Binding , Protein Transport , Proteins/metabolism , Software
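
The "much simpler aspects" named in the abstract above (length and composition of the polypeptide chain) can be computed directly from a sequence. The sketch below is only an illustration of such features, not the published method; the function name and the returned dictionary layout are assumptions, and the isoelectric point is left out because it depends on a choice of pKa values that the abstract does not specify.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_features(sequence):
    """Return length and fractional amino acid composition of a protein.

    Hypothetical helper for illustration; non-standard letters are counted
    in the length but do not appear in the composition dictionary.
    """
    seq = sequence.upper()
    counts = Counter(seq)
    length = len(seq)
    composition = {
        aa: (counts[aa] / length if length else 0.0) for aa in AMINO_ACIDS
    }
    return {"length": length, "composition": composition}


features = sequence_features("ACCA")
```

Feature vectors of this kind could then be fed to any classifier; the abstract combines them with predicted post-translational modification and sorting-signal features.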
20.
FEBS Lett ; 507(1): 6-10, 2001 Oct 19.
Article in English | MEDLINE | ID: mdl-11682049

ABSTRACT

In the last decade, the prediction of protein secondary structure has been optimized using essentially one and the same assignment scheme, known as DSSP. We present here a different scheme, which is more predictable. This scheme directly predicts the hydrogen bonds that stabilize the secondary structures. Single sequence prediction of the new three category assignment gives an overall prediction improvement of 3.1% and 5.1% compared to the DSSP assignment and schemes where the helix category consists of alpha-helix and 3(10)-helix, respectively. These results were achieved using a standard feed-forward neural network with one hidden layer on a data set identical to the one used in earlier work.


Subject(s)
Proteins/chemistry , Algorithms , Hydrogen Bonding , Models, Chemical , Models, Molecular , Neural Networks, Computer , Protein Structure, Secondary