1.
J Eur Acad Dermatol Venereol ; 36(12): 2504-2511, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35735049

ABSTRACT

BACKGROUND: Research on hyperhidrosis comorbidities has documented the co-occurrence of diseases but has not provided information about temporal disease associations. OBJECTIVE: To investigate the temporal disease trajectories of individuals with hospital-diagnosed hyperhidrosis. METHODS: This is a hospital-based nationwide cohort study including all patients with a hospital contact in Denmark between 1994 and 2018. International Classification of Diseases version-10 diagnoses assigned to inpatients, outpatients and emergency department patients were collected from the Danish National Patient Register. The main outcome was the temporal disease associations occurring in individuals with hyperhidrosis, which was assessed by identifying morbidities significantly associated with hyperhidrosis and then examining whether there was a significant order of these diagnoses using binomial tests. RESULTS: Overall, 7 191 519 patients were included. Of these, 8758 (0.12%) patients had localized hyperhidrosis (5674 female sex [64.8%]; median age at first diagnosis 26.9 [interquartile range 21.3-36.1]) and 1102 (0.015%) generalized hyperhidrosis (606 female sex [59.9%]; median age at first diagnosis 40.9 [interquartile range 26.4-60.7]). The disease trajectories comprised pain complaints, stress, epilepsy, respiratory and psychiatric diseases. The most diagnosed morbidities for localized hyperhidrosis were abdominal pain (relative risk [RR] = 121.75; 95% Confidence Interval [CI] 121.14-122.35; P < 0.001), soft tissue disorders (RR = 151.19; 95% CI 149.58-152.80; P < 0.001) and dorsalgia (RR = 160.15; 95% CI 158.92-161.38; P < 0.001). The most diagnosed morbidities for generalized hyperhidrosis were dorsalgia (RR = 306.59; 95% CI 302.17-311.02; P < 0.001), angina pectoris (RR = 411.69; 95% CI 402.23-421.16; P < 0.001) and depression (RR = 207.92; 95% CI 202.21-213.62; P < 0.001). All these morbidities were diagnosed before hyperhidrosis. 
CONCLUSIONS: This paper ascertains which hospital-diagnosed morbidities precede hospital-diagnosed hyperhidrosis. As hyperhidrosis is mainly treated in the primary health care sector, the trajectories suggest that these morbidities may lead to a worse disease course of hyperhidrosis that necessitates treatment in hospitals. Treating these morbidities may improve the disease course of hyperhidrosis.


Subjects
Hyperhidrosis; Inpatients; Humans; Female; Cohort Studies; Comorbidity; Hyperhidrosis/epidemiology; Hospitals; Denmark/epidemiology
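
The abstract of this record says the order of diagnoses was examined with binomial tests. A minimal sketch of such a test follows, assuming a null hypothesis that either order is equally likely (p = 0.5). The function name and the counts are hypothetical illustrations and are not taken from the study.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

# Hypothetical counts: among 40 patients with both diagnoses,
# 31 received diagnosis A before hyperhidrosis.
n_pairs = 40
n_a_first = 31
p_value = binomial_two_sided_p(n_a_first, n_pairs)
print(f"{n_a_first}/{n_pairs} with A first, two-sided p = {p_value:.4g}")
```

A small p-value here would indicate a preferred direction between the two diagnoses; it says nothing about causation.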
2.
J Biomed Inform ; 47: 160-70, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24513869

ABSTRACT

We describe a new method for identification of confident associations within large clinical data sets. The method is a hybrid of two existing methods: Self-Organizing Maps and Association Mining. We utilize Self-Organizing Maps as the initial step to reduce the search space, and then apply Association Mining in order to find association rules. We demonstrate that this procedure has a number of advantages compared to traditional Association Mining: it allows for handling numerical variables without a priori binning and is able to generate variable groups which act as "hotspots" for statistically significant associations. We showcase the method on infertility-related data from Danish military conscripts. The clinical data we analyzed contained both categorical type questionnaire data and continuous variables generated from biological measurements, including missing values. From this data set, we successfully generated a number of interesting association rules, which relate an observation with a specific consequence and the p-value for that finding. Additionally, we demonstrate that the method can be used on non-clinical data containing chemical-disease associations in order to find associations between different phenotypes, such as prostate cancer and breast cancer.


Subjects
Biological Specimen Banks; Data Mining/methods; Information Storage and Retrieval; Algorithms; Breast Neoplasms/epidemiology; Denmark; Female; Humans; Infertility, Male/epidemiology; Male; Phenotype; Prostatic Neoplasms/epidemiology; Surveys and Questionnaires; Toxicogenetics
3.
Lancet Digit Health ; 6(6): e396-e406, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38789140

ABSTRACT

BACKGROUND: Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Although a range of cancer risk models exists, the utility on a population level for risk stratification across cancer types has not been fully explored. We aimed to close this gap by evaluating pan-cancer risk models built on electronic health records across the Danish population with validation in the UK Biobank. METHODS: In this retrospective modelling and validation study, data for model development and internal validation were derived from the following Danish health registries: the Central Person Registry, the Danish National Patient Registry, the death registry, the cancer registry, and full-text medical records from secondary care records in the capital region. The development data included adults aged 16-86 years without previous malignant cancers in the time period from Jan 1, 1995, to Dec 31, 2014. The internal validation period was from Jan 1, 2015, to April 10, 2018, and the data included all adults without a previous indication of cancer aged 16-75 years on Dec 31, 2014. The external validation cohort from the UK Biobank included all adults without a previous indication of cancer aged 50-75 years. We used time-dependent Bayesian Cox hazard models built on the combined medical history of Danish individuals. A set of 1392 covariates from available clinical disease trajectories, text-mined basic health factors, and family histories were used to train predictive models of 20 major cancer types. The models were validated on cancer incidence between 2015 and 2018 across Denmark and on individuals in the UK Biobank. The primary outcomes were discrimination and calibration performance. FINDINGS: From the Danish registries, we included 6 732 553 individuals covering 60 million hospital visits, 90 million diagnoses, and a total of 193 million life-years between Jan 1, 1978, and April 10, 2018. 
Danish registry data covering the period from Jan 1, 2015, to April 10, 2018, were used to internally validate risk models, containing a total of 4 248 491 individuals who remained at risk of a primary malignant cancer diagnosis and 67 401 cancer cases recorded. For the external validation, we evaluated the same time period in the UK Biobank covering 377 004 individuals with 11 486 cancer cases. The predictive performance of the models on Danish data showed good discrimination (concordance index 0·81 [SD 0·08], ranging from 0·66 [95% CI 0·65-0·67] for cervix uteri cancer to 0·91 [0·90-0·92] for liver cancer). Performance was similar on the UK Biobank in a direct transfer when controlling for shifts in the age distribution (concordance index 0·66 [SD 0·08], ranging from 0·55 [95% CI 0·44-0·66] for cervix uteri cancer to 0·78 [0·77-0·79] for lung cancer). Cancer risks were associated, in addition to heritable components, with a broad range of preceding diagnoses and health factors. The best overall performance was seen for cancers of the digestive system (oesophageal, stomach, colorectal, liver, and pancreatic) but also thyroid, kidney, and uterine cancers. INTERPRETATION: Data available in national electronic health databases can be used to approximate cancer risk factors and enable risk predictions in most cancer types. Model predictions generalise between the Danish and UK health-care systems. With the emergence of multi-cancer early detection tests, electronic health record-based risk models could supplement screening efforts. FUNDING: Novo Nordisk Foundation and the Danish Innovation Foundation.


Subjects
Electronic Health Records; Neoplasms; Humans; Middle Aged; Aged; Adult; Denmark/epidemiology; Female; Retrospective Studies; Male; Neoplasms/epidemiology; Adolescent; Risk Assessment/methods; Young Adult; Aged, 80 and over; United Kingdom/epidemiology; Registries; Bayes Theorem; Proportional Hazards Models; Risk Factors
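
The abstract of this record reports discrimination as a concordance index. As a rough illustration of what that number measures, the sketch below computes Harrell's concordance index on toy data. It is a simplified stand-in, not the study's evaluation code: pairs with tied follow-up times are skipped, and all names and values are hypothetical.

```python
from itertools import combinations

def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times  -- follow-up time per individual
    events -- truthy if the event (e.g. a cancer diagnosis) was observed
    risks  -- predicted risk score; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier individual was censored
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: the highest predicted risk belongs to the earliest event.
print(concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [0.9, 0.7, 0.4, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported per-cancer values should be read.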
4.
Int J Androl ; 35(3): 294-302, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22519522

ABSTRACT

During the past four decades, there has been an increase in the incidence rate of male reproductive disorders in some, but not all, Western countries. The observed increase in the prevalence of male reproductive disorders is suspected to be ascribable to environmental factors as the increase has been too rapid to be explained by genetics alone. To study the association between complex chemical exposures of humans and congenital cryptorchidism, the most common malformation of the male genitalia, we measured 121 environmental chemicals with suspected or known endocrine disrupting properties in 130 breast milk samples from Danish and Finnish mothers. Half the newborns were healthy controls, whereas the other half were boys with congenital cryptorchidism. The measured chemicals included polychlorinated biphenyls (PCBs), polybrominated diphenyl-ethers, dioxins (OCDD/PCDFs), phthalates, polybrominated biphenyls and organochlorine pesticides. Computational analysis of the data was performed using logistic regression and three multivariate machine learning classifiers. Furthermore, we performed systems biology analysis to explore the chemical influence on a molecular level. After correction for multiple testing, exposure to nine chemicals was significantly different between the cases and controls in the Danish cohort, but not in the Finnish cohort. The multivariate analysis indicated that Danish samples exhibited a stronger correlation between chemical exposure patterns in breast milk and cryptorchidism than Finnish samples. Moreover, PCBs were indicated as having a protective effect within the Danish cohort, which was supported by molecular data recovered through systems biology. Our results lend further support to the hypothesis that the mixture of environmental chemicals may contribute to observed adverse trends in male reproductive health.


Subjects
Cryptorchidism/epidemiology; Milk, Human/chemistry; Artificial Intelligence; Denmark/epidemiology; Dioxins/analysis; Environmental Pollutants/analysis; Female; Finland/epidemiology; Halogenated Diphenyl Ethers/analysis; Humans; Logistic Models; Male; Polychlorinated Biphenyls/analysis; Systems Biology
5.
Int J Androl ; 34(4 Pt 2): e122-32, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21696394

ABSTRACT

To search for disease-related copy number variations (CNVs) in families with a high frequency of germ cell tumours (GCT), we analysed 16 individuals from four families by array comparative genomic hybridization (aCGH) and applied an integrative systems biology algorithm that prioritizes risk-associated genes among loci targeted by CNVs. The top-ranked candidate, RLN1, encoding a Relaxin-H1 peptide, although only detected in one of the families, was selected for further investigations. Validation of the CNV at the RLN1 locus was performed as an association study using qPCR with 106 sporadic testicular GCT patients and 200 healthy controls. Observed CNV frequencies of 1.9% among cases and 1.5% amongst controls were not significantly different and this was further confirmed by CNV data extracted from a genome-wide analysis of 189 cases and 380 controls, where similar frequencies of 2.2% were observed in both groups (p=1). Immunohistochemistry for Relaxin-H1 (RLN1), Relaxin-H2 (RLN2) and their cognate receptor, RXFP1, detected one, and in some cases both, of the relaxins in Leydig cells, Sertoli cells and a subset of neoplastic germ cells, whereas the receptor was present in Leydig cells and spermatids. Collectively, the findings show that a heterozygous loss at the RLN1 locus is not a genetic factor mediating high population-wide risk for testicular germ cell tumour, but do not exclude a contribution of this aberration in some cases of cancer. The preliminary expression data suggest a possible role of the relaxin peptides in spermatogenesis and warrant further studies.


Subjects
DNA Copy Number Variations; Neoplasms, Germ Cell and Embryonal/genetics; Relaxin/genetics; Sequence Deletion; Testicular Neoplasms/genetics; Adolescent; Adult; Base Sequence; Comparative Genomic Hybridization; Family; Genetic Variation; Genome-Wide Association Study; Humans; Male; Middle Aged; Polymerase Chain Reaction; Receptors, G-Protein-Coupled/genetics; Receptors, Peptide/genetics
6.
Int J Androl ; 33(2): 270-8, 2010 Apr.
Article in English | MEDLINE | ID: mdl-19780864

ABSTRACT

Recent reports have confirmed a worldwide increasing trend of testicular cancer incidence, and a conspicuously high prevalence of this disease and other male reproductive disorders, including cryptorchidism and hypospadias, in Denmark. In contrast, Finland, a similarly industrialized Nordic country, exhibits much lower incidences of these disorders. The reasons behind the observed trends are unexplained, but environmental endocrine disrupting chemicals (EDCs) that affect foetal testis development are probably involved. Levels of persistent chemicals in breast milk can be considered a proxy for exposure of the foetus to such agents. Therefore, we undertook a comprehensive ecological study of 121 EDCs, including the persistent compounds dioxins, polychlorinated biphenyls (PCBs), pesticides and flame retardants, and non-persistent phthalates, in 68 breast milk samples from Denmark and Finland to compare exposure of mothers to this environmental mixture of EDCs. Using sophisticated bioinformatic tools in our analysis, we reveal, for the first time, distinct country-specific chemical signatures of EDCs with Danes having generally higher exposure than Finns to persistent bioaccumulative chemicals, whereas there was no country-specific pattern with regard to the non-persistent phthalates. Importantly, EDC levels, including some dioxins, PCBs and some pesticides (hexachlorobenzene and dieldrin) were significantly higher in Denmark than in Finland. As these classes of EDCs have been implicated in testicular cancer or in adversely affecting development of the foetal testis in humans and animals, our findings reinforce the view that environmental exposure to EDCs may explain some of the temporal and between-country differences in incidence of male reproductive disorders.


Subjects
Dioxins/analysis; Endocrine Disruptors/analysis; Environmental Exposure; Environmental Pollutants/analysis; Hydrocarbons, Chlorinated/analysis; Maternal Exposure; Milk, Human/chemistry; Polychlorinated Biphenyls/analysis; Denmark; Dieldrin/analysis; Dioxins/toxicity; Environmental Pollutants/toxicity; Female; Finland; Flame Retardants/analysis; Hexachlorobenzene/analysis; Humans; Hydrocarbons, Chlorinated/toxicity; Male; Pesticides/analysis; Testicular Neoplasms/chemically induced; Testis/drug effects; Testis/embryology
7.
Sci Rep ; 10(1): 13975, 2020 08 18.
Article in English | MEDLINE | ID: mdl-32811969

ABSTRACT

Rheumatoid arthritis (RA) is a chronic inflammatory disease with fluctuating course of progression. Despite substantial improvement in treatments in recent years, treatment response is still not guaranteed. The aim of this study was to identify variation in Disease Activity Score 28 (DAS28) of RA patients in response to Tocilizumab, and to investigate both molecular and clinical factors influencing response. Clinical and biochemical data for 485 RA patients receiving Tocilizumab in combination with methotrexate were extracted from the LITHE phase III clinical study (NCT00106535), and post-hoc analysis conducted. Latent class mixed models were used to identify statistically distinct trajectories of DAS28 after the initiation of treatment. Biomarker measurements were then analysed cross-sectionally and temporally, to characterise patients by serological biomarkers and clinical factors. We identified three distinct trajectories of drug response: class 1 (n = 85, 17.5%), class 2 (n = 338, 69.7%) and class 3 (n = 62, 12.8%). All groups started with high DAS28 on average (DAS28 > 5.1). Class 1 showed the least reduction in DAS28, with significantly more patients seeking escape therapy (p < 0.001). Class 3 showed significantly higher rates of improvement in DAS28, with 58.1% achieving ACR response levels compared to 2.4% in class 1 (p < 0.0001). Biomarkers of inflammation, MMP-3, CRP, C1M, showed greater reduction in class 3 compared to the other classes. Identification of more homogenous patient sub-populations of drug response may allow for more targeted therapeutic treatment regimens and a better understanding of disease aetiology.


Subjects
Antibodies, Monoclonal, Humanized/therapeutic use; Arthritis, Rheumatoid/drug therapy; Receptors, Interleukin-6/immunology; Adult; Aged; Antirheumatic Agents/therapeutic use; Biomarkers, Pharmacological/blood; Blood Sedimentation; Disease Progression; Drug Therapy, Combination/methods; Female; Humans; Male; Methotrexate/therapeutic use; Middle Aged; Receptors, Interleukin-6/metabolism; Remission Induction; Severity of Illness Index; Treatment Outcome
8.
Diabetes Obes Metab ; 11 Suppl 1: 60-6, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19143816

ABSTRACT

AIM: To develop novel methods for identifying new genes that contribute to the risk of developing type 1 diabetes within the Major Histocompatibility Complex (MHC) region on chromosome 6, independently of the known linkage disequilibrium (LD) between human leucocyte antigen (HLA)-DRB1, -DQA1, -DQB1 genes. METHODS: We have developed a novel method that combines single nucleotide polymorphism (SNP) genotyping data with protein-protein interaction (ppi) networks to identify disease-associated network modules enriched for proteins encoded from the MHC region. Approximately 2500 SNPs located in the 4 Mb MHC region were analysed in 1000 affected offspring trios generated by the Type 1 Diabetes Genetics Consortium (T1DGC). The most associated SNP in each gene was chosen and genes were mapped to ppi networks for identification of interaction partners. The association testing and resulting interacting protein modules were statistically evaluated using permutation. RESULTS: A total of 151 genes could be mapped to nodes within the protein interaction network and their interaction partners were identified. Five protein interaction modules reached statistical significance using this approach. The identified proteins are well known in the pathogenesis of T1D, but the modules also contain additional candidates that have been implicated in beta-cell development and diabetic complications. CONCLUSIONS: The extensive LD within the MHC region makes it important to develop new methods for analysing genotyping data for identification of additional risk genes for T1D. Combining genetic data with knowledge about functional pathways provides new insight into mechanisms underlying T1D.


Subjects
Diabetes Mellitus, Type 1/genetics; Genetic Predisposition to Disease/genetics; HLA Antigens/genetics; Major Histocompatibility Complex/genetics; Polymorphism, Single Nucleotide/genetics; Apolipoproteins/genetics; Apolipoproteins M; CD4 Antigens/genetics; Calcium-Binding Proteins; Chromosomes, Human, Pair 6/genetics; DNA-Binding Proteins/genetics; Genotype; HMGB1 Protein/genetics; Humans; Lipocalins; Microfilament Proteins; Protein Interaction Mapping; Receptor for Advanced Glycation End Products; Receptors, Immunologic/genetics
9.
Clin Microbiol Infect ; 25(10): 1277-1285, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31059795

ABSTRACT

OBJECTIVES: Sample preparation for high-throughput sequencing (HTS) includes treatment with various laboratory components, potentially carrying viral nucleic acids, the extent of which has not been thoroughly investigated. Our aim was to systematically examine a diverse repertoire of laboratory components used to prepare samples for HTS in order to identify contaminating viral sequences. METHODS: A total of 322 samples of mainly human origin were analysed using eight protocols, applying a wide variety of laboratory components. Several samples (60% of human specimens) were processed using different protocols. In total, 712 sequencing libraries were investigated for viral sequence contamination. RESULTS: Among sequences showing similarity to viruses, 493 were significantly associated with the use of laboratory components. Each of these viral sequences had sporadic appearance, only being identified in a subset of the samples treated with the linked laboratory component, and some were not identified in the non-template control samples. Remarkably, more than 65% of all viral sequences identified were within viral clusters linked to the use of laboratory components. CONCLUSIONS: We show that high prevalence of contaminating viral sequences can be expected in HTS-based virome data and provide an extensive list of novel contaminating viral sequences that can be used for evaluation of viral findings in future virome and metagenome studies. Moreover, we show that detection can be problematic due to stochastic appearance and limited non-template controls. Although the exact origin of these viral sequences requires further research, our results support laboratory-component-linked viral sequence contamination of both biological and synthetic origin.


Subjects
High-Throughput Nucleotide Sequencing/methods; Metagenomics/methods; Specimen Handling/methods; Viruses/isolation & purification; Humans; Viruses/genetics
11.
Trends Genet ; 17(8): 425-8, 2001 Aug.
Article in English | MEDLINE | ID: mdl-11485798

ABSTRACT

In sequenced microbial genomes, some of the annotated genes are actually not protein-coding genes, but rather open reading frames that occur by chance. Therefore, the number of annotated genes is higher than the actual number of genes for most of these microbes. Comparison of the length distribution of the annotated genes with the length distribution of those matching a known protein reveals that too many short genes are annotated in many genomes. Here we estimate the true number of protein-coding genes for sequenced genomes. Although it is often claimed that Escherichia coli has about 4300 genes, we show that it probably has only approximately 3800 genes, and that a similar discrepancy exists for almost all published genomes.


Subjects
Escherichia coli/genetics; Genome, Bacterial; Genome; Databases, Factual; Models, Statistical; Open Reading Frames; Saccharomyces cerevisiae/genetics
12.
Curr Opin Struct Biol ; 7(3): 394-8, 1997 Jun.
Article in English | MEDLINE | ID: mdl-9204282

ABSTRACT

Recently, neural networks have been applied to a widening range of problems in molecular biology. An area particularly suited to neural-network methods is the identification of protein sorting signals and the prediction of their cleavage sites, as these functional units are encoded by local, linear sequences of amino acids rather than global 3D structures.


Subjects
Neural Networks, Computer; Protein Sorting Signals; Proteins/chemistry; Proteins/metabolism; Chloroplasts/metabolism; Forecasting; Mitochondria/metabolism; Models, Biological; Peptide Fragments/chemistry; Peptide Fragments/metabolism
13.
Nucleic Acids Res ; 32(3): 1131-42, 2004.
Article in English | MEDLINE | ID: mdl-14960723

ABSTRACT

Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5' untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to 'pure' UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by 'coding' noise, thus enhancing significantly the prediction of 5' UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3' ends of non-coding exons and 5' non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2-3-fold better compared with NetGene2 and GenScan in 5' UTRs. We also tested the 5' UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR.


Subjects
5' Untranslated Regions/chemistry; Neural Networks, Computer; RNA Precursors/chemistry; RNA Splice Sites; Sequence Analysis, RNA/methods; 5' Untranslated Regions/metabolism; Exons; Humans; Introns; Molecular Sequence Data; Nucleotides/analysis; Protein Biosynthesis; RNA Precursors/metabolism
14.
Nucleic Acids Res ; 28(3): 706-9, 2000 Feb 01.
Article in English | MEDLINE | ID: mdl-10637321

ABSTRACT

The recently published complete DNA sequence of the bacterium Thermotoga maritima provides evidence, based on protein sequence conservation, for lateral gene transfer between Archaea and Bacteria. We introduce a new method of periodicity analysis of DNA sequences, based on structural parameters, which brings independent evidence for the lateral gene transfer in the genome of T. maritima. The structural analysis relates the Archaea-like DNA sequences to the genome of Pyrococcus horikoshii. Analysis of 24 complete genomic DNA sequences shows different periodicity patterns for organisms of different origin. The typical genomic periodicity for Bacteria is 11 bp whilst it is 10 bp for Archaea. Eukaryotes have more complex spectra but the dominant period in the yeast Saccharomyces cerevisiae is 10.2 bp. These periodicities are most likely reflective of differences in chromatin structure.


Subjects
Computational Biology; DNA/genetics; Genome, Bacterial; Models, Genetic; Recombination, Genetic; Thermotoga maritima/genetics; Chromatin/chemistry; Chromatin/genetics; DNA/chemistry; Fourier Analysis; Genome, Archaeal; Genome, Fungal; Nucleic Acid Conformation; Phylogeny; Pyrococcus/genetics; Saccharomyces cerevisiae/genetics; Sequence Alignment; Spectrum Analysis; Thermodynamics
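
The abstract of this record describes a periodicity analysis of DNA sequences, and its subject headings include Fourier analysis. The sketch below shows the general idea of locating a dominant period with a direct Fourier sum. It is a simplified stand-in under stated assumptions: the paper works from structural parameters, whereas this toy uses a plain A/T indicator signal, and every function name and the test sequence are hypothetical.

```python
import cmath
import math

def indicator_signal(seq, bases="AT"):
    """Map a DNA string to 1.0 where the base is in `bases`, else 0.0."""
    return [1.0 if b in bases else 0.0 for b in seq.upper()]

def power_at_period(signal, period):
    """Power of the mean-centred signal at one period (in bp)."""
    n = len(signal)
    mean = sum(signal) / n
    total = sum(
        (x - mean) * cmath.exp(-2j * math.pi * k / period)
        for k, x in enumerate(signal)
    )
    return abs(total) ** 2 / n

def dominant_period(signal, candidates):
    """Return the candidate period with the highest spectral power."""
    return max(candidates, key=lambda p: power_at_period(signal, p))

# Synthetic sequence with an AA dinucleotide repeated every 10 bp.
toy = "AACGCGCGCG" * 30
print(dominant_period(indicator_signal(toy), range(2, 21)))
```

Non-integer candidates (for example a grid around 10 to 11 bp) can be passed in the same way, which is how periods such as 10.2 bp would be resolved on real sequence.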
15.
J Mol Biol ; 227(1): 108-13, 1992 Sep 05.
Article in English | MEDLINE | ID: mdl-1522582

ABSTRACT

Analysis of an artificial neural network trained to classify DNA as coding or non-coding revealed compositional differences between sequence parts translated into protein and those that were not. The 5' end of human introns was found to have a base composition that was non-random to an extent matching the non-randomness in the 3' end that contains the polypyrimidine tract. The prevailing nucleotides in the initial 50 nucleotides of human introns are guanine and cytosine; the trinucleotide GGG was found to occur almost four times as frequently as it would in sequences with a uniform distribution of the nucleotides. The initial part of terminal exons and their associated terminal introns were shown to have a very special base composition deviating strongly from the normal picture in other exons and introns.


Subjects
Genes; Introns; Base Composition; Base Sequence; Humans; Molecular Sequence Data; Neural Networks, Computer
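
The abstract of this record compares the observed frequency of the trinucleotide GGG with the frequency expected under a uniform nucleotide distribution, which is (1/4)^3 = 1/64 per position. A minimal sketch of that ratio follows, assuming overlapping trinucleotide windows. The function name and the toy sequence are hypothetical and do not come from the paper.

```python
def trinucleotide_enrichment(seq, triplet="GGG"):
    """Observed frequency of `triplet` divided by its uniform expectation.

    Counts overlapping windows of length 3. Under a uniform distribution
    of the four nucleotides each triplet is expected at frequency 1/64.
    """
    seq = seq.upper()
    windows = len(seq) - 2
    if windows <= 0:
        raise ValueError("sequence must contain at least 3 nucleotides")
    count = sum(1 for i in range(windows) if seq[i:i + 3] == triplet)
    observed = count / windows
    expected = (1 / 4) ** 3
    return observed / expected

# Toy sequence, for illustration only.
print(trinucleotide_enrichment("GTGGGAGGGC"))
```

On this scale, the abstract's "almost four times as frequently" corresponds to a ratio just under 4.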
16.
J Mol Biol ; 220(1): 49-65, 1991 Jul 05.
Article in English | MEDLINE | ID: mdl-2067018

ABSTRACT

Artificial neural networks have been applied to the prediction of splice site location in human pre-mRNA. A joint prediction scheme where prediction of transition regions between introns and exons regulates a cutoff level for splice site assignment was able to predict splice site locations with confidence levels far better than previously reported in the literature. The problem of predicting donor and acceptor sites in human genes is hampered by the presence of large numbers of false positives: here, the distribution of these false splice sites is examined and linked to a possible scenario for the splicing mechanism in vivo. When the presented method detects 95% of the true donor and acceptor sites, it makes less than 0.1% false donor site assignments and less than 0.4% false acceptor site assignments. For the large data set used in this study, this means that on average there are one and a half false donor sites per true donor site and six false acceptor sites per true acceptor site. With the joint assignment method, more than a fifth of the true donor sites and around one fourth of the true acceptor sites could be detected without accompaniment of any false positive predictions. Highly confident splice sites could not be isolated with a widely used weight matrix method or by separate splice site networks. A complementary relation between the confidence levels of the coding/non-coding and the separate splice site networks was observed, with many weak splice sites having sharp transitions in the coding/non-coding signal and many stronger splice sites having more ill-defined transitions between coding and non-coding.


Subjects
DNA/genetics; RNA, Messenger/genetics; Base Sequence; Databases, Factual; Exons; Humans; Introns; Molecular Sequence Data; Probability; Sequence Homology, Nucleic Acid
17.
J Mol Biol ; 294(5): 1351-62, 1999 Dec 17.
Article in English | MEDLINE | ID: mdl-10600390

ABSTRACT

Protein phosphorylation at serine, threonine or tyrosine residues affects a multitude of cellular signaling processes. How is specificity in substrate recognition and phosphorylation by protein kinases achieved? Here, we present an artificial neural network method that predicts phosphorylation sites in independent sequences with a sensitivity in the range from 69% to 96%. As an example, we predict novel phosphorylation sites in the p300/CBP protein that may regulate interaction with transcription factors and histone acetyltransferase activity. In addition, serine and threonine residues in p300/CBP that can be modified by O-linked glycosylation with N-acetylglucosamine are identified. Glycosylation may prevent phosphorylation at these sites, a mechanism named yin-yang regulation. The prediction server is available on the Internet at http://www.cbs.dtu.dk/services/NetPhos/ or via e-mail to NetPhos@cbs.dtu.dk.


Subjects
Consensus Sequence; Eukaryotic Cells/chemistry; Phosphoproteins/chemistry; Phosphoproteins/metabolism; Amino Acid Motifs; Amino Acid Sequence; Animals; Binding Sites; Glycosylation; Models, Molecular; Neural Networks, Computer; Nuclear Proteins/chemistry; Nuclear Proteins/metabolism; Phosphorylation; Phylogeny; Protein Structure, Tertiary; Reproducibility of Results; Sensitivity and Specificity; Serine/metabolism; Substrate Specificity; Threonine/metabolism; Trans-Activators/chemistry; Trans-Activators/metabolism; Tyrosine/metabolism
18.
J Mol Biol ; 263(4): 503-10, 1996 Nov 08.
Article in English | MEDLINE | ID: mdl-8918932

ABSTRACT

We describe the structural implications of a periodic pattern found in human exons and introns by hidden Markov models. We show that exons (besides the reading frame) have a specific sequential structure in the form of a pattern with triplet consensus non-T(A/T)G, and a minimal periodicity of roughly ten nucleotides. The periodic pattern is also present in intron sequences, although the strength per nucleotide is weaker. Using two independent profile methods based on triplet bendability parameters from DNase I experiments and nucleosome positioning data, we show that the pattern in multiple alignments of internal exon and intron sequences corresponds to a periodic "in phase" bending potential towards the major groove of the DNA. The nucleosome positioning data show that the consensus triplets (and their complements) have a preference for locations on a bent double helix where the major groove faces inward and is compressed. The in-phase triplets are located adjacent to GCC/GGC triplets known to have the strongest bias in their positioning on the nucleosome. Analysis of mRNA sequences encoding proteins with known tertiary structure excludes the possibility that the pattern is a consequence of the previously well-known periodicity caused by the encoding of alpha-helices in proteins. Finally, we discuss the relation between the bending potential of coding and non-coding regions and its impact on the translational positioning of nucleosomes and the recognition of genes by the transcriptional machinery.
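
The triplet consensus non-T(A/T)G and its roughly ten-nucleotide spacing can be probed with a simple count. The sketch below is not the hidden Markov model analysis used in the paper: it is a plain regular-expression scan followed by a histogram of distances between hits, with hypothetical function names and a synthetic toy sequence.

```python
import re
from collections import Counter
from typing import List

# non-T, then A or T, then G; the lookahead keeps overlapping hits.
CONSENSUS = re.compile(r"(?=[ACG][AT]G)")


def consensus_positions(seq: str) -> List[int]:
    """0-based start positions of every triplet matching the consensus."""
    return [m.start() for m in CONSENSUS.finditer(seq.upper())]


def spacing_histogram(positions: List[int], max_lag: int = 30) -> Counter:
    """Count distances between all pairs of hits, up to max_lag."""
    hist = Counter()
    for i, p in enumerate(positions):
        for q in positions[i + 1:]:
            lag = q - p
            if lag > max_lag:
                break  # positions are sorted, so later lags only grow
            hist[lag] += 1
    return hist


# Synthetic sequence with one consensus triplet every 10 nt.
toy = "CAGTTTTTTT" * 4
hits = consensus_positions(toy)
hist = spacing_histogram(hits)
print(hits, dict(hist))
```

On real exon or intron sequence a periodic signal would appear, if at all, as an excess of counts at lags near multiples of ten relative to neighbouring lags.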


Subjects
DNA/chemistry; Exons; Introns; Models, Theoretical; Nucleosomes/genetics; Base Sequence; Conserved Sequence; DNA/metabolism; Deoxyribonuclease I/metabolism; Humans; Models, Molecular; Nucleic Acid Conformation; Nucleosomes/chemistry; Nucleosomes/metabolism; RNA, Messenger/chemistry; Repetitive Sequences, Nucleic Acid; Sequence Alignment; Software
19.
J Mol Biol ; 243(5): 816-20, 1994 Nov 11.
Article in English | MEDLINE | ID: mdl-7966302

ABSTRACT

A neural network trained to classify the 61 nucleotide triplets of the genetic code into 20 amino acid categories develops in its internal representation a pattern matching the relative cost of transferring amino acids with satisfied backbone hydrogen bonds from water to an environment of dielectric constant of roughly 2.0. Such environments are typically found in lipid membranes or in the interior of proteins. In learning the mapping between the codons and the categories, the network groups the amino acids according to the scale of transfer free energies developed by Engelman, Goldman and Steitz. Several other scales based on internal preference statistics also agree reasonably well with the network grouping. The network is able to relate the structure of the genetic code to quantifications of amino acid hydrophobicity-hydrophilicity more systematically than the numerous attempts made earlier. Due to its inherent non-linearity, the code is also shown to impose decisive constraints on algorithmic analysis of the protein coding potential of DNA.
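
The classification task itself, 61 sense codons mapped to 20 amino acids, can be written down directly from the standard genetic code. The sketch below builds that mapping; it shows only the training targets, not the network or its input encoding, which the abstract does not specify. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

codon_table = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Drop the three stop codons to get the 61 -> 20 classification problem.
sense = {codon: aa for codon, aa in codon_table.items() if aa != "*"}
degeneracy = Counter(sense.values())  # codons per amino acid

print(len(sense), len(degeneracy))
```

The uneven degeneracy (six codons for leucine, one for methionine or tryptophan) is part of what makes the mapping non-linear.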


Subjects
Amino Acids/chemistry; Energy Transfer/genetics; Neural Networks, Computer; Amino Acid Sequence; Base Sequence; Genetic Code; Models, Genetic; Molecular Sequence Data
20.
J Mol Biol ; 281(4): 663-73, 1998 Aug 28.
Article in English | MEDLINE | ID: mdl-9710538

ABSTRACT

The fact that DNA three-dimensional structure is important for transcriptional regulation raises the question of whether eukaryotic promoters contain general structural features independent of which genes they control. We present an analysis of a large set of human RNA polymerase II promoters with a very low level of sequence similarity. The sequences, which include both TATA-containing and TATA-less promoters, are aligned by hidden Markov models. Using three different models of sequence-derived DNA bendability, the aligned promoters display a common structural profile with bendability being low in a region upstream of the transcriptional start point and significantly higher downstream. Investigation of the sequence composition in the two regions shows that the bendability profile originates from the sequential structure of the DNA, rather than the general nucleotide composition. Several trinucleotides known to have high propensity for major groove compression are found much more frequently in the regions downstream of the transcriptional start point, while the upstream regions contain more low-bendability triplets. Within the region downstream of the start point, we observe a periodic pattern in sequence and bendability, which is in phase with the DNA helical pitch. The periodic bendability profile shows bending peaks roughly at every 10 bp with stronger bending at 20 bp intervals. These observations suggest that DNA in the region downstream of the transcriptional start point is able to wrap around protein in a manner reminiscent of DNA in a nucleosome. This notion is further supported by the finding that the periodic bendability is caused mainly by the complementary triplet pairs CAG/CTG and GGC/GCC, which previously have been found to correlate with nucleosome positioning. We present models where the high-bendability regions position nucleosomes at the downstream end of the transcriptional start point, and consider the possibility of interaction between histone-like TAFs and this area. We also propose the use of this structural signature in computational promoter-finding algorithms.
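
The upstream/downstream contrast described above can be illustrated with a simple triplet count. The sketch below is a much cruder stand-in for the bendability models used in the paper: it only measures how often the four triplets named in the abstract occur on either side of an assumed transcriptional start index. The function names, the `tss` parameter and the toy sequence are hypothetical.

```python
from typing import Dict

# Complementary triplet pairs named in the abstract.
HIGH_BEND_TRIPLETS = ("CAG", "CTG", "GGC", "GCC")


def triplet_fraction(region: str) -> float:
    """Fraction of overlapping triplets in `region` that are in the set."""
    region = region.upper()
    n = len(region) - 2  # number of overlapping triplets
    if n <= 0:
        return 0.0
    hits = sum(region[i:i + 3] in HIGH_BEND_TRIPLETS for i in range(n))
    return hits / n


def compare_around_tss(seq: str, tss: int) -> Dict[str, float]:
    """Triplet fraction upstream (seq[:tss]) and downstream (seq[tss:])."""
    return {"upstream": triplet_fraction(seq[:tss]),
            "downstream": triplet_fraction(seq[tss:])}


# Synthetic sequence: AT-rich upstream, triplet-rich downstream.
toy = "ATATATATATAT" + "CAGGCCCTGGGC"
result = compare_around_tss(toy, tss=12)
print(result)
```

A promoter-finding heuristic along the lines proposed in the abstract would need a calibrated bendability scale and a periodicity test rather than this raw frequency.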


Subjects
DNA/chemistry; Promoter Regions, Genetic/genetics; RNA Polymerase II/genetics; Algorithms; Humans; Markov Chains; Nucleic Acid Conformation; Nucleosomes/chemistry; Nucleotides/chemistry; Sequence Analysis, DNA