Search | BVS (Virtual Health Library) Search Portal

1.

An integrated map of structural variation in 2,504 human genomes.

Sudmant, Peter H; Rausch, Tobias; Gardner, Eugene J; Handsaker, Robert E; Abyzov, Alexej; Huddleston, John; Zhang, Yan; Ye, Kai; Jun, Goo; Fritz, Markus Hsi-Yang; Konkel, Miriam K; Malhotra, Ankit; Stütz, Adrian M; Shi, Xinghua; Casale, Francesco Paolo; Chen, Jieming; Hormozdiari, Fereydoun; Dayama, Gargi; Chen, Ken; Malig, Maika; Chaisson, Mark J P; Walter, Klaudia; Meiers, Sascha; Kashin, Seva; Garrison, Erik; Auton, Adam; Lam, Hugo Y K; Mu, Xinmeng Jasmine; Alkan, Can; Antaki, Danny; Bae, Taejeong; Cerveira, Eliza; Chines, Peter; Chong, Zechen; Clarke, Laura; Dal, Elif; Ding, Li; Emery, Sarah; Fan, Xian; Gujral, Madhusudan; Kahveci, Fatma; Kidd, Jeffrey M; Kong, Yu; Lameijer, Eric-Wubbo; McCarthy, Shane; Flicek, Paul; Gibbs, Richard A; Marth, Gabor; Mason, Christopher E; Menelaou, Androniki.

Nature ; 526(7571): 75-81, 2015 Oct 01.

Article in English | MEDLINE | ID: mdl-26432246

ABSTRACT

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.

Subjects

Genetic Variation/genetics , Genome, Human/genetics , Physical Chromosome Mapping , Amino Acid Sequence , Genetic Predisposition to Disease , Genetics, Medical , Genetics, Population , Genome-Wide Association Study , Genomics , Genotype , Haplotypes/genetics , Homozygote , Humans , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Sequence Deletion/genetics

2.

Characteristics of de novo structural changes in the human genome.

Kloosterman, Wigard P; Francioli, Laurent C; Hormozdiari, Fereydoun; Marschall, Tobias; Hehir-Kwa, Jayne Y; Abdellaoui, Abdel; Lameijer, Eric-Wubbo; Moed, Matthijs H; Koval, Vyacheslav; Renkens, Ivo; van Roosmalen, Markus J; Arp, Pascal; Karssen, Lennart C; Coe, Bradley P; Handsaker, Robert E; Suchiman, Eka D; Cuppen, Edwin; Thung, Djie Tjwan; McVey, Mitch; Wendl, Michael C; Uitterlinden, André; van Duijn, Cornelia M; Swertz, Morris A; Wijmenga, Cisca; van Ommen, GertJan B; Slagboom, P Eline; Boomsma, Dorret I; Schönhuth, Alexander; Eichler, Evan E; de Bakker, Paul I W; Ye, Kai; Guryev, Victor.

Genome Res ; 25(6): 792-801, 2015 Jun.

Article in English | MEDLINE | ID: mdl-25883321

ABSTRACT

Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. However, mutation rates and characteristics of de novo indels and SVs in the general population have remained largely unexplored. We report 332 validated de novo structural changes identified in whole genomes of 250 families, including complex indels, retrotransposon insertions, and interchromosomal events. These data indicate a mutation rate of 2.94 indels (1-20 bp) and 0.16 SVs (>20 bp) per generation. De novo structural changes affect on average 4.1 kbp of genomic sequence and 29 coding bases per generation, which is 91 and 52 times more nucleotides than de novo substitutions, respectively. This contrasts with the equal genomic footprint of inherited SVs and substitutions. An excess of structural changes originated on paternal haplotypes. Additionally, we observed a nonuniform distribution of de novo SVs across offspring. These results reveal the importance of different mutational mechanisms to changes in human genome structure across generations.

Subjects

Genetic Variation , Genome, Human , Alleles , Amino Acid Sequence , Female , Genomics , Haplotypes , Humans , INDEL Mutation , Male , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide , Retroelements/genetics , Sequence Alignment , Sequence Analysis, DNA

3.

A gain of function mutation in TNFRSF11B encoding osteoprotegerin causes osteoarthritis with chondrocalcinosis.

Ramos, Yolande F M; Bos, Steffan D; van der Breggen, Ruud; Kloppenburg, Margreet; Ye, Kai; Lameijer, Eric-Wubbo E M W; Nelissen, Rob G H H; Slagboom, P Eline; Meulenbelt, Ingrid.

Ann Rheum Dis ; 74(9): 1756-62, 2015 Sep.

Article in English | MEDLINE | ID: mdl-24743232

ABSTRACT

OBJECTIVE: To identify pathogenic mutations that reveal underlying biological mechanisms driving osteoarthritis (OA). METHODS: Exome sequencing was applied to two distant family members with dominantly inherited early onset primary OA at multiple joint sites with chondrocalcinosis (familial generalised osteoarthritis, FOA). Confirmation of mutations occurred by genotyping and linkage analyses across the extended family. The functional effect of the mutation was investigated by means of a cell-based assay. To explore generalisability, mRNA expression analysis of the relevant genes in the discovered pathway was explored in preserved and osteoarthritic articular cartilage of independent patients undergoing joint replacement surgery. RESULTS: We identified a heterozygous, probably damaging, read-through mutation (c.1205A>T; p.Stop402Leu) in TNFRSF11B encoding osteoprotegerin that is likely causal to the OA phenotype in the extended family. In a bone resorption assay, the mutant form of osteoprotegerin showed enhanced capacity to inhibit osteoclastogenesis and bone resorption. Expression analyses in preserved and affected articular cartilage of independent OA patients showed that upregulation of TNFRSF11B is a general phenomenon in the pathophysiological process. CONCLUSIONS: Albeit that the role of the molecular pathway of osteoprotegerin has been studied in OA, we are the first to demonstrate that enhanced osteoprotegerin function could be a directly underlying cause. We advocate that agents counteracting the function of osteoprotegerin could comply with new therapeutic interventions of OA.

Subjects

Chondrocalcinosis/genetics , Osteoarthritis/genetics , Osteoprotegerin/genetics , Aged , Aged, 80 and over , Bone Resorption/genetics , Cell Differentiation/genetics , Chondrocalcinosis/complications , Exome , Female , Genotype , Heterozygote , Humans , Male , Middle Aged , Mutation , Osteoarthritis/complications , Osteoclasts , Pedigree , Phenotype

4.

PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data.

Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A C; Ning, Zemin; Slagboom, P Eline; Ye, Kai.

Bioinformatics ; 28(4): 479-86, 2012 Feb 15.

Article in English | MEDLINE | ID: mdl-22219203

ABSTRACT

MOTIVATION: RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. RESULTS: We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ≈ 137,000 and 173,000 splicing events, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. AVAILABILITY: The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion.

Subjects

Algorithms , RNA Splicing , Sequence Analysis, RNA/methods , Chromosomes, Human, Pair 17 , Exons , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , RNA/genetics , RNA Splice Sites , Software

5.

Aging as accelerated accumulation of somatic variants: whole-genome sequencing of centenarian and middle-aged monozygotic twin pairs.

Ye, Kai; Beekman, Marian; Lameijer, Eric-Wubbo; Zhang, Yanju; Moed, Matthijs H; van den Akker, Erik B; Deelen, Joris; Houwing-Duistermaat, Jeanine J; Kremer, Dennis; Anvar, Seyed Yahya; Laros, Jeroen F J; Jones, David; Raine, Keiran; Blackburne, Ben; Potluri, Shobha; Long, Quan; Guryev, Victor; van der Breggen, Ruud; Westendorp, Rudi G J; 't Hoen, Peter A C; den Dunnen, Johan; van Ommen, Gert Jan B; Willemsen, Gonneke; Pitts, Steven J; Cox, David R; Ning, Zemin; Boomsma, Dorret I; Slagboom, P Eline.

Twin Res Hum Genet ; 16(6): 1026-32, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24182360

ABSTRACT

It has been postulated that aging is the consequence of an accelerated accumulation of somatic DNA mutations and that subsequent errors in the primary structure of proteins ultimately reach levels sufficient to affect organismal functions. The technical limitations of detecting somatic changes and the lack of insight about the minimum level of erroneous proteins to cause an error catastrophe hampered any firm conclusions on these theories. In this study, we sequenced the whole genome of DNA in whole blood of two pairs of monozygotic (MZ) twins, 40 and 100 years old, by two independent next-generation sequencing (NGS) platforms (Illumina and Complete Genomics). Potentially discordant single-base substitutions supported by both platforms were validated extensively by Sanger, Roche 454, and Ion Torrent sequencing. We demonstrate that the genomes of the two twin pairs are germ-line identical between co-twins, and that the genomes of the 100-year-old MZ twins are discerned by eight confirmed somatic single-base substitutions, five of which are within introns. Putative somatic variation between the 40-year-old twins was not confirmed in the validation phase. We conclude from this systematic effort that by using two independent NGS platforms, somatic single nucleotide substitutions can be detected, and that a century of life did not result in a large number of detectable somatic mutations in blood. The low number of somatic variants observed by using two NGS platforms might provide a framework for detecting disease-related somatic variants in phenotypically discordant MZ twins.

Subjects

Aging/genetics , Blood Cells/physiology , Genome, Human , High-Throughput Nucleotide Sequencing , Mutation/genetics , Twins, Monozygotic/genetics , Adult , Aged, 80 and over , Female , Humans , Male , Middle Aged

6.

Designing active template molecules by combining computational de novo design and human chemist's expertise.

Lameijer, Eric-Wubbo; Tromp, Reynier A; Spanjersberg, Ronald F; Brussee, Johannes; Ijzerman, Adriaan P.

J Med Chem ; 50(8): 1925-32, 2007 Apr 19.

Article in English | MEDLINE | ID: mdl-17367122

ABSTRACT

We used a new software tool for de novo design, the "Molecule Evoluator", to generate a number of small molecules. Explicit constraints were a relatively low molecular weight and otherwise limited functionality, for example, low numbers of hydrogen bond donors and acceptors, one or two aromatic rings, and a small number of rotatable bonds. In this way, we obtained a collection of scaffold- or templatelike molecules rather than fully "decorated" ones. We asked medicinal chemists to evaluate the suggested molecules for ease of synthesis and overall appeal, allowing them to make structural changes to the molecules for these reasons. On the basis of their recommendations, we synthesized eight molecules with an unprecedented (not patented) yet simple structure, which were subsequently tested in a screen of 83 drug targets, mostly G protein-coupled receptors. Four compounds showed affinity for biogenic amine targets (receptor, ion channel, and transport protein), reflecting the training of the medicinal chemists involved. Apparently the generation of leadlike solutions helped the medicinal chemists to select good starting points for future lead optimization, away from existing compound libraries.

Subjects

Chemistry, Pharmaceutical/methods , Databases, Factual , Drug Design , Pharmaceutical Preparations/chemistry , Algorithms , Aniline Compounds/chemical synthesis , Aniline Compounds/chemistry , Aniline Compounds/pharmacology , Animals , Carrier Proteins/metabolism , Feasibility Studies , Furans/chemical synthesis , Furans/chemistry , Furans/pharmacology , Humans , In Vitro Techniques , Ion Channels/drug effects , Models, Molecular , Pharmaceutical Preparations/chemical synthesis , Phosphoric Diester Hydrolases/metabolism , Piperidines/chemical synthesis , Piperidines/chemistry , Piperidines/pharmacology , Radioligand Assay , Rats , Receptors, Cytoplasmic and Nuclear/drug effects , Receptors, G-Protein-Coupled/drug effects , Sodium-Potassium-Exchanging ATPase/metabolism , Software , Structure-Activity Relationship

7.

A two-entropies analysis to identify functional positions in the transmembrane region of class A G protein-coupled receptors.

Ye, Kai; Lameijer, Eric-Wubbo M; Beukers, Margot W; Ijzerman, Adriaan P.

Proteins ; 63(4): 1018-30, 2006 Jun 01.

Article in English | MEDLINE | ID: mdl-16532452

ABSTRACT

Residues in the transmembrane region of G protein-coupled receptors (GPCRs) are important for ligand binding and activation, but the function of individual positions is poorly understood. Using a sequence alignment of class A GPCRs (grouped in subfamilies), we propose a so-called "two-entropies analysis" to determine the potential role of individual positions in the transmembrane region of class A GPCRs. In our approach, such positions appear scattered, while largely clustered according to their biological function. Our method appears superior when compared to other bioinformatics approaches, such as the evolutionary trace method, entropy-variability plot, and correlated mutation analysis, both qualitatively and quantitatively.

Subjects

Cell Membrane/chemistry , Cell Membrane/metabolism , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/metabolism , Amino Acid Sequence , Amino Acids/chemistry , Binding Sites , Entropy , Ligands , Models, Molecular , Protein Binding , Protein Structure, Quaternary , Receptors, G-Protein-Coupled/classification , Sequence Alignment , Solvents

8.

Systematic discovery of complex insertions and deletions in human cancers.

Ye, Kai; Wang, Jiayin; Jayasinghe, Reyka; Lameijer, Eric-Wubbo; McMichael, Joshua F; Ning, Jie; McLellan, Michael D; Xie, Mingchao; Cao, Song; Yellapantula, Venkata; Huang, Kuan-lin; Scott, Adam; Foltz, Steven; Niu, Beifang; Johnson, Kimberly J; Moed, Matthijs; Slagboom, P Eline; Chen, Feng; Wendl, Michael C; Ding, Li.

Nat Med ; 22(1): 97-104, 2016 Jan.

Article in English | MEDLINE | ID: mdl-26657142

ABSTRACT

Complex insertions and deletions (indels) are formed by simultaneously deleting and inserting DNA fragments of different sizes at a common genomic location. Here we present a systematic analysis of somatic complex indels in the coding sequences of samples from over 8,000 cancer cases using Pindel-C. We discovered 285 complex indels in cancer-associated genes (such as PIK3R1, TP53, ARID1A, GATA3 and KMT2D) in approximately 3.5% of cases analyzed; nearly all instances of complex indels were overlooked (81.1%) or misannotated (17.6%) in previous reports of 2,199 samples. In-frame complex indels are enriched in PIK3R1 and EGFR, whereas frameshifts are prevalent in VHL, GATA3, TP53, ARID1A, PTEN and ATRX. Furthermore, complex indels display strong tissue specificity (such as VHL in kidney cancer samples and GATA3 in breast cancer samples). Finally, structural analyses support findings of previously missed, but potentially druggable, mutations in the EGFR, MET and KIT oncogenes. This study indicates the critical importance of improving complex indel discovery and interpretation in medical research.

Subjects

Data Mining/methods , Genomics/methods , INDEL Mutation/genetics , Neoplasms/genetics , Cell Line, Tumor , Class Ia Phosphatidylinositol 3-Kinase , DNA Helicases/genetics , DNA-Binding Proteins/genetics , ErbB Receptors/genetics , GATA3 Transcription Factor/genetics , High-Throughput Nucleotide Sequencing , Humans , Neoplasm Proteins/genetics , Nuclear Proteins/genetics , PTEN Phosphohydrolase/genetics , Phosphatidylinositol 3-Kinases/genetics , Proto-Oncogene Proteins c-kit/genetics , Proto-Oncogene Proteins c-met/genetics , Transcription Factors/genetics , Tumor Suppressor Protein p53/genetics , Von Hippel-Lindau Tumor Suppressor Protein/genetics , X-linked Nuclear Protein

9.

A high-quality human reference panel reveals the complexity and distribution of genomic structural variants.

Hehir-Kwa, Jayne Y; Marschall, Tobias; Kloosterman, Wigard P; Francioli, Laurent C; Baaijens, Jasmijn A; Dijkstra, Louis J; Abdellaoui, Abdel; Koval, Vyacheslav; Thung, Djie Tjwan; Wardenaar, René; Renkens, Ivo; Coe, Bradley P; Deelen, Patrick; de Ligt, Joep; Lameijer, Eric-Wubbo; van Dijk, Freerk; Hormozdiari, Fereydoun; Uitterlinden, André G; van Duijn, Cornelia M; Eichler, Evan E; de Bakker, Paul I W; Swertz, Morris A; Wijmenga, Cisca; van Ommen, Gert-Jan B; Slagboom, P Eline; Boomsma, Dorret I; Schönhuth, Alexander; Ye, Kai; Guryev, Victor.

Nat Commun ; 7: 12989, 2016 Oct 06.

Article in English | MEDLINE | ID: mdl-27708267

ABSTRACT

Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies provide neither a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously underreported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals.

Subjects

Genome, Human , Genomic Structural Variation , Genomics , Algorithms , Chromosomes/ultrastructure , Computational Biology , Gene Deletion , Genotype , Haplotypes , Humans , INDEL Mutation , Linkage Disequilibrium , Netherlands , Polymerase Chain Reaction , Polymorphism, Single Nucleotide , RNA/metabolism , Sequence Analysis, DNA , Sequence Analysis, RNA , Software

10.

Identification and systematic annotation of tissue-specific differentially methylated regions using the Illumina 450k array.

Slieker, Roderick C; Bos, Steffan D; Goeman, Jelle J; Bovée, Judith V M G; Talens, Rudolf P; van der Breggen, Ruud; Suchiman, H Eka D; Lameijer, Eric-Wubbo; Putter, Hein; van den Akker, Erik B; Zhang, Yanju; Jukema, J Wouter; Slagboom, P Eline; Meulenbelt, Ingrid; Heijmans, Bastiaan T.

Epigenetics Chromatin ; 6(1): 26, 2013 Aug 06.

Article in English | MEDLINE | ID: mdl-23919675

ABSTRACT

BACKGROUND: DNA methylation has been recognized as a key mechanism in cell differentiation. Various studies have compared tissues to characterize epigenetically regulated genomic regions, but due to differences in study design and focus there still is no consensus as to the annotation of genomic regions predominantly involved in tissue-specific methylation. We used a new algorithm to identify and annotate tissue-specific differentially methylated regions (tDMRs) from Illumina 450k chip data for four peripheral tissues (blood, saliva, buccal swabs and hair follicles) and six internal tissues (liver, muscle, pancreas, subcutaneous fat, omentum and spleen with matched blood samples). RESULTS: The majority of tDMRs, in both relative and absolute terms, occurred in CpG-poor regions. Further analysis revealed that these regions were associated with alternative transcription events (alternative first exons, mutually exclusive exons and cassette exons). Only a minority of tDMRs mapped to gene-body CpG islands (13%) or CpG island shores (25%), suggesting a less prominent role for these regions than indicated previously. Implementation of ENCODE annotations showed enrichment of tDMRs in DNase hypersensitive sites and transcription factor binding sites. Despite the predominance of tissue differences, inter-individual differences in DNA methylation in internal tissues were correlated with those for blood for a subset of CpG sites in a locus- and tissue-specific manner. CONCLUSIONS: We conclude that tDMRs preferentially occur in CpG-poor regions and are associated with alternative transcription. Furthermore, our data suggest the utility of creating an atlas cataloguing variably methylated regions in internal tissues that correlate to DNA methylation measured in easily accessible peripheral tissues.

11.

Precision Medicine: What Challenges Are We Facing?

Xue, Yu; Lameijer, Eric-Wubbo; Ye, Kai; Zhang, Kunlin; Chang, Suhua; Wang, Xiaoyue; Wu, Jianmin; Gao, Ge; Zhao, Fangqing; Li, Jian; Han, Chunsheng; Xu, Shuhua; Xiao, Jingfa; Yang, Xuerui; Ying, Xiaomin; Zhang, Xuegong; Chen, Wei-Hua; Liu, Yun; Zhang, Zhang; Huang, Kun; Yu, Jun.

Genomics Proteomics Bioinformatics ; 14(5): 253-261, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27744061

12.

The molecule evoluator. An interactive evolutionary algorithm for the design of drug-like molecules.

Lameijer, Eric-Wubbo; Kok, Joost N; Bäck, Thomas; Ijzerman, Ad P.

J Chem Inf Model ; 46(2): 545-52, 2006.

Article in English | MEDLINE | ID: mdl-16562982

ABSTRACT

We developed a software tool to design drug-like molecules, the "Molecule Evoluator", which we introduce and describe here. An atom-based evolutionary approach was used allowing both several types of mutation and crossover to occur. The novelty, we claim, is the unprecedented interactive evolution, in which the user acts as a fitness function. This brings a human being's creativity, implicit knowledge, and imagination into the design process, next to the more standard chemical rules. Proof-of-concept was demonstrated in a number of ways, both computationally and in the lab. Thus, we synthesized a number of compounds designed with the aid of the Molecule Evoluator. One of these is described here, a new chemical entity with activity on alpha-adrenergic receptors.

Subjects

Algorithms , Computer Simulation , Drug Design , Software Design , Artificial Intelligence , Molecular Structure , User-Computer Interface

13.

Mining a chemical database for fragment co-occurrence: discovery of "chemical clichés".

Lameijer, Eric-Wubbo; Kok, Joost N; Bäck, Thomas; Ijzerman, Ad P.

J Chem Inf Model ; 46(2): 553-62, 2006.

Article in English | MEDLINE | ID: mdl-16562983

ABSTRACT

Nowadays millions of different compounds are known, their structures stored in electronic databases. Analysis of these data could yield valuable insights into the laws of chemistry and the habits of chemists. We have therefore explored the public database of the National Cancer Institute (>250,000 compounds) by pattern searching. We split the molecules of this database into fragments to find out which fragments exist, how frequent they are, and whether the occurrence of one fragment in a molecule is related to the occurrence of another, nonoverlapping fragment. It turns out that some fragments and combinations of fragments are so frequent that they can be called "chemical clichés". We believe that the fragment data can give insight into the chemical space explored so far by synthesis. The lists of fragments and their (co-)occurrences can help create novel chemical compounds by (i) systematically listing the most popular and therefore most easily used substituents and ring systems for synthesizing new compounds, (ii) being an easily accessible repository for rarer fragments suitable for lead compound optimization, and (iii) pointing out some of the yet unexplored parts of chemical space.

Subjects

Databases as Topic , Drug Design , Structure-Activity Relationship , Antineoplastic Agents/chemistry , Diazepam/chemistry , Molecular Structure
