Results 1 - 19 of 19
1.
Nature ; 588(7836): 83-88, 2020 12.
Article in English | MEDLINE | ID: mdl-33049755

ABSTRACT

Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years [1-7]. However, the field has progressed greatly since the development of early programs such as LHASA [1,7], for which reaction choices at each step were made by human operators. Multiple software platforms [6,8-14] are now capable of completely autonomous planning. But these programs 'think' only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary [15,16] and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program's knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships [17,18], allowing it to 'strategize' over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.


Subject(s)
Artificial Intelligence , Biological Products/chemical synthesis , Chemistry Techniques, Synthetic/methods , Chemistry, Organic/methods , Software , Artificial Intelligence/standards , Automation/methods , Automation/standards , Benzylisoquinolines/chemical synthesis , Benzylisoquinolines/chemistry , Chemistry Techniques, Synthetic/standards , Chemistry, Organic/standards , Indans/chemical synthesis , Indans/chemistry , Indole Alkaloids/chemical synthesis , Indole Alkaloids/chemistry , Knowledge Bases , Lactones/chemical synthesis , Lactones/chemistry , Macrolides/chemical synthesis , Macrolides/chemistry , Reproducibility of Results , Sesquiterpenes/chemical synthesis , Sesquiterpenes/chemistry , Software/standards , Tetrahydroisoquinolines/chemical synthesis , Tetrahydroisoquinolines/chemistry
2.
Anal Chem ; 91(15): 10310-10319, 2019 08 06.
Article in English | MEDLINE | ID: mdl-31283196

ABSTRACT

Top-down proteomics approaches are becoming ever more popular, due to the advantages offered by knowledge of the intact protein mass in correctly identifying the various proteoforms that potentially arise due to point mutation, alternative splicing, post-translational modifications, etc. Usually, the average mass is used in this context; however, it is known that this can fluctuate significantly due to both natural and technical causes. Ideally, one would prefer to use the monoisotopic precursor mass, but this falls below the detection limit for all but the smallest proteins. Methods that predict the monoisotopic mass based on the average mass are potentially affected by imprecisions associated with the average mass. To address this issue, we have developed a framework based on simple, linear models that allows prediction of the monoisotopic mass based on the exact mass of the most-abundant (aggregated) isotope peak, which is a robust measure of mass, insensitive to the aforementioned natural and technical causes. This linear model was tested experimentally, as well as in silico, and typically predicts monoisotopic masses with an accuracy of only a few parts per million. A confidence measure is associated with the predicted monoisotopic mass to handle the off-by-one-Da prediction error. Furthermore, we introduce a correction function to extract the "true" (i.e., theoretically) most-abundant isotope peak from a spectrum, even if the observed isotope distribution is distorted by noise or poor ion statistics. The method is available online as an R shiny app: https://valkenborg-lab.shinyapps.io/mind/.
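
The idea of predicting one mass from another with a simple linear model, and scoring the result in parts per million, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the slope and intercept used to generate the toy data are arbitrary placeholders rather than the published coefficients, and the confidence measure and peak-correction function described in the abstract are not reproduced.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def ppm_error(predicted, reference):
    """Signed relative error in parts per million."""
    return (predicted - reference) / reference * 1e6


# Synthetic toy data (NOT measurements): most-abundant peak masses in Da and
# "monoisotopic" masses generated from an arbitrary placeholder line.
most_abundant = [5000.0, 10000.0, 20000.0, 40000.0]
monoisotopic = [0.9994 * m - 0.03 for m in most_abundant]

intercept, slope = fit_line(most_abundant, monoisotopic)
predicted = intercept + slope * 15000.0
```

With real training pairs in place of the synthetic ones, `ppm_error(predicted, known_mass)` gives the kind of accuracy figure the abstract quotes.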


Subject(s)
Algorithms , Chromatography, Liquid/methods , Models, Statistical , Proteins/analysis , Proteome/analysis , Tandem Mass Spectrometry/methods , Humans , Protein Processing, Post-Translational , Proteins/metabolism
3.
Angew Chem Int Ed Engl ; 57(9): 2367-2371, 2018 02 23.
Article in English | MEDLINE | ID: mdl-29405528

ABSTRACT

Analysis of the chemical-organic knowledge represented as a giant network reveals that it contains millions of reaction sequences closing into cycles. Without realizing it, independent chemists working at different times have jointly created examples of cyclic sequences that allow for the recovery of useful reagents and for the autoamplification of synthetically important molecules, those that mimic biological cycles, and those that can be operated one-pot.
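
Finding reaction sequences that close into cycles amounts to cycle enumeration in a directed graph. The sketch below is a minimal illustration, assuming a simplified network in which each edge points from a substrate to a product; the molecule labels and the `find_cycles` helper are hypothetical, and the approach is a plain depth-first search, not necessarily the method used in the paper.

```python
def find_cycles(graph):
    """Enumerate simple directed cycles.

    Each cycle is reported once, as a list of nodes starting at its
    smallest node (by ordinary comparison of the labels).
    """
    cycles = []

    def dfs(start, node, path, on_path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                # The path closes back on its starting node: a cycle.
                cycles.append(path[:])
            elif nxt > start and nxt not in on_path:
                # Only visit nodes larger than the start so that every
                # cycle is found exactly once.
                path.append(nxt)
                on_path.add(nxt)
                dfs(start, nxt, path, on_path)
                on_path.discard(nxt)
                path.pop()

    for start in sorted(graph):
        dfs(start, start, [start], {start})
    return cycles


# Toy network: A -> B -> C -> A closes a cycle; C -> D is a dead end.
toy_network = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
cycles = find_cycles(toy_network)
```

Exhaustive enumeration like this is only practical for small graphs; a network with millions of cycles would need a more scalable algorithm.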

4.
Genome Res ; 23(1): 23-33, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23034409

ABSTRACT

An unanticipated and tremendous amount of the noncoding sequence of the human genome is transcribed. Long noncoding RNAs (lncRNAs) constitute a significant fraction of non-protein-coding transcripts; however, their functions remain enigmatic. We demonstrate that deletions of a small noncoding differentially methylated region at 16q24.1, including lncRNA genes, cause a lethal lung developmental disorder, alveolar capillary dysplasia with misalignment of pulmonary veins (ACD/MPV), with parent-of-origin effects. We identify overlapping deletions 250 kb upstream of FOXF1 in nine patients with ACD/MPV that arose de novo specifically on the maternally inherited chromosome and delete lung-specific lncRNA genes. These deletions define a distant cis-regulatory region that harbors, besides lncRNA genes, also a differentially methylated CpG island, binds GLI2 depending on the methylation status of this CpG island, and physically interacts with and up-regulates the FOXF1 promoter. We suggest that lung-transcribed 16q24.1 lncRNAs may contribute to long-range regulation of FOXF1 by GLI2 and other transcription factors. Perturbation of lncRNA-mediated chromatin interactions may, in general, be responsible for position effect phenomena and potentially cause many disorders of human development.


Subject(s)
DNA Copy Number Variations , DNA Methylation , Persistent Fetal Circulation Syndrome/genetics , RNA, Long Noncoding/genetics , Chromatin/metabolism , Chromosomes, Human, Pair 16/genetics , CpG Islands , Enhancer Elements, Genetic , Fatal Outcome , Forkhead Transcription Factors/genetics , Forkhead Transcription Factors/metabolism , Gene Expression Regulation , Genomic Imprinting , HEK293 Cells , Humans , Infant, Newborn , Kruppel-Like Transcription Factors/metabolism , Nuclear Proteins/metabolism , Persistent Fetal Circulation Syndrome/diagnosis , Promoter Regions, Genetic , RNA, Long Noncoding/metabolism , Sequence Deletion , Transcription, Genetic , Zinc Finger Protein Gli2
5.
Genome Res ; 23(9): 1395-409, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23657883

ABSTRACT

We delineated and analyzed directly oriented paralogous low-copy repeats (DP-LCRs) in the most recent version of the human haploid reference genome. The computationally defined DP-LCRs were cross-referenced with our chromosomal microarray analysis (CMA) database of 25,144 patients subjected to genome-wide assays. This computationally guided approach to the empirically derived large data set allowed us to investigate genomic rearrangement relative frequencies and identify new loci for recurrent nonallelic homologous recombination (NAHR)-mediated copy-number variants (CNVs). The most commonly observed recurrent CNVs were NPHP1 duplications (233), CHRNA7 duplications (175), and 22q11.21 deletions (DiGeorge/velocardiofacial syndrome, 166). In the ∼25% of CMA cases for which parental studies were available, we identified 190 de novo recurrent CNVs. In this group, the most frequently observed events were deletions of 22q11.21 (48), 16p11.2 (autism, 34), and 7q11.23 (Williams-Beuren syndrome, 11). Several features of DP-LCRs, including length, distance between NAHR substrate elements, DNA sequence identity (fraction matching), GC content, and concentration of the homologous recombination (HR) hot spot motif 5'-CCNCCNTNNCCNC-3', correlate with the frequencies of the recurrent CNV events. Four novel adjacent DP-LCR-flanked and NAHR-prone regions, involving 2q12.2q13, were elucidated in association with novel genomic disorders. Our study quantitates genome architectural features responsible for NAHR-mediated genomic instability and further elucidates the role of NAHR in human disease.
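
Two of the sequence features named above, GC content and the density of the hot spot motif 5'-CCNCCNTNNCCNC-3', are straightforward to compute. The following is a minimal sketch, assuming that N stands for any of A, C, G, or T and that only the given strand is scanned; the function names and the toy sequence are hypothetical and are not taken from the study.

```python
import re

# Only the IUPAC codes needed for this motif: four bases plus N (any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regex; the lookahead lets hits overlap."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def count_motif(sequence, motif="CCNCCNTNNCCNC"):
    """Count (possibly overlapping) motif occurrences on the given strand."""
    return sum(1 for _ in motif_to_regex(motif).finditer(sequence.upper()))


def gc_content(sequence):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Toy sequence: one motif instance (CCACCATAACCAC) flanked by TT and GG.
toy = "TTCCACCATAACCACGG"
hits = count_motif(toy)
gc = gc_content(toy)
```

A motif concentration could then be expressed as hits per kilobase by dividing `hits` by the sequence length in kb; scanning the reverse complement as well would be a natural extension.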


Subject(s)
Alleles , Chromosome Disorders/genetics , DNA Copy Number Variations , Genetic Diseases, Inborn/genetics , Homologous Recombination , Adaptor Proteins, Signal Transducing/genetics , Base Composition , Chromosome Deletion , Chromosome Duplication , Cytoskeletal Proteins , Genome, Human , Humans , Membrane Proteins/genetics , Nucleotide Motifs , alpha7 Nicotinic Acetylcholine Receptor/genetics
6.
Angew Chem Int Ed Engl ; 55(20): 5904-37, 2016 05 10.
Article in English | MEDLINE | ID: mdl-27062365

ABSTRACT

Exactly half a century has passed since the launch of the first documented research project (1965 Dendral) on computer-assisted organic synthesis. Many more programs were created in the 1970s and 1980s but the enthusiasm of these pioneering days had largely dissipated by the 2000s, and the challenge of teaching the computer how to plan organic syntheses earned itself the reputation of a "mission impossible". This is quite curious given that, in the meantime, computers have "learned" many other skills that had been considered exclusive domains of human intellect and creativity; for example, machines can nowadays play chess better than human world champions and they can compose classical music pleasant to the human ear. Although there have been no similar feats in organic synthesis, this Review argues that to concede defeat would be premature. Indeed, bringing together the combination of modern computational power and algorithms from graph/network theory, chemical rules (with full stereo- and regiochemistry) coded in appropriate formats, and the elements of quantum mechanics, the machine can finally be "taught" how to plan syntheses of non-trivial organic molecules in a matter of seconds to minutes. The Review begins with an overview of some basic theoretical concepts essential for the big-data analysis of chemical syntheses. It progresses to the problem of optimizing pathways involving known reactions. It culminates with discussion of algorithms that allow for a completely de novo and fully automated design of syntheses leading to relatively complex targets, including those that have not been made before. Of course, there are still things to be improved, but computers are finally becoming relevant and helpful to the practice of organic-synthetic planning.
Paraphrasing Churchill's famous words after the Allies' first major victory over the Axis forces in Africa, it is not the end, it is not even the beginning of the end, but it is the end of the beginning for computer-assisted synthesis planning. The machine is here to stay.

7.
BMC Biol ; 12: 74, 2014 Sep 23.
Article in English | MEDLINE | ID: mdl-25246103

ABSTRACT

BACKGROUND: Recurrent rearrangements of the human genome resulting in disease or variation are mainly mediated by non-allelic homologous recombination (NAHR) between low-copy repeats. However, other genomic structures, including AT-rich palindromes and retroviruses, have also been reported to underlie recurrent structural rearrangements. Notably, recurrent deletions of Yq12 conveying azoospermia, as well as non-pathogenic reciprocal duplications, are mediated by human endogenous retroviral elements (HERVs). We hypothesized that HERV elements throughout the genome can serve as substrates for genomic instability and result in human copy-number variation (CNV). RESULTS: We developed parameters to identify HERV elements similar to those that mediate Yq12 rearrangements as well as recurrent deletions of 3q13.2q13.31. We used these parameters to identify HERV pairs genome-wide that may cause instability. Our analysis highlighted 170 pairs, flanking 12.1% of the genome. We cross-referenced these predicted susceptibility regions with CNVs from our clinical databases for potentially HERV-mediated rearrangements and identified 78 CNVs. We subsequently molecularly confirmed recurrent deletion and duplication rearrangements at four loci in ten individuals, including reciprocal rearrangements at two loci. Breakpoint sequencing revealed clustering in regions of high sequence identity enriched in PRDM9-mediated recombination hotspot motifs. CONCLUSIONS: The presence of deletions and reciprocal duplications suggests NAHR as the causative mechanism of HERV-mediated CNV, even though the length and the sequence homology of the HERV elements are less than currently thought to be required for NAHR. We propose that in addition to HERVs, other repetitive elements, such as long interspersed elements, may also be responsible for the formation of recurrent CNVs via NAHR.


Subject(s)
DNA Copy Number Variations , DNA, Viral/genetics , Endogenous Retroviruses/genetics , Genome, Human , Genomic Instability , Base Sequence , Chromosome Breakpoints , DNA, Viral/metabolism , Endogenous Retroviruses/metabolism , Homologous Recombination , Humans , Molecular Sequence Data , Repetitive Sequences, Nucleic Acid , Sequence Deletion
8.
Angew Chem Int Ed Engl ; 54(37): 10797-801, 2015 Sep 07.
Article in English | MEDLINE | ID: mdl-26215084

ABSTRACT

A thermodynamically guided calculation of free energies of substrate and product molecules allows for the estimation of the yields of organic reactions. The non-ideality of the system and the solvent effects are taken into account through the activity coefficients calculated at the molecular level by perturbed-chain statistical associating fluid theory (PC-SAFT). The model is iteratively trained using a diverse set of reactions with yields that have been reported previously. This trained model can then estimate a priori the yields of reactions not included in the training set with an accuracy of ca. ±15 %. This ability has the potential to translate into significant economic savings through the selection and then execution of only those reactions that can proceed in good yields.
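
The link between a reaction free energy and an equilibrium yield can be illustrated with the textbook relation K = exp(-ΔG/RT). The sketch below assumes the simplest possible case, an ideal A ⇌ B equilibrium with all activity coefficients equal to one; it therefore omits the PC-SAFT activity corrections and the iterative training that the abstract describes, and the function name is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def equilibrium_yield(delta_g_j_per_mol, temperature_k=298.15):
    """Equilibrium fraction of product B for an ideal A <=> B reaction.

    With K = exp(-dG / (R T)), the yield is K / (1 + K), written here
    as 1 / (1 + exp(dG / (R T))).
    """
    return 1.0 / (1.0 + math.exp(delta_g_j_per_mol / (R * temperature_k)))


# A thermoneutral reaction gives a 50% equilibrium yield.
neutral = equilibrium_yield(0.0)
# A reaction with a free energy change of -10 kJ/mol lies far toward product.
favourable = equilibrium_yield(-10_000.0)
```

In a non-ideal solution the concentrations in the equilibrium expression would be replaced by activities, which is where solvent effects enter.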

9.
Hum Mutat ; 34(1): 210-20, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22965494

ABSTRACT

Inverse paralogous low-copy repeats (IP-LCRs) can cause genome instability by nonallelic homologous recombination (NAHR)-mediated balanced inversions. When disrupting a dosage-sensitive gene(s), balanced inversions can lead to abnormal phenotypes. We delineated the genome-wide distribution of IP-LCRs >1 kb in size with >95% sequence identity and mapped the genes, potentially intersected by an inversion, that overlap at least one of the IP-LCRs. Remarkably, our results show that 12.0% of the human genome is potentially susceptible to such inversions and 942 genes, 99 of which are on the X chromosome, are predicted to be disrupted secondary to such an inversion! In addition, IP-LCRs larger than 800 bp with at least 98% sequence identity (duplication/triplication facilitating IP-LCRs, DTIP-LCRs) were recently implicated in the formation of complex genomic rearrangements with a duplication-inverted triplication-duplication (DUP-TRP/INV-DUP) structure by a replication-based mechanism involving a template switch between such inverted repeats. We identified 1,551 DTIP-LCRs that could facilitate DUP-TRP/INV-DUP formation. Remarkably, 1,445 disease-associated genes are at risk of undergoing copy-number gain as they map to genomic intervals susceptible to the formation of DUP-TRP/INV-DUP complex rearrangements. We implicate inverted LCRs as a human genome architectural feature that could potentially be responsible for genomic instability associated with many human disease traits.
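
The two size and identity thresholds quoted above can be written as a small classifier. This is a sketch only: the `InvertedRepeatPair` record and the `classify` function are hypothetical names, the input is assumed to be an already-detected pair of inverted repeats, and the step of finding such pairs in a genome is not shown.

```python
from dataclasses import dataclass


@dataclass
class InvertedRepeatPair:
    length_bp: int    # repeat length in base pairs
    identity: float   # fraction of matching bases, between 0 and 1


def classify(pair):
    """Return the labels whose thresholds (as quoted in the abstract) are met."""
    labels = []
    # IP-LCR: larger than 1 kb with more than 95% sequence identity.
    if pair.length_bp > 1000 and pair.identity > 0.95:
        labels.append("IP-LCR")
    # DTIP-LCR: larger than 800 bp with at least 98% sequence identity.
    if pair.length_bp > 800 and pair.identity >= 0.98:
        labels.append("DTIP-LCR")
    return labels


examples = [
    InvertedRepeatPair(1200, 0.96),   # long enough, moderate identity
    InvertedRepeatPair(900, 0.985),   # shorter, high identity
    InvertedRepeatPair(5000, 0.99),   # meets both sets of thresholds
    InvertedRepeatPair(500, 0.99),    # too short for either
]
labels = [classify(p) for p in examples]
```

Note that the two categories overlap: a long, nearly identical pair satisfies both definitions.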


Subject(s)
Chromosome Inversion , Genome, Human/genetics , Genomic Instability , Segmental Duplications, Genomic/genetics , Chromosome Mapping , Gene Deletion , Gene Dosage , Gene Duplication , Gene Rearrangement , Genetic Predisposition to Disease/genetics , Humans , Models, Genetic , Recombination, Genetic
10.
Hum Mutat ; 34(10): 1415-23, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23878096

ABSTRACT

We describe the molecular and clinical characterization of nine individuals with recurrent, 3.4-Mb, de novo deletions of 3q13.2-q13.31 detected by chromosomal microarray analysis. All individuals have hypotonia and language and motor delays; they variably express mild to moderate cognitive delays (8/9), abnormal behavior (7/9), and autism spectrum disorders (3/9). Common facial features include downslanting palpebral fissures with epicanthal folds, a slightly bulbous nose, and relative macrocephaly. Twenty-eight genes map to the deleted region, including four strong candidate genes, DRD3, ZBTB20, GAP43, and BOC, with important roles in neural and/or muscular development. Analysis of the breakpoint regions based on array data revealed directly oriented human endogenous retrovirus (HERV-H) elements of ~5 kb in size and of >95% DNA sequence identity flanking the deletion. Subsequent DNA sequencing revealed different deletion breakpoints and suggested nonallelic homologous recombination (NAHR) between HERV-H elements as a mechanism of deletion formation, analogous to HERV-I-flanked and NAHR-mediated AZFa deletions. We propose that similar HERV elements may also mediate other recurrent deletion and duplication events on a genome-wide scale. Observation of rare recurrent chromosomal events such as these deletions helps to further the understanding of mechanisms behind naturally occurring variation in the human genome and its contribution to genetic disease.


Subject(s)
Chromosome Deletion , Chromosomes, Human, Pair 3/genetics , Cognition Disorders/genetics , Developmental Disabilities/genetics , Endogenous Retroviruses/genetics , Muscle Hypotonia/genetics , Adolescent , Adult , Base Sequence , Child , Child, Preschool , Chromosome Breakpoints , Cognition Disorders/diagnosis , Comparative Genomic Hybridization , Developmental Disabilities/diagnosis , Facies , Female , Gene Order , Humans , Infant , Male , Molecular Sequence Data , Muscle Hypotonia/diagnosis , Phenotype , Sequence Alignment , Syndrome , Young Adult