
1.

Pacybara: accurate long-read sequencing for barcoded mutagenized allelic libraries.

Weile, Jochen; Ferra, Gabrielle; Boyle, Gabriel; Pendyala, Sriram; Amorosi, Clara; Yeh, Chiann-Ling; Cote, Atina G; Kishore, Nishka; Tabet, Daniel; van Loggerenberg, Warren; Rayhan, Ashyad; Fowler, Douglas M; Dunham, Maitreya J; Roth, Frederick P.

Bioinformatics ; 40(4), 2024 Mar 29.

Article En | MEDLINE | ID: mdl-38569896

MOTIVATION: Long-read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library. RESULTS: Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or nonunique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues. AVAILABILITY AND IMPLEMENTATION: Pacybara, freely available at https://github.com/rothlab/pacybara, is implemented using R, Python, and bash for Linux. It runs on GNU/Linux HPC clusters via Slurm, PBS, or GridEngine schedulers. A single-machine simplex version is also available.

High-Throughput Nucleotide Sequencing , Software , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Gene Library , Genotype , Cluster Analysis

2.

Genome-scale mapping of DNA damage suppressors through phenotypic CRISPR-Cas9 screens.

Zhao, Yichao; Tabet, Daniel; Rubio Contreras, Diana; Lao, Linjiang; Kousholt, Arne Nedergaard; Weile, Jochen; Melo, Henrique; Hoeg, Lisa; Feng, Sumin; Coté, Atina G; Lin, Zhen-Yuan; Setiaputra, Dheva; Jonkers, Jos; Gingras, Anne-Claude; Gómez Herreros, Fernando; Roth, Frederick P; Durocher, Daniel.

Mol Cell ; 83(15): 2792-2809.e9, 2023 08 03.

Article En | MEDLINE | ID: mdl-37478847

To maintain genome integrity, cells must accurately duplicate their genome and repair DNA lesions when they occur. To uncover genes that suppress DNA damage in human cells, we undertook flow-cytometry-based CRISPR-Cas9 screens that monitored DNA damage. We identified 160 genes whose mutation caused spontaneous DNA damage, a list enriched in essential genes, highlighting the importance of genomic integrity for cellular fitness. We also identified 227 genes whose mutation caused DNA damage in replication-perturbed cells. Among the genes characterized, we discovered that deoxyribose-phosphate aldolase DERA suppresses DNA damage caused by cytarabine (Ara-C) and that GNB1L, a gene implicated in 22q11.2 syndrome, promotes biogenesis of ATR and related phosphatidylinositol 3-kinase-related kinases (PIKKs). These results implicate defective PIKK biogenesis as a cause of some phenotypes associated with 22q11.2 syndrome. The phenotypic mapping of genes that suppress DNA damage therefore provides a rich resource to probe the cellular pathways that influence genome maintenance.

CRISPR-Cas Systems , DNA Damage , Humans , Mutation , DNA Repair , Phenotype

3.

satmut_utils: a simulation and variant calling package for multiplexed assays of variant effect.

Hoskins, Ian; Sun, Song; Cote, Atina; Roth, Frederick P; Cenik, Can.

Genome Biol ; 24(1): 82, 2023 04 20.

Article En | MEDLINE | ID: mdl-37081510

The impact of millions of individual genetic variants on molecular phenotypes in coding sequences remains unknown. Multiplexed assays of variant effect (MAVEs) are scalable methods to annotate relevant variants, but existing software lacks standardization, requires cumbersome configuration, and does not scale to large targets. We present satmut_utils as a flexible solution for simulation and variant quantification. We then benchmark MAVE software using simulated and real MAVE data. We finally determine mRNA abundance for thousands of cystathionine beta-synthase variants using two experimental methods. The satmut_utils package enables high-performance analysis of MAVEs and reveals the capability of variants to alter mRNA abundance.

High-Throughput Nucleotide Sequencing , Software , Computer Simulation , Phenotype , Exons , High-Throughput Nucleotide Sequencing/methods

4.

A comprehensive map of human glucokinase variant activity.

Gersing, Sarah; Cagiada, Matteo; Gebbia, Marinella; Gjesing, Anette P; Coté, Atina G; Seesankar, Gireesh; Li, Roujia; Tabet, Daniel; Weile, Jochen; Stein, Amelie; Gloyn, Anna L; Hansen, Torben; Roth, Frederick P; Lindorff-Larsen, Kresten; Hartmann-Petersen, Rasmus.

Genome Biol ; 24(1): 97, 2023 04 26.

Article En | MEDLINE | ID: mdl-37101203

BACKGROUND: Glucokinase (GCK) regulates insulin secretion to maintain appropriate blood glucose levels. Sequence variants can alter GCK activity to cause hyperinsulinemic hypoglycemia or hyperglycemia associated with GCK-maturity-onset diabetes of the young (GCK-MODY), collectively affecting up to 10 million people worldwide. Patients with GCK-MODY are frequently misdiagnosed and treated unnecessarily. Genetic testing can prevent this but is hampered by the challenge of interpreting novel missense variants. RESULT: Here, we exploit a multiplexed yeast complementation assay to measure both hyper- and hypoactive GCK variation, capturing 97% of all possible missense and nonsense variants. Activity scores correlate with in vitro catalytic efficiency, fasting glucose levels in carriers of GCK variants and with evolutionary conservation. Hypoactive variants are concentrated at buried positions, near the active site, and at a region of known importance for GCK conformational dynamics. Some hyperactive variants shift the conformational equilibrium towards the active state through a relative destabilization of the inactive conformation. CONCLUSION: Our comprehensive assessment of GCK variant activity promises to facilitate variant interpretation and diagnosis, expand our mechanistic understanding of hyperactive variants, and inform development of therapeutics targeting GCK.

Diabetes Mellitus, Type 2 , Glucokinase , Humans , Glucokinase/genetics , Glucokinase/chemistry , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/diagnosis , Mutation, Missense , Genetic Testing , Mutation

5.

Next-generation large-scale binary protein interaction network for Drosophila melanogaster.

Tang, Hong-Wen; Spirohn, Kerstin; Hu, Yanhui; Hao, Tong; Kovács, István A; Gao, Yue; Binari, Richard; Yang-Zhou, Donghui; Wan, Kenneth H; Bader, Joel S; Balcha, Dawit; Bian, Wenting; Booth, Benjamin W; Coté, Atina G; de Rouck, Steffi; Desbuleux, Alice; Goh, Kah Yong; Kim, Dae-Kyum; Knapp, Jennifer J; Lee, Wen Xing; Lemmens, Irma; Li, Cathleen; Li, Mian; Li, Roujia; Lim, Hyobin Julianne; Liu, Yifang; Luck, Katja; Markey, Dylan; Pollis, Carl; Rangarajan, Sudharshan; Rodiger, Jonathan; Schlabach, Sadie; Shen, Yun; Sheykhkarimli, Dayag; TeeKing, Bridget; Roth, Frederick P; Tavernier, Jan; Calderwood, Michael A; Hill, David E; Celniker, Susan E; Vidal, Marc; Perrimon, Norbert; Mohr, Stephanie E.

Nat Commun ; 14(1): 2162, 2023 04 15.

Article En | MEDLINE | ID: mdl-37061542

Generating reference maps of interactome networks illuminates genetic studies by providing a protein-centric approach to finding new components of existing pathways, complexes, and processes. We apply state-of-the-art methods to identify binary protein-protein interactions (PPIs) for Drosophila melanogaster. Four all-by-all yeast two-hybrid (Y2H) screens of > 10,000 Drosophila proteins result in the 'FlyBi' dataset of 8723 PPIs among 2939 proteins. Testing subsets of data from FlyBi and previous PPI studies using an orthogonal assay allows for normalization of data quality; subsequent integration of FlyBi and previous data results in an expanded binary Drosophila reference interaction network, DroRI, comprising 17,232 interactions among 6511 proteins. We use FlyBi data to generate an autophagy network, then validate in vivo using autophagy-related assays. The deformed wings (dwg) gene encodes a protein that is both a regulator and a target of autophagy. Altogether, these resources provide a foundation for building new hypotheses regarding protein networks and function.

Drosophila Proteins , Protein Interaction Maps , Animals , Protein Interaction Maps/genetics , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Drosophila/genetics , Saccharomyces cerevisiae/metabolism , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Protein Interaction Mapping/methods , Two-Hybrid System Techniques

6.

Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries.

Weile, Jochen; Ferra, Gabrielle; Boyle, Gabriel; Pendyala, Sriram; Amorosi, Clara; Yeh, Chiann-Ling; Cote, Atina G; Kishore, Nishka; Tabet, Daniel; van Loggerenberg, Warren; Rayhan, Ashyad; Fowler, Douglas M; Dunham, Maitreya J; Roth, Frederick P.

bioRxiv ; 2023 Dec 07.

Article En | MEDLINE | ID: mdl-36865234

Long read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library. Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or non-unique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues.

7.

A proteome-scale map of the SARS-CoV-2-human contactome.

Kim, Dae-Kyum; Weller, Benjamin; Lin, Chung-Wen; Sheykhkarimli, Dayag; Knapp, Jennifer J; Dugied, Guillaume; Zanzoni, Andreas; Pons, Carles; Tofaute, Marie J; Maseko, Sibusiso B; Spirohn, Kerstin; Laval, Florent; Lambourne, Luke; Kishore, Nishka; Rayhan, Ashyad; Sauer, Mayra; Young, Veronika; Halder, Hridi; la Rosa, Nora Marín-de; Pogoutse, Oxana; Strobel, Alexandra; Schwehn, Patrick; Li, Roujia; Rothballer, Simin T; Altmann, Melina; Cassonnet, Patricia; Coté, Atina G; Vergara, Lena Elorduy; Hazelwood, Isaiah; Liu, Betty B; Nguyen, Maria; Pandiarajan, Ramakrishnan; Dohai, Bushra; Coloma, Patricia A Rodriguez; Poirson, Juline; Giuliana, Paolo; Willems, Luc; Taipale, Mikko; Jacob, Yves; Hao, Tong; Hill, David E; Brun, Christine; Twizere, Jean-Claude; Krappmann, Daniel; Heinig, Matthias; Falter, Claudia; Aloy, Patrick; Demeret, Caroline; Vidal, Marc; Calderwood, Michael A.

Nat Biotechnol ; 41(1): 140-149, 2023 01.

Article En | MEDLINE | ID: mdl-36217029

Understanding the mechanisms of coronavirus disease 2019 (COVID-19) disease severity to efficiently design therapies for emerging virus variants remains an urgent challenge of the ongoing pandemic. Infection and immune reactions are mediated by direct contacts between viral molecules and the host proteome, and the vast majority of these virus-host contacts (the 'contactome') have not been identified. Here, we present a systematic contactome map of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with the human host encompassing more than 200 binary virus-host and intraviral protein-protein interactions. We find that host proteins genetically associated with comorbidities of severe illness and long COVID are enriched in SARS-CoV-2 targeted network communities. Evaluating contactome-derived hypotheses, we demonstrate that viral NSP14 activates nuclear factor κB (NF-κB)-dependent transcription, even in the presence of cytokine signaling. Moreover, for several tested host proteins, genetic knock-down substantially reduces viral replication. Additionally, we show for USP25 that this effect is phenocopied by the small-molecule inhibitor AZ1. Our results connect viral proteins to human genetic architecture for COVID-19 severity and offer potential therapeutic targets.

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Proteome/genetics , Post-Acute COVID-19 Syndrome , Virus Replication/genetics , Ubiquitin Thiolesterase/pharmacology

8.

A reference map of the human binary protein interactome.

Luck, Katja; Kim, Dae-Kyum; Lambourne, Luke; Spirohn, Kerstin; Begg, Bridget E; Bian, Wenting; Brignall, Ruth; Cafarelli, Tiziana; Campos-Laborie, Francisco J; Charloteaux, Benoit; Choi, Dongsic; Coté, Atina G; Daley, Meaghan; Deimling, Steven; Desbuleux, Alice; Dricot, Amélie; Gebbia, Marinella; Hardy, Madeleine F; Kishore, Nishka; Knapp, Jennifer J; Kovács, István A; Lemmens, Irma; Mee, Miles W; Mellor, Joseph C; Pollis, Carl; Pons, Carles; Richardson, Aaron D; Schlabach, Sadie; Teeking, Bridget; Yadav, Anupama; Babor, Mariana; Balcha, Dawit; Basha, Omer; Bowman-Colin, Christian; Chin, Suet-Feung; Choi, Soon Gang; Colabella, Claudia; Coppin, Georges; D'Amata, Cassandra; De Ridder, David; De Rouck, Steffi; Duran-Frigola, Miquel; Ennajdaoui, Hanane; Goebels, Florian; Goehring, Liana; Gopal, Anjali; Haddad, Ghazal; Hatchi, Elodie; Helmy, Mohamed; Jacob, Yves.

Nature ; 580(7803): 402-408, 2020 04.

Article En | MEDLINE | ID: mdl-32296183

Global insights into cellular organization and genome function require comprehensive understanding of the interactome networks that mediate genotype-phenotype relationships. Here we present a human 'all-by-all' reference interactome map of human binary protein interactions, or 'HuRI'. With approximately 53,000 protein-protein interactions, HuRI has approximately four times as many such interactions as there are high-quality curated interactions from small-scale studies. The integration of HuRI with genome, transcriptome and proteome data enables cellular function to be studied within most physiological or pathological cellular contexts. We demonstrate the utility of HuRI in identifying the specific subcellular roles of protein-protein interactions. Inferred tissue-specific networks reveal general principles for the formation of cellular context-specific functions and elucidate potential molecular mechanisms that might underlie tissue-specific phenotypes of Mendelian diseases. HuRI is a systematic proteome-wide reference that links genomic variation to phenotypic outcomes.

Proteome/metabolism , Extracellular Space/metabolism , Humans , Organ Specificity , Protein Interaction Mapping

9.

A proactive genotype-to-patient-phenotype map for cystathionine beta-synthase.

Sun, Song; Weile, Jochen; Verby, Marta; Wu, Yingzhou; Wang, Yang; Cote, Atina G; Fotiadou, Iosifina; Kitaygorodsky, Julia; Vidal, Marc; Rine, Jasper; Jesina, Pavel; Kozich, Viktor; Roth, Frederick P.

Genome Med ; 12(1): 13, 2020 01 30.

Article En | MEDLINE | ID: mdl-32000841

BACKGROUND: For the majority of rare clinical missense variants, pathogenicity status cannot currently be classified. Classical homocystinuria, characterized by elevated homocysteine in plasma and urine, is caused by variants in the cystathionine beta-synthase (CBS) gene, most of which are rare. With early detection, existing therapies are highly effective. METHODS: Damaging CBS variants can be detected based on their failure to restore growth in yeast cells lacking the yeast ortholog CYS4. This assay has only been applied reactively, after first observing a variant in patients. Using saturation codon-mutagenesis, en masse growth selection, and sequencing, we generated a comprehensive, proactive map of CBS missense variant function. RESULTS: Our CBS variant effect map far exceeds the performance of computational predictors of disease variants. Map scores correlated strongly with both disease severity (Spearman's ϱ = 0.9) and human clinical response to vitamin B6 (ϱ = 0.93). CONCLUSIONS: We demonstrate that highly multiplexed cell-based assays can yield proactive maps of variant function and patient response to therapy, even for rare variants not previously seen in the clinic.

Cystathionine beta-Synthase/genetics , Genetic Complementation Test/methods , Genetic Testing/methods , Homocystinuria/genetics , Mutation, Missense , Cystathionine beta-Synthase/metabolism , Genotype , Humans , Phenotype , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins/genetics

10.

Highly Combinatorial Genetic Interaction Analysis Reveals a Multi-Drug Transporter Influence Network.

Celaj, Albi; Gebbia, Marinella; Musa, Louai; Cote, Atina G; Snider, Jamie; Wong, Victoria; Ko, Minjeong; Fong, Tiffany; Bansal, Paul; Mellor, Joseph C; Seesankar, Gireesh; Nguyen, Maria; Zhou, Shijie; Wang, Liangxi; Kishore, Nishka; Stagljar, Igor; Suzuki, Yo; Yachie, Nozomu; Roth, Frederick P.

Cell Syst ; 10(1): 25-38.e10, 2020 01 22.

Article En | MEDLINE | ID: mdl-31668799

Many traits are complex, depending non-additively on variant combinations. Even in model systems, such as the yeast S. cerevisiae, carrying out the high-order variant-combination testing needed to dissect complex traits remains a daunting challenge. Here, we describe "X-gene" genetic analysis (XGA), a strategy for engineering and profiling highly combinatorial gene perturbations. We demonstrate XGA on yeast ABC transporters by engineering 5,353 strains, each deleted for a random subset of 16 transporters, and profiling each strain's resistance to 16 compounds. XGA yielded 85,648 genotype-to-resistance observations, revealing high-order genetic interactions for 13 of the 16 transporters studied. Neural networks yielded intuitive functional models and guided exploration of fluconazole resistance, which was influenced non-additively by five genes. Together, our results showed that highly combinatorial genetic perturbation can functionally dissect complex traits, supporting pursuit of analogous strategies in human cells and other model systems.

Biological Transport/genetics , Membrane Transport Proteins/genetics , Humans

11.

A web application and service for imputing and visualizing missense variant effect maps.

Wu, Yingzhou; Weile, Jochen; Cote, Atina G; Sun, Song; Knapp, Jennifer; Verby, Marta; Roth, Frederick P.

Bioinformatics ; 35(17): 3191-3193, 2019 09 01.

Article En | MEDLINE | ID: mdl-30649215

SUMMARY: The promise of personalized genomic medicine depends on our ability to assess the functional impact of rare sequence variation. Multiplexed assays can experimentally measure the functional impact of missense variants on a massive scale. However, even after such assays, many missense variants remain poorly measured. Here we describe a software pipeline and application to impute missing information in experimentally determined variant effect maps. AVAILABILITY AND IMPLEMENTATION: http://impute.varianteffect.org source code: https://github.com/joewuca/imputation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Software , Genome , Genomics , Mutation, Missense

12.

Mapping DNA damage-dependent genetic interactions in yeast via party mating and barcode fusion genetics.

Díaz-Mejía, J Javier; Celaj, Albi; Mellor, Joseph C; Coté, Atina; Balint, Attila; Ho, Brandon; Bansal, Pritpal; Shaeri, Fatemeh; Gebbia, Marinella; Weile, Jochen; Verby, Marta; Karkhanina, Anna; Zhang, YiFan; Wong, Cassandra; Rich, Justin; Prendergast, D'Arcy; Gupta, Gaurav; Öztürk, Sedide; Durocher, Daniel; Brown, Grant W; Roth, Frederick P.

Mol Syst Biol ; 14(5): e7985, 2018 05 28.

Article En | MEDLINE | ID: mdl-29807908

Condition-dependent genetic interactions can reveal functional relationships between genes that are not evident under standard culture conditions. State-of-the-art yeast genetic interaction mapping, which relies on robotic manipulation of arrays of double-mutant strains, does not scale readily to multi-condition studies. Here, we describe barcode fusion genetics to map genetic interactions (BFG-GI), by which double-mutant strains generated via en masse "party" mating can also be monitored en masse for growth to detect genetic interactions. By using site-specific recombination to fuse two DNA barcodes, each representing a specific gene deletion, BFG-GI enables multiplexed quantitative tracking of double mutants via next-generation sequencing. We applied BFG-GI to a matrix of DNA repair genes under nine different conditions, including methyl methanesulfonate (MMS), 4-nitroquinoline 1-oxide (4NQO), bleomycin, zeocin, and three other DNA-damaging environments. BFG-GI recapitulated known genetic interactions and yielded new condition-dependent genetic interactions. We validated and further explored a subnetwork of condition-dependent genetic interactions involving MAG1, SLX4, and genes encoding the Shu complex, and inferred that loss of the Shu complex leads to an increase in the activation of the checkpoint protein kinase Rad53.

Chromosome Mapping , DNA Barcoding, Taxonomic , DNA Damage , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , DNA Repair , Epistasis, Genetic , Gene Deletion , Genetic Loci , High-Throughput Nucleotide Sequencing , Methyl Methanesulfonate , Models, Theoretical , Promoter Regions, Genetic , Reproducibility of Results

13.

A framework for exhaustively mapping functional missense variants.

Weile, Jochen; Sun, Song; Cote, Atina G; Knapp, Jennifer; Verby, Marta; Mellor, Joseph C; Wu, Yingzhou; Pons, Carles; Wong, Cassandra; van Lieshout, Natascha; Yang, Fan; Tasan, Murat; Tan, Guihong; Yang, Shan; Fowler, Douglas M; Nussbaum, Robert; Bloom, Jesse D; Vidal, Marc; Hill, David E; Aloy, Patrick; Roth, Frederick P.

Mol Syst Biol ; 13(12): 957, 2017 12 21.

Article En | MEDLINE | ID: mdl-29269382

Although we now routinely sequence human genomes, we can confidently identify only a fraction of the sequence variants that have a functional impact. Here, we developed a deep mutational scanning framework that produces exhaustive maps for human missense variants by combining random codon mutagenesis and multiplexed functional variation assays with computational imputation and refinement. We applied this framework to four proteins corresponding to six human genes: UBE2I (encoding SUMO E2 conjugase), SUMO1 (small ubiquitin-like modifier), TPK1 (thiamin pyrophosphokinase), and CALM1/2/3 (three genes encoding the protein calmodulin). The resulting maps recapitulate known protein features and confidently identify pathogenic variation. Assays potentially amenable to deep mutational scanning are already available for 57% of human disease genes, suggesting that DMS could ultimately map functional variation for all human disease genes.

DNA Mutational Analysis/methods , Mutation, Missense/genetics , Calmodulin/genetics , Disease/genetics , Humans , Machine Learning , Phenotype , Phylogeny , Reproducibility of Results , SUMO-1 Protein/genetics , Ubiquitin-Conjugating Enzymes/genetics , Ubiquitin-Conjugating Enzymes/metabolism

14.

Assessing predictions of fitness effects of missense mutations in SUMO-conjugating enzyme UBE2I.

Zhang, Jing; Kinch, Lisa N; Cong, Qian; Weile, Jochen; Sun, Song; Cote, Atina G; Roth, Frederick P; Grishin, Nick V.

Hum Mutat ; 38(9): 1051-1063, 2017 09.

Article En | MEDLINE | ID: mdl-28817247

The exponential growth of genomic variants uncovered by next-generation sequencing necessitates efficient and accurate computational analyses to predict their functional effects. A number of computational methods have been developed for the task, but few unbiased comparisons of their performance are available. To fill the gap, The Critical Assessment of Genome Interpretation (CAGI) comprehensively assesses phenotypic predictions on newly collected experimental datasets. Here, we present the results of the SUMO conjugase challenge where participants were predicting functional effects of missense mutations in human SUMO-conjugating enzyme UBE2I. The performance of the predictors is similar to each other and is far from perfection. Evolutionary information from sequence alignments dominates the success: deleterious mutations at conserved positions and benign mutations at variable positions are accurately predicted. Prediction accuracy of other mutations remains unsatisfactory, and this fast-growing field of research is yet to learn the use of spatial structure information to improve the predictions significantly.

Computational Biology/methods , Mutation, Missense , Ubiquitin-Conjugating Enzymes/genetics , Ubiquitin-Conjugating Enzymes/metabolism , Databases, Genetic , Evolution, Molecular , High-Throughput Nucleotide Sequencing , Humans , Models, Molecular , Protein Binding , Selection, Genetic , Sequence Alignment , Ubiquitin-Conjugating Enzymes/chemistry

15.

Exploring genetic suppression interactions on a global scale.

van Leeuwen, Jolanda; Pons, Carles; Mellor, Joseph C; Yamaguchi, Takafumi N; Friesen, Helena; Koschwanez, John; Usaj, Mojca Mattiazzi; Pechlaner, Maria; Takar, Mehmet; Usaj, Matej; VanderSluis, Benjamin; Andrusiak, Kerry; Bansal, Pritpal; Baryshnikova, Anastasia; Boone, Claire E; Cao, Jessica; Cote, Atina; Gebbia, Marinella; Horecka, Gene; Horecka, Ira; Kuzmin, Elena; Legro, Nicole; Liang, Wendy; van Lieshout, Natascha; McNee, Margaret; San Luis, Bryan-Joseph; Shaeri, Fatemeh; Shuteriqi, Ermira; Sun, Song; Yang, Lu; Youn, Ji-Young; Yuen, Michael; Costanzo, Michael; Gingras, Anne-Claude; Aloy, Patrick; Oostenbrink, Chris; Murray, Andrew; Graham, Todd R; Myers, Chad L; Andrews, Brenda J; Roth, Frederick P; Boone, Charles.

Science ; 354(6312), 2016 11 04.

Article En | MEDLINE | ID: mdl-27811238

Genetic suppression occurs when the phenotypic defects caused by a mutation in a particular gene are rescued by a mutation in a second gene. To explore the principles of genetic suppression, we examined both literature-curated and unbiased experimental data, involving systematic genetic mapping and whole-genome sequencing, to generate a large-scale suppression network among yeast genes. Most suppression pairs identified novel relationships among functionally related genes, providing new insights into the functional wiring diagram of the cell. In addition to suppressor mutations, we identified frequent secondary mutations, in a subset of genes, that likely cause a delay in the onset of stationary phase, which appears to promote their enrichment within a propagating population. These findings allow us to formulate and quantify general mechanisms of genetic suppression.

Gene Regulatory Networks , Genes, Fungal , Genes, Suppressor , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Suppression, Genetic , Cell Physiological Phenomena/genetics , Chromosome Mapping

16.

Pooled-matrix protein interaction screens using Barcode Fusion Genetics.

Yachie, Nozomu; Petsalaki, Evangelia; Mellor, Joseph C; Weile, Jochen; Jacob, Yves; Verby, Marta; Ozturk, Sedide B; Li, Siyang; Cote, Atina G; Mosca, Roberto; Knapp, Jennifer J; Ko, Minjeong; Yu, Analyn; Gebbia, Marinella; Sahni, Nidhi; Yi, Song; Tyagi, Tanya; Sheykhkarimli, Dayag; Roth, Jonathan F; Wong, Cassandra; Musa, Louai; Snider, Jamie; Liu, Yi-Chun; Yu, Haiyuan; Braun, Pascal; Stagljar, Igor; Hao, Tong; Calderwood, Michael A; Pelletier, Laurence; Aloy, Patrick; Hill, David E; Vidal, Marc; Roth, Frederick P.

Mol Syst Biol ; 12(4): 863, 2016 Apr 22.

Article En | MEDLINE | ID: mdl-27107012

High-throughput binary protein interaction mapping is continuing to extend our understanding of cellular function and disease mechanisms. However, we remain one or two orders of magnitude away from a complete interaction map for humans and other major model organisms. Completion will require screening at substantially larger scales with many complementary assays, requiring further efficiency gains in proteome-scale interaction mapping. Here, we report Barcode Fusion Genetics-Yeast Two-Hybrid (BFG-Y2H), by which a full matrix of protein pairs can be screened in a single multiplexed strain pool. BFG-Y2H uses Cre recombination to fuse DNA barcodes from distinct plasmids, generating chimeric protein-pair barcodes that can be quantified via next-generation sequencing. We applied BFG-Y2H to four different matrices ranging in scale from ~25 K to 2.5 M protein pairs. The results show that BFG-Y2H increases the efficiency of protein matrix screening, with quality that is on par with state-of-the-art Y2H methods.

Centrosome/metabolism , Protein Interaction Mapping/methods , Proteome/metabolism , Saccharomyces cerevisiae/genetics , Chromosomes, Human/metabolism , Gene Library , High-Throughput Nucleotide Sequencing , Humans , Protein Binding , Two-Hybrid System Techniques

17.

Determination and inference of eukaryotic transcription factor sequence specificity.

Weirauch, Matthew T; Yang, Ally; Albu, Mihai; Cote, Atina G; Montenegro-Montero, Alejandro; Drewe, Philipp; Najafabadi, Hamed S; Lambert, Samuel A; Mann, Ishminder; Cook, Kate; Zheng, Hong; Goity, Alejandra; van Bakel, Harm; Lozano, Jean-Claude; Galli, Mary; Lewsey, Mathew G; Huang, Eryong; Mukherjee, Tuhin; Chen, Xiaoting; Reece-Hoyes, John S; Govindarajan, Sridhar; Shaulsky, Gad; Walhout, Albertha J M; Bouget, François-Yves; Ratsch, Gunnar; Larrondo, Luis F; Ecker, Joseph R; Hughes, Timothy R.

Cell ; 158(6): 1431-1443, 2014 Sep 11.

Article En | MEDLINE | ID: mdl-25215497

Transcription factor (TF) DNA sequence preferences direct their regulatory activity, but are currently known for only ~1% of eukaryotic TFs. Broadly sampling DNA-binding domain (DBD) types from multiple eukaryotic clades, we determined DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. We find that closely related DBDs almost always have very similar DNA sequence preferences, enabling inference of motifs for ~34% of the ~170,000 known or predicted eukaryotic TFs. Sequences matching both measured and inferred motifs are enriched in chromatin immunoprecipitation sequencing (ChIP-seq) peaks and upstream of transcription start sites in diverse eukaryotic lineages. SNPs defining expression quantitative trait loci in Arabidopsis promoters are also enriched for predicted TF binding sites. Importantly, our motif "library" can be used to identify specific TFs whose binding may be altered by human disease risk alleles. These data present a powerful resource for mapping transcriptional networks across eukaryotes.

Arabidopsis/genetics , Nucleotide Motifs , Sequence Analysis, DNA , Transcription Factors/metabolism , Arabidopsis/metabolism , Chromatin Immunoprecipitation , Humans , Polymorphism, Single Nucleotide , Promoter Regions, Genetic , Protein Binding , Quantitative Trait Loci

18.

Evaluation of methods for modeling transcription factor sequence specificity.

Weirauch, Matthew T; Cote, Atina; Norel, Raquel; Annala, Matti; Zhao, Yue; Riley, Todd R; Saez-Rodriguez, Julio; Cokelaer, Thomas; Vedenko, Anastasia; Talukder, Shaheynoor; Bussemaker, Harmen J; Morris, Quaid D; Bulyk, Martha L; Stolovitzky, Gustavo; Hughes, Timothy R.

Nat Biotechnol ; 31(2): 126-34, 2013 Feb.

Article En | MEDLINE | ID: mdl-23354101

Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro-derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.

DNA-Binding Proteins/genetics , Nucleotide Motifs/genetics , Position-Specific Scoring Matrices , Transcription Factors , Algorithms , Animals , Computational Biology , DNA-Binding Proteins/chemistry , Genome , Mice , Protein Array Analysis , Transcription Factors/genetics , Transcription Factors/metabolism

19.

The draft genome and transcriptome of Cannabis sativa.

van Bakel, Harm; Stout, Jake M; Cote, Atina G; Tallon, Carling M; Sharpe, Andrew G; Hughes, Timothy R; Page, Jonathan E.

Genome Biol ; 12(10): R102, 2011 Oct 20.

Article En | MEDLINE | ID: mdl-22014239

BACKGROUND: Cannabis sativa has been cultivated throughout human history as a source of fiber, oil and food, and for its medicinal and intoxicating properties. Selective breeding has produced cannabis plants for specific uses, including high-potency marijuana strains and hemp cultivars for fiber and seed production. The molecular biology underlying cannabinoid biosynthesis and other traits of interest is largely unexplored. RESULTS: We sequenced genomic DNA and RNA from the marijuana strain Purple Kush using short-read approaches. We report a draft haploid genome sequence of 534 Mb and a transcriptome of 30,000 genes. Comparison of the transcriptome of Purple Kush with that of the hemp cultivar 'Finola' revealed that many genes encoding proteins involved in cannabinoid and precursor pathways are more highly expressed in Purple Kush than in 'Finola'. The exclusive occurrence of Δ9-tetrahydrocannabinolic acid synthase in the Purple Kush transcriptome, and its replacement by cannabidiolic acid synthase in 'Finola', may explain why the psychoactive cannabinoid Δ9-tetrahydrocannabinol (THC) is produced in marijuana but not in hemp. Resequencing the hemp cultivars 'Finola' and 'USO-31' showed little difference in gene copy numbers of cannabinoid pathway enzymes. However, single nucleotide variant analysis uncovered a relatively high level of variation among four cannabis types, and supported a separation of marijuana and hemp. CONCLUSIONS: The availability of the Cannabis sativa genome enables the study of a multifunctional plant that occupies a unique role in human culture. Its availability will aid the development of therapeutic marijuana strains with tailored cannabinoid profiles and provide a basis for the breeding of hemp with improved agronomic characteristics.

Cannabis/genetics , DNA, Plant/genetics , Genome, Plant , RNA, Plant/genetics , Transcriptome , Base Sequence , Breeding , Cannabis/enzymology , Dronabinol/metabolism , Flowers/genetics , Gene Dosage , Gene Expression Profiling , Gene Expression Regulation, Plant , Intramolecular Oxidoreductases/genetics , Molecular Sequence Data , Polymorphism, Single Nucleotide , Pseudogenes , Seeds/genetics

20.

Structural basis for recognition of AT-rich DNA by unrelated xenogeneic silencing proteins.

Gordon, Blair R G; Li, Yifei; Cote, Atina; Weirauch, Matthew T; Ding, Pengfei; Hughes, Timothy R; Navarre, William Wiley; Xia, Bin; Liu, Jun.

Proc Natl Acad Sci U S A ; 108(26): 10690-5, 2011 Jun 28.

Article En | MEDLINE | ID: mdl-21673140

H-NS and Lsr2 are nucleoid-associated proteins from Gram-negative bacteria and Mycobacteria, respectively, that play an important role in the silencing of horizontally acquired foreign DNA that is more AT-rich than the resident genome. Despite the fact that Lsr2 and H-NS proteins are dissimilar in sequence and structure, they serve apparently similar functions and can functionally complement one another. The mechanism by which these xenogeneic silencers selectively target AT-rich DNA has been enigmatic. We performed high-resolution protein binding microarray analysis to simultaneously assess the binding preference of H-NS and Lsr2 for all possible 8-base sequences. Concurrently, we performed a detailed structure-function relationship analysis of their C-terminal DNA binding domains by NMR. Unexpectedly, we found that H-NS and Lsr2 use a common DNA binding mechanism where a short loop containing a "Q/RGR" motif selectively interacts with the DNA minor groove, where the highest affinity is for AT-rich sequences that lack A-tracts. Mutations of the Q/RGR motif abolished DNA binding activity. Netropsin, a DNA minor groove-binding molecule, effectively outcompeted H-NS and Lsr2 for binding to AT-rich sequences. These results provide a unified molecular mechanism to explain findings related to xenogeneic silencing proteins, including their lack of apparent sequence specificity but preference for AT-rich sequences. Our findings also suggest that structural information contained within the DNA minor groove is deciphered by xenogeneic silencing proteins to distinguish genetic material that is self from nonself.

AT Rich Sequence , Bacterial Proteins/metabolism , DNA-Binding Proteins/metabolism , DNA/metabolism , Nucleic Acid Conformation , Amino Acid Sequence , Bacterial Proteins/chemistry , Base Sequence , DNA/chemistry , DNA-Binding Proteins/chemistry , Models, Molecular , Molecular Sequence Data , Nuclear Magnetic Resonance, Biomolecular , Sequence Homology, Amino Acid