1.
J Chem Inf Model ; 63(7): 1852-1857, 2023 04 10.
Article in English | MEDLINE | ID: mdl-36977316

ABSTRACT

To solve recurring problems in drug discovery, matched molecular pair (MMP) analysis is used to understand relationships between chemical structure and function. For the MMP analysis of large data sets (>10,000 compounds), available tools lack flexible search and visualization functionality and require computational expertise. Here, we present Matcher, an open-source application for MMP analysis, with novel search algorithms and fully automated querying-to-visualization that requires no programming expertise. Matcher enables unprecedented control over the search and clustering of MMP transformations based on both variable fragment and constant environment structure, which is critical for separating data that are relevant to a given problem from data that are not. Users can exert such control through a built-in chemical sketcher and with a few mouse clicks can navigate between resulting MMP transformations, statistics, property distribution graphs, and structures with raw experimental data, for confident and accelerated decision making. Matcher can be used with any collection of structure/property data; here, we demonstrate usage with a public ChEMBL data set of about 20,000 small molecules with CYP3A4 and/or hERG inhibition data. Users can reproduce all examples demonstrated herein via unique links within Matcher's interface, a functionality that anyone can use to preserve and share their own analyses. Matcher and all its dependencies are open-source, can be used for free, and are available with containerized deployment from code at https://github.com/Merck/Matcher. Matcher makes large structure/property data sets more transparent than ever before and accelerates the data-driven solution of common problems in drug discovery.


Subjects
Algorithms, Software, Drug Design, Drug Discovery/methods, Cluster Analysis
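
As a rough illustration of what a matched molecular pair is (see the abstract above), the sketch below groups pre-fragmented compounds by their constant part and reports the property change for each variable-fragment swap. The record layout, the function name `find_mmps`, and the toy values are assumptions made for illustration; this is not Matcher's search algorithm, which works on real chemical structures.

```python
from collections import defaultdict
from itertools import combinations


def find_mmps(records):
    """Return (constant, frag_a, frag_b, delta) for compounds sharing a constant part.

    records: iterable of (compound_id, constant, variable, property_value).
    The fragmentation into constant/variable parts is assumed to be done already.
    """
    by_constant = defaultdict(list)
    for _cid, constant, variable, value in records:
        by_constant[constant].append((variable, value))

    pairs = []
    for constant, members in by_constant.items():
        # Every pair of compounds with the same constant part is a candidate MMP.
        for (frag_a, val_a), (frag_b, val_b) in combinations(members, 2):
            if frag_a != frag_b:
                pairs.append((constant, frag_a, frag_b, val_b - val_a))
    return pairs


# Hypothetical toy data: fragment labels and property values are made up.
toy = [
    ("c1", "scaffoldA", "H", 5.0),
    ("c2", "scaffoldA", "F", 5.5),
    ("c3", "scaffoldB", "H", 6.0),
    ("c4", "scaffoldB", "F", 6.75),
]
mmps = find_mmps(toy)
```

Grouping by the constant part first keeps the pairwise comparison local to each group, which is one simple way to avoid comparing every compound with every other compound.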
2.
J Chem Inf Model ; 62(5): 1259-1267, 2022 03 14.
Article in English | MEDLINE | ID: mdl-35192366

ABSTRACT

Therapeutic peptides offer potential advantages over small molecules in terms of selectivity, affinity, and their ability to target "undruggable" proteins that are associated with a wide range of pathologies. Despite their importance, current molecular design capabilities that inform medicinal chemistry decisions on peptide programs are limited. More specifically, there are unmet needs for structure-activity relationship (SAR) analysis and visualization of linear, cyclic, and cross-linked peptides containing non-natural motifs, which are widely used in drug discovery. To bridge this gap, we developed PepSeA (Peptide Sequence Alignment and Visualization), an open-source, freely available package of sequence-based tools (https://github.com/Merck/PepSeA). PepSeA enables multiple sequence alignment of non-natural amino acids and enhanced visualization with the hierarchical editing language for macromolecules (HELM). Via stepwise SAR analysis of a ChEMBL peptide data set, we demonstrate the utility of PepSeA to accelerate decision making in lead optimization campaigns in a pharmaceutical setting. PepSeA represents an initial attempt to expand cheminformatics capabilities for therapeutic peptides and to enable rapid and more efficient design-make-test cycles.


Subjects
Peptides, Proteins, Amino Acid Sequence, Cheminformatics, Peptides/chemistry, Sequence Alignment
3.
Nucleic Acids Res ; 48(13): 7154-7168, 2020 07 27.
Article in English | MEDLINE | ID: mdl-32496538

ABSTRACT

Mono-ubiquitylation of histone H2B (H2Bub1) and phosphorylation of elongation factor Spt5 by cyclin-dependent kinase 9 (Cdk9) occur during transcription by RNA polymerase II (RNAPII), and are mutually dependent in fission yeast. It remained unclear whether Cdk9 and H2Bub1 cooperate to regulate the expression of individual genes. Here, we show that Cdk9 inhibition or H2Bub1 loss induces intragenic antisense transcription of ∼10% of fission yeast genes, with each perturbation affecting largely distinct subsets; ablation of both pathways de-represses antisense transcription of over half the genome. H2Bub1 and phospho-Spt5 have similar genome-wide distributions; both modifications are enriched, and directly proportional to each other, in coding regions, and decrease abruptly around the cleavage and polyadenylation signal (CPS). Cdk9-dependence of antisense suppression at specific genes correlates with high H2Bub1 occupancy, and with promoter-proximal RNAPII pausing. Genetic interactions link Cdk9, H2Bub1 and the histone deacetylase Clr6-CII, while combined Cdk9 inhibition and H2Bub1 loss impair Clr6-CII recruitment to chromatin and lead to decreased occupancy and increased acetylation of histones within gene coding regions. These results uncover novel interactions between co-transcriptional histone modification pathways, which link regulation of RNAPII transcription elongation to suppression of aberrant initiation.


Subjects
Cell Cycle Proteins/metabolism, Cyclin-Dependent Kinase 9/metabolism, Histones/metabolism, RNA Polymerase II/metabolism, Schizosaccharomyces pombe Proteins/metabolism, Schizosaccharomyces/genetics, Genetic Transcription Elongation, Phosphorylation, Transcriptional Elongation Factors/metabolism, Ubiquitination
4.
Nat Prod Rep ; 38(6): 1100-1108, 2021 06 23.
Article in English | MEDLINE | ID: mdl-33245088

ABSTRACT

Covering: up to the end of 2020. The machine learning field can be defined as the study and application of algorithms that perform classification and prediction tasks through pattern recognition instead of explicitly defined rules. Among other areas, machine learning has excelled in natural language processing. As such methods have excelled at understanding written languages (e.g. English), they are also being applied to biological problems to better understand the "genomic language". In this review we focus on recent advances in applying machine learning to natural products and genomics, and how those advances are improving our understanding of natural product biology, chemistry, and drug discovery. We discuss machine learning applications in genome mining (identifying biosynthetic signatures in genomic data), predictions of what structures will be created from those genomic signatures, and the types of activity we might expect from those molecules. We further explore the application of these approaches to data derived from complex microbiomes, with a focus on the human microbiome. We also review challenges in leveraging machine learning approaches in the field, and how the availability of other "omics" data layers provides value. Finally, we provide insights into the challenges associated with interpreting machine learning models and the underlying biology and promises of applying machine learning to natural product drug discovery. We believe that the application of machine learning methods to natural product research is poised to accelerate the identification of new molecular entities that may be used to treat a variety of disease indications.


Subjects
Biological Products, Genomics, Machine Learning, Biological Products/chemistry, Biological Products/pharmacology, Biosynthetic Pathways/genetics, Drug Discovery, Humans, Microbiota
5.
Nucleic Acids Res ; 47(18): e110, 2019 10 10.
Article in English | MEDLINE | ID: mdl-31400112

ABSTRACT

Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represents a major addition to in-silico BGC identification.


Subjects
Biosynthetic Pathways/genetics, Computational Biology/methods, Data Mining/methods, Multigene Family/genetics, Deep Learning, Genome, Bacterial Genome/genetics
6.
Mol Biol Evol ; 36(8): 1612-1623, 2019 08 01.
Article in English | MEDLINE | ID: mdl-31077324

ABSTRACT

The relationship between DNA sequence, biochemical function, and molecular evolution is relatively well-described for protein-coding regions of genomes, but far less clear in noncoding regions, particularly, in eukaryote genomes. In part, this is because we lack a complete description of the essential noncoding elements in a eukaryote genome. To contribute to this challenge, we used saturating transposon mutagenesis to interrogate the Schizosaccharomyces pombe genome. We generated 31 million transposon insertions, a theoretical coverage of 2.4 insertions per genomic site. We applied a five-state hidden Markov model (HMM) to distinguish insertion-depleted regions from insertion biases. Both raw insertion-density and HMM-defined fitness estimates showed significant quantitative relationships to gene knockout fitness, genetic diversity, divergence, and expected functional regions based on transcription and gene annotations. Through several analyses, we conclude that transposon insertions produced fitness effects in 66-90% of the genome, including substantial portions of the noncoding regions. Based on the HMM, we estimate that 10% of the insertion depleted sites in the genome showed no signal of conservation between species and were weakly transcribed, demonstrating limitations of comparative genomics and transcriptomics to detect functional units. In this species, 3'- and 5'-untranslated regions were the most prominent insertion-depleted regions that were not represented in measures of constraint from comparative genomics. We conclude that the combination of transposon mutagenesis, evolutionary, and biochemical data can provide new insights into the relationship between genome function and molecular evolution.


Subjects
Genetic Fitness, Fungal Genome, Schizosaccharomyces/genetics, Genetic Models, Insertional Mutagenesis
7.
RNA ; 24(9): 1195-1213, 2018 09.
Article in English | MEDLINE | ID: mdl-29914874

ABSTRACT

Long noncoding RNAs (lncRNAs), which are longer than 200 nucleotides but often unstable, contribute a substantial and diverse portion to pervasive noncoding transcriptomes. Most lncRNAs are poorly annotated and understood, although several play important roles in gene regulation and diseases. Here we systematically uncover and analyze lncRNAs in Schizosaccharomyces pombe. Based on RNA-seq data from twelve RNA-processing mutants and nine physiological conditions, we identify 5775 novel lncRNAs, nearly 4× the previously annotated lncRNAs. The expression of most lncRNAs becomes strongly induced under the genetic and physiological perturbations, most notably during late meiosis. Most lncRNAs are cryptic and suppressed by three RNA-processing pathways: the nuclear exosome, cytoplasmic exonuclease, and RNAi. Double-mutant analyses reveal substantial coordination and redundancy among these pathways. We classify lncRNAs by their dominant pathway into cryptic unstable transcripts (CUTs), Xrn1-sensitive unstable transcripts (XUTs), and Dicer-sensitive unstable transcripts (DUTs). XUTs and DUTs are enriched for antisense lncRNAs, while CUTs are often bidirectional and actively translated. The cytoplasmic exonuclease, along with RNAi, dampens the expression of thousands of lncRNAs and mRNAs that become induced during meiosis. Antisense lncRNA expression mostly negatively correlates with sense mRNA expression in the physiological, but not the genetic conditions. Intergenic and bidirectional lncRNAs emerge from nucleosome-depleted regions, upstream of positioned nucleosomes. Our results highlight both similarities and differences to lncRNA regulation in budding yeast. This broad survey of the lncRNA repertoire and characteristics in S. pombe, and the interwoven regulatory pathways that target lncRNAs, provides a rich framework for their further functional analyses.


Subjects
Exonucleases/metabolism, Exosomes/metabolism, Long Noncoding RNA/genetics, Schizosaccharomyces/genetics, RNA Sequence Analysis/methods, Cell Nucleus/metabolism, Cytoplasm/enzymology, Fungal Proteins/metabolism, Gene Expression Profiling/methods, Fungal Gene Expression Regulation, Meiosis, Molecular Sequence Annotation, Mutation, RNA Interference, RNA Stability, Fungal RNA/genetics, Long Noncoding RNA/chemistry, Schizosaccharomyces/chemistry, Schizosaccharomyces/enzymology
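
The classification by dominant pathway described in the abstract above can be pictured with a small sketch: each lncRNA is assigned to the class whose pathway mutant gives the largest induction. The fold-change inputs, the cutoff, the dictionary keys, and the function name are all assumptions for illustration, not the criteria used in the study.

```python
def classify_lncrna(log2fc, min_log2fc=1.0):
    """Assign CUT/XUT/DUT from the pathway mutant with the largest log2 fold change.

    log2fc: dict with keys 'exosome', 'exonuclease', 'rnai' (hypothetical inputs).
    Returns None when no mutant reaches min_log2fc (an assumed cutoff).
    """
    labels = {"exosome": "CUT", "exonuclease": "XUT", "rnai": "DUT"}
    # Pick the pathway whose mutant induces the transcript most strongly.
    pathway = max(labels, key=lambda p: log2fc[p])
    if log2fc[pathway] < min_log2fc:
        return None
    return labels[pathway]
```

For example, a transcript induced mainly in a nuclear exosome mutant would be labeled a CUT under this toy rule, while one that responds to none of the mutants stays unclassified.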
8.
Genome Res ; 25(6): 884-96, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25883323

ABSTRACT

Exon skipping is considered a principal mechanism by which eukaryotic cells expand their transcriptome and proteome repertoires, creating different splice variants with distinct cellular functions. Here we analyze RNA-seq data from 116 transcriptomes in fission yeast (Schizosaccharomyces pombe), covering multiple physiological conditions as well as transcriptional and RNA processing mutants. We applied brute-force algorithms to detect all possible exon-skipping events, which were widespread but rare compared to normal splicing events. Exon-skipping events increased in cells deficient for the nuclear exosome or the 5'-3' exonuclease Dhp1, and also at late stages of meiotic differentiation when nuclear-exosome transcripts decreased. The pervasive exon-skipping transcripts were stochastic, did not increase in specific physiological conditions, and were mostly present at less than one copy per cell, even in the absence of nuclear RNA surveillance and during late meiosis. These exon-skipping transcripts are therefore unlikely to be functional and may reflect splicing errors that are actively removed by nuclear RNA surveillance. The average splicing rate by exon skipping was ∼ 0.24% in wild type and ∼ 1.75% in nuclear exonuclease mutants. We also detected approximately 250 circular RNAs derived from single or multiple exons. These circular RNAs were rare and stochastic, although a few became stabilized during quiescence and in splicing mutants. Using an exhaustive search algorithm, we also uncovered thousands of previously unknown splice sites, indicating pervasive splicing; yet most of these splicing variants were cryptic and increased in nuclear degradation mutants. This study highlights widespread but low frequency alternative or aberrant splicing events that are targeted by nuclear RNA surveillance.


Subjects
Exons, Fungal Genome, Nuclear RNA/genetics, Schizosaccharomyces/genetics, Alternative Splicing, Exoribonucleases/genetics, Exoribonucleases/metabolism, Meiosis, RNA/genetics, RNA/metabolism, Circular RNA, Nuclear RNA/metabolism, Schizosaccharomyces/metabolism, Schizosaccharomyces pombe Proteins/genetics, Schizosaccharomyces pombe Proteins/metabolism, Sequence Alignment, RNA Sequence Analysis, Transcriptome
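
A brute-force enumeration of candidate exon-skipping junctions, and a skipping rate expressed as a percentage of junction reads, might look like the sketch below. The function names and read counts are hypothetical, and the rate definition (skipping reads over all junction reads) is an assumption rather than the exact formula behind the percentages quoted in the abstract above.

```python
from itertools import combinations


def exon_skipping_junctions(n_exons):
    """All exon pairs (i, j), 1-based, whose junction skips at least one exon."""
    return [
        (i, j)
        for i, j in combinations(range(1, n_exons + 1), 2)
        if j - i > 1  # adjacent exons (j == i + 1) are normal splicing
    ]


def skipping_rate(skip_reads, normal_reads):
    """Percentage of junction reads that support exon skipping."""
    total = skip_reads + normal_reads
    return 100.0 * skip_reads / total if total else 0.0
```

For a gene with four exons the enumeration yields the junctions 1-3, 1-4 and 2-4. The number of candidates grows quadratically with exon count, which stays manageable for genes with few exons.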
9.
Genome Res ; 24(7): 1169-79, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24709818

ABSTRACT

Both canonical and alternative splicing of RNAs are governed by intronic sequence elements and produce transient lariat structures fastened by branch points within introns. To map precisely the location of branch points on a genomic scale, we developed LaSSO (Lariat Sequence Site Origin), a data-driven algorithm which utilizes RNA-seq data. Using fission yeast cells lacking the debranching enzyme Dbr1, LaSSO not only accurately identified canonical splicing events, but also pinpointed novel, but rare, exon-skipping events, which may reflect aberrantly spliced transcripts. Compromised intron turnover perturbed gene regulation at multiple levels, including splicing and protein translation. Notably, Dbr1 function was also critical for the expression of mitochondrial genes and for the processing of self-spliced mitochondrial introns. LaSSO showed better sensitivity and accuracy than algorithms used for computational branch-point prediction or for empirical branch-point determination. Even when applied to a human data set acquired in the presence of debranching activity, LaSSO identified both canonical and exon-skipping branch points. LaSSO thus provides an effective approach for defining high-resolution maps of branch-site sequences and intronic elements on a genomic scale. LaSSO should be useful to validate introns and uncover branch-point sequences in any eukaryote, and it could be integrated into RNA-seq pipelines.


Subjects
Algorithms, Chromosome Mapping, Introns, Nucleotide Motifs, RNA Splicing, Nucleic Acid Regulatory Sequences, Base Sequence, Computational Biology/methods, Nucleic Acid Databases, Exons, Gene Deletion, Gene Expression Profiling, Genomics/methods, High-Throughput Nucleotide Sequencing, Humans, Position-Specific Scoring Matrices, RNA Precursors/genetics, Fungal RNA/genetics, Schizosaccharomyces/genetics, Genetic Transcription, Transcriptome
10.
Eukaryot Cell ; 12(11): 1472-89, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24014766

ABSTRACT

The spliceosome is a dynamic macromolecular machine that catalyzes the removal of introns from pre-mRNA, yielding mature message. Schizosaccharomyces pombe Cwf10 (homolog of Saccharomyces cerevisiae Snu114 and human U5-116K), an integral member of the U5 snRNP, is a GTPase that has multiple roles within the splicing cycle. Cwf10/Snu114 family members are highly homologous to eukaryotic translation elongation factor EF2, and they contain a conserved N-terminal extension (NTE) to the EF2-like portion, predicted to be an intrinsically unfolded domain. Using S. pombe as a model system, we show that the NTE is not essential, but cells lacking this domain are defective in pre-mRNA splicing. Genetic interactions between cwf10-ΔNTE and other pre-mRNA splicing mutants are consistent with a role for the NTE in spliceosome activation and second-step catalysis. Characterization of Cwf10-NTE by various biophysical techniques shows that in solution the NTE contains regions of both structure and disorder. The first 23 highly conserved amino acids of the NTE are essential for its role in splicing but when overexpressed are not sufficient to restore pre-mRNA splicing to wild-type levels in cwf10-ΔNTE cells. When the entire NTE is overexpressed in the cwf10-ΔNTE background, it can complement the truncated Cwf10 protein in trans, and it immunoprecipitates a complex similar in composition to the late-stage U5.U2/U6 spliceosome. These data show that the structurally flexible NTE is capable of independently incorporating into the spliceosome and improving splicing function, possibly indicating a role for the NTE in stabilizing conformational rearrangements during a splice cycle.


Subjects
GTP Phosphohydrolases/metabolism, U5 Small Nuclear Ribonucleoprotein/metabolism, Schizosaccharomyces pombe Proteins/metabolism, Schizosaccharomyces/enzymology, Amino Acid Motifs, Amino Acid Sequence, Binding Sites, GTP Phosphohydrolases/genetics, Molecular Sequence Data, Mutation, Protein Binding, Tertiary Protein Structure, RNA Splicing, U5 Small Nuclear Ribonucleoprotein/chemistry, U5 Small Nuclear Ribonucleoprotein/genetics, Schizosaccharomyces/chemistry, Schizosaccharomyces/genetics, Schizosaccharomyces pombe Proteins/chemistry, Schizosaccharomyces pombe Proteins/genetics, Spliceosomes/metabolism
11.
Drug Discov Today ; 29(3): 103884, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38219969

ABSTRACT

The volume of nucleic acid sequence data has exploded recently, amplifying the challenge of transforming data into meaningful information. Processing data can require an increasingly complex ecosystem of customized tools, which makes it harder to communicate analyses in a way that is understandable yet detailed enough to enable informed decisions or repeat analyses. This is of particular interest to institutions and companies communicating computations in a regulatory environment. BioCompute Objects (BCOs; an instance of pipeline documentation that conforms to the IEEE 2791-2020 standard) were developed as a standardized mechanism for analysis reporting. A suite of BCOs is presented, representing interconnected elements of a computation modeled after those that might be found in a regulatory submission but shared publicly: in this case, a pipeline designed to identify viral contaminants in biological manufacturing, such as for vaccines.


Subjects
Computational Biology, Vaccines, High-Throughput Nucleotide Sequencing, Workflow
12.
NAR Genom Bioinform ; 6(1): lqae028, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38482061

ABSTRACT

Recent COVID-19 vaccines unleashed the potential of mRNA-based therapeutics. A common bottleneck across mRNA-based therapeutic approaches is the rapid design of mRNA sequences that are translationally efficient, long-lived and non-immunogenic. Currently, an accessible software tool to aid in the design of such high-quality mRNA is lacking. Here, we present mRNAid, an open-source platform for therapeutic mRNA optimization, design and visualization that offers a variety of optimization strategies for sequence and structural features, allowing one to customize desired properties into their mRNA sequence. We experimentally demonstrate that transcripts optimized by mRNAid have characteristics comparable with commercially available sequences. To encompass additional aspects of mRNA design, we experimentally show that incorporation of certain uridine analogs and untranslated regions can further enhance stability, boost protein output and mitigate undesired immunogenicity effects. Finally, this study provides a roadmap for rational design of therapeutic mRNA transcripts.

13.
Bioinform Adv ; 3(1): vbad083, 2023.
Article in English | MEDLINE | ID: mdl-37456510

ABSTRACT

Motivation: Despite the advent of next-generation sequencing technology and its widespread applications, Sanger sequencing remains instrumental for molecular biology subcloning work in biological and medical research and indispensable for drug discovery campaigns. Although Sanger sequencing technology has been long established, existing software for processing and visualization of trace file chromatograms is limited in terms of functionality, scalability and availability for commercial use. Results: To fill this gap, we developed TraceTrack, an open-source web application tool for batch alignment, analysis and visualization of Sanger trace files. TraceTrack offers high-throughput matching of trace files to reference sequences, rapid identification of mutations and an intuitive chromatogram analysis. Comparative analysis between TraceTrack and existing software tools highlights the advantages of TraceTrack with regards to batch processing, visualization and export functionalities. Availability and implementation: TraceTrack is available at https://github.com/MSDLLCpapers/TraceTrack and as a web application at https://tracetrack.dichlab.org. TraceTrack is a web application for batch processing and visualization of Sanger trace file chromatograms that meets the increasing demand of industrial sequence validation workflows in pharmaceutical settings. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

14.
MAbs ; 15(1): 2248671, 2023.
Article in English | MEDLINE | ID: mdl-37610144

ABSTRACT

Identification of favorable biophysical properties for protein therapeutics as part of developability assessment is a crucial part of the preclinical development process. Successful prediction of such properties and bioassay results from calculated in silico features has potential to reduce the time and cost of delivering clinical-grade material to patients, but nevertheless has remained an ongoing challenge to the field. Here, we demonstrate an automated and flexible machine learning workflow designed to compare and identify the most powerful features from computationally derived physicochemical feature sets, generated from popular commercial software packages. We implement this workflow with medium-sized datasets of human and humanized IgG molecules to generate predictive regression models for two key developability endpoints, hydrophobicity and poly-specificity. The most important features discovered through the automated workflow corroborate several previous literature reports, and newly discovered features suggest directions for further research and potential model improvement.


Subjects
Monoclonal Antibodies, Immunoglobulin G, Humans, Monoclonal Antibodies/chemistry, Machine Learning
15.
Mol Syst Biol ; 7: 559, 2011 Dec 20.
Article in English | MEDLINE | ID: mdl-22186733

ABSTRACT

Strand-specific RNA sequencing of S. pombe revealed a highly structured programme of ncRNA expression at over 600 loci. Waves of antisense transcription accompanied sexual differentiation. A substantial proportion of ncRNA arose from mechanisms previously considered to be largely artefactual, including improper 3' termination and bidirectional transcription. Constitutive induction of the entire spk1+, spo4+, dis1+ and spo6+ antisense transcripts from an integrated, ectopic, locus disrupted their respective meiotic functions. This ability of antisense transcripts to disrupt gene function when expressed in trans suggests that cis production at native loci during sexual differentiation may also control gene function. Consistently, insertion of a marker gene adjacent to the dis1+ antisense start site mimicked ectopic antisense expression in reducing the levels of this microtubule regulator and abolishing the microtubule-dependent 'horsetail' stage of meiosis. Antisense production had no impact at any of these loci when the RNA interference (RNAi) machinery was removed. Thus, far from being simply 'genome chatter', this extensive ncRNA landscape constitutes a fundamental component in the controls that drive the complex programme of sexual differentiation in S. pombe.


Subjects
Fungal Gene Expression Regulation, Meiosis/genetics, Antisense RNA/genetics, Untranslated RNA/genetics, Schizosaccharomyces/physiology, Nucleic Acid Databases, Fungal Genes, Microbiological Phenomena, Antisense RNA/metabolism, Fungal RNA, Small Interfering RNA, Untranslated RNA/metabolism, Schizosaccharomyces/genetics, Systems Biology, Genetic Transcription
16.
MAbs ; 14(1): 2020203, 2022.
Article in English | MEDLINE | ID: mdl-35133949

ABSTRACT

Despite recent advances in transgenic animal models and display technologies, humanization of mouse sequences remains one of the main routes for therapeutic antibody development. Traditionally, humanization is manual, laborious, and requires expert knowledge. Although automation efforts are advancing, existing methods are either demonstrated on a small scale or are entirely proprietary. To predict the immunogenicity risk, the human-likeness of sequences can be evaluated using existing humanness scores, but these lack diversity, granularity or interpretability. Meanwhile, immune repertoire sequencing has generated rich antibody libraries such as the Observed Antibody Space (OAS) that offer augmented diversity not yet exploited for antibody engineering. Here we present BioPhi, an open-source platform featuring novel methods for humanization (Sapiens) and humanness evaluation (OASis). Sapiens is a deep learning humanization method trained on the OAS using language modeling. Based on an in silico humanization benchmark of 177 antibodies, Sapiens produced sequences at scale while achieving results comparable to that of human experts. OASis is a granular, interpretable and diverse humanness score based on 9-mer peptide search in the OAS. OASis separated human and non-human sequences with high accuracy, and correlated with clinical immunogenicity. BioPhi thus offers an antibody design interface with automated methods that capture the richness of natural antibody repertoires to produce therapeutics with desired properties and accelerate antibody discovery campaigns. The BioPhi platform is accessible at https://biophi.dichlab.org and https://github.com/Merck/BioPhi.


Subjects
Deep Learning, Animals, Antibodies, Mice
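
To make the idea of a humanness score based on 9-mer peptide search more concrete, the sketch below computes the fraction of a sequence's overlapping 9-mers that occur in a reference peptide set. The function names, the toy reference set, and the plain membership test are assumptions for illustration; this is not the OASis implementation, and a real reference would be derived from the OAS rather than from a single sequence.

```python
def kmers(sequence, k=9):
    """Overlapping k-mer peptides of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def humanness_fraction(sequence, human_peptides, k=9):
    """Fraction of the sequence's k-mers found in a reference peptide set."""
    peptides = kmers(sequence, k)
    if not peptides:
        return 0.0  # sequence shorter than k: no peptides to score
    hits = sum(1 for p in peptides if p in human_peptides)
    return hits / len(peptides)


# Hypothetical toy reference built from one sequence; the query differs by
# a single residue (Q -> K), which affects every 9-mer that covers it.
reference = set(kmers("EVQLVESGGGLVQPGG"))
score = humanness_fraction("EVQLVESGGGLVKPGG", reference)
```

Because each residue sits in up to nine overlapping peptides, a per-peptide score like this can point to the specific positions that lower the overall value, which is one reason a peptide-level score is easy to interpret.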
17.
ACS Synth Biol ; 10(2): 357-370, 2021 02 19.
Article in English | MEDLINE | ID: mdl-33433999

ABSTRACT

Protein engineering is the discipline of developing useful proteins for applications in research, therapeutic, and industrial processes by modification of naturally occurring proteins or by invention of de novo proteins. Modern protein engineering relies on the ability to rapidly generate and screen diverse libraries of mutant proteins. However, design of mutant libraries is typically hampered by scale and complexity, necessitating development of advanced automation and optimization tools that can improve efficiency and accuracy. At present, automated library design tools are functionally limited or not freely available. To address these issues, we developed Mutation Maker, an open source mutagenic oligo design software for large-scale protein engineering experiments. Mutation Maker is not only specifically tailored to multisite random and directed mutagenesis protocols, but also pioneers bespoke mutagenic oligo design for de novo gene synthesis workflows. Enabled by a novel bundle of orchestrated heuristics, optimization, constraint-satisfaction and backtracking algorithms, Mutation Maker offers a versatile toolbox for gene diversification design at industrial scale. Supported by in silico simulations and compelling experimental validation data, Mutation Maker oligos produce diverse gene libraries at high success rates irrespective of genes or vectors used. Finally, Mutation Maker was created as an extensible platform on the notion that directed evolution techniques will continue to evolve and revolutionize current and future-oriented applications.


Subjects
Site-Directed Mutagenesis/methods, Mutagenesis, Mutation, Oligonucleotides/genetics, Proteins/genetics, Software, Algorithms, Codon/genetics, Computer Simulation, Directed Molecular Evolution/methods, Escherichia coli/genetics, Gene Library, Mutant Proteins
18.
Mol Cell Proteomics ; 7(5): 853-63, 2008 May.
Article in English | MEDLINE | ID: mdl-17951628

ABSTRACT

There are a number of leukemogenic protein-tyrosine kinases (PTKs) associated with leukemic transformation. Although each is linked with a specific disease their functional activity poses the question whether they have a degree of commonality in their effects upon target cells. Exon array analysis of the effects of six leukemogenic PTKs (BCR/ABL, TEL/PDGFRbeta, FIP1/PDGFRalpha, D816V KIT, NPM/ALK, and FLT3ITD) revealed few common effects on the transcriptome. It is apparent, however, that proteome changes are not directly governed by transcriptome changes. Therefore, we assessed and used a new generation of iTRAQ tagging, enabling eight-channel relative quantification discovery proteomics, to analyze the effects of these six leukemogenic PTKs. Again these were found to have disparate effects on the proteome with few common targets. BCR/ABL had the greatest effect on the proteome and had more effects in common with FIP1/PDGFRalpha. The proteomic effects of the four type III receptor kinases were relatively remotely related. The only protein commonly affected was eosinophil-associated ribonuclease 7. Five of six PTKs affected the motility-related proteins CAPG and vimentin, although this did not correspond to changes in motility. However, correlation of the proteomics data with that from the exon microarray not only showed poor levels of correlation between transcript and protein levels but also revealed alternative patterns of regulation of the CAPG protein by different oncogenes, illustrating the utility of such a combined approach.


Subjects
Leukemia/enzymology , Mass Spectrometry/methods , Oncogene Proteins/metabolism , Protein Serine-Threonine Kinases/metabolism , Proteome/analysis , Proteomics/methods , Animals , Cell Line , Chemotaxis , Exons , Gene Expression Profiling , Leukemia/genetics , Mice , Oligonucleotide Array Sequence Analysis , Oncogene Proteins/genetics , Protein Biosynthesis/genetics , Protein Serine-Threonine Kinases/genetics , Proteome/genetics , Proteome/metabolism
19.
BMC Bioinformatics ; 9: 118, 2008 Feb 25.
Article in English | MEDLINE | ID: mdl-18298841

ABSTRACT

BACKGROUND: Previous studies comparing quantitative proteomics and microarray data have generally found poor correspondence between the two. We hypothesised that this might in part be because the different assays were targeting different parts of the expressed genome and might therefore be subjected to confounding effects from processes such as alternative splicing. RESULTS: Using a genome database as a platform for integration, we combined quantitative protein mass spectrometry with Affymetrix Exon array data at the level of individual exons. We found significantly higher degrees of correlation than have been previously observed (r = 0.808). The study was performed using cell lines in equilibrium in order to reduce a major potential source of biological variation, thus allowing the analysis to focus on the data integration methods in order to establish their performance. CONCLUSION: We conclude that part of the variation observed when integrating microarray and proteomics data may occur as a consequence both of the data analysis and of the high granularity to which studies have until recently been limited. The approach opens up the possibility for the first time of considering combined microarray and proteomics datasets at the level of individual exons and isoforms, important given the high proportion of alternative splicing observed in the human genome.


Subjects
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Peptide Mapping/methods , Proteome/genetics , Proteome/metabolism , Proteomics/methods , RNA Splice Sites/genetics , Algorithms , Reproducibility of Results , Sensitivity and Specificity , Systems Integration
20.
Genome Biol ; 17(1): 240, 2016 11 25.
Article in English | MEDLINE | ID: mdl-27887640

ABSTRACT

BACKGROUND: The control of energy metabolism is fundamental for cell growth and function and anomalies in it are implicated in complex diseases and ageing. Metabolism in yeast cells can be manipulated by supplying different carbon sources: yeast grown on glucose rapidly proliferates by fermentation, analogous to tumour cells growing by aerobic glycolysis, whereas on non-fermentable carbon sources metabolism shifts towards respiration. RESULTS: We screened deletion libraries of fission yeast to identify over 200 genes required for respiratory growth. Growth media and auxotrophic mutants strongly influenced respiratory metabolism. Most genes uncovered in the mutant screens have not been implicated in respiration in budding yeast. We applied gene-expression profiling approaches to compare steady-state fermentative and respiratory growth and to analyse the dynamic adaptation to respiratory growth. The transcript levels of most genes functioning in energy metabolism pathways are coherently tuned, reflecting anticipated differences in metabolic flows between fermenting and respiring cells. We show that acetyl-CoA synthase, rather than citrate lyase, is essential for acetyl-CoA synthesis in fission yeast. We also investigated the transcriptional response to mitochondrial damage by genetic or chemical perturbations, defining a retrograde response that involves the concerted regulation of distinct groups of nuclear genes that may avert harm from mitochondrial malfunction. CONCLUSIONS: This study provides a rich framework of the genetic and regulatory basis of energy metabolism in fission yeast and beyond, and it pinpoints weaknesses of commonly used auxotroph mutants for investigating metabolism. As a model for cellular energy regulation, fission yeast provides an attractive and complementary system to budding yeast.


Subjects
Energy Metabolism/genetics , Gene Expression Profiling , Gene Expression Regulation, Fungal , Schizosaccharomyces/genetics , Schizosaccharomyces/metabolism , Transcriptome , Acetyl Coenzyme A/metabolism , Adaptation, Biological , Cell Nucleus/genetics , Cell Nucleus/metabolism , Fermentation , Glucose/metabolism , High-Throughput Nucleotide Sequencing , Mitochondria/genetics , Mitochondria/metabolism , Mutation , Signal Transduction