ABSTRACT
To solve recurring problems in drug discovery, matched molecular pair (MMP) analysis is used to understand relationships between chemical structure and function. For the MMP analysis of large data sets (>10,000 compounds), available tools lack flexible search and visualization functionality and require computational expertise. Here, we present Matcher, an open-source application for MMP analysis, with novel search algorithms and fully automated querying-to-visualization that requires no programming expertise. Matcher enables unprecedented control over the search and clustering of MMP transformations based on both variable fragment and constant environment structure, which is critical for disentangling relevant and irrelevant data to a given problem. Users can exert such control through a built-in chemical sketcher and with a few mouse clicks can navigate between resulting MMP transformations, statistics, property distribution graphs, and structures with raw experimental data, for confident and accelerated decision making. Matcher can be used with any collection of structure/property data; here, we demonstrate usage with a public ChEMBL data set of about 20,000 small molecules with CYP3A4 and/or hERG inhibition data. Users can reproduce all examples demonstrated herein via unique links within Matcher's interface, a functionality that anyone can use to preserve and share their own analyses. Matcher and all its dependencies are open-source, can be used for free, and are available with containerized deployment from code at https://github.com/Merck/Matcher. Matcher makes large structure/property data sets more transparent than ever before and accelerates the data-driven solution of common problems in drug discovery.
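A small example may help make the idea of an MMP transformation concrete. The Python sketch below is not Matcher's implementation. It assumes that each compound has already been fragmented into a (constant environment, variable fragment) pair, for example by cutting a single acyclic bond. Compounds that share a constant part form matched pairs. Grouping those pairs by the change in variable fragment then gives a transformation with a pair count and a mean property change. The record layout and the names `index_pairs` and `summarize_transformations` are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from statistics import mean

# Hypothetical pre-fragmented record: (compound_id, constant, variable, property)
Record = tuple[str, str, str, float]


def index_pairs(records: list[Record]) -> list[tuple[str, str, str, str, float]]:
    """Emit ordered matched pairs for compounds that share a constant part."""
    by_constant = defaultdict(list)
    for cid, constant, variable, value in records:
        by_constant[constant].append((cid, variable, value))
    pairs = []
    for constant, members in by_constant.items():
        # Every ordered pair with a different variable fragment is one MMP.
        for (id_a, var_a, val_a), (id_b, var_b, val_b) in permutations(members, 2):
            if var_a != var_b:
                transform = f"{var_a}>>{var_b}"
                pairs.append((transform, constant, id_a, id_b, val_b - val_a))
    return pairs


def summarize_transformations(pairs):
    """Map each transformation to (number of pairs, mean property change)."""
    deltas = defaultdict(list)
    for transform, _constant, _id_a, _id_b, delta in pairs:
        deltas[transform].append(delta)
    return {t: (len(d), mean(d)) for t, d in deltas.items()}
```

Because each pair keeps its constant environment, the pair list can be filtered by environment before summarizing. This is a simple way to keep only the data relevant to a given structural context.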
Subjects
Algorithms , Software , Drug Design , Drug Discovery/methods , Cluster Analysis
ABSTRACT
Therapeutic peptides offer potential advantages over small molecules in terms of selectivity, affinity, and their ability to target "undruggable" proteins that are associated with a wide range of pathologies. Despite their importance, current molecular design capabilities that inform medicinal chemistry decisions on peptide programs are limited. More specifically, there are unmet needs for structure-activity relationship (SAR) analysis and visualization of linear, cyclic, and cross-linked peptides containing non-natural motifs, which are widely used in drug discovery. To bridge this gap, we developed PepSeA (Peptide Sequence Alignment and Visualization), an open-source, freely available package of sequence-based tools (https://github.com/Merck/PepSeA). PepSeA enables multiple sequence alignment of non-natural amino acids and enhanced visualization with the hierarchical editing language for macromolecules (HELM). Via stepwise SAR analysis of a ChEMBL peptide data set, we demonstrate the utility of PepSeA to accelerate decision making in lead optimization campaigns in a pharmaceutical setting. PepSeA represents an initial attempt to expand cheminformatics capabilities for therapeutic peptides and to enable rapid and more efficient design-make-test cycles.
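HELM writes a peptide as a polymer of monomer IDs. Multi-character IDs for non-natural residues go in square brackets, for example `PEPTIDE1{A.[Aib].L}$$$$`. The minimal Python sketch below reads only the simple-polymer section into monomer lists. It ignores the connection section, which is where cyclizations and cross-links are declared. The position-by-position comparison assumes equal-length, ungapped peptides. It is a naive stand-in, not PepSeA's multiple sequence alignment, and the function names are hypothetical.

```python
import re


def parse_helm_polymers(helm: str) -> dict[str, list[str]]:
    """Split the simple-polymer section of a HELM string into monomer lists.

    Only the first '$'-delimited section is read; connections, polymer groups
    and annotations are ignored in this sketch.
    """
    polymer_section = helm.split("$", 1)[0]
    polymers = {}
    for chunk in polymer_section.split("|"):
        match = re.fullmatch(r"([A-Za-z]+\d+)\{(.*)\}", chunk)
        if match is None:
            raise ValueError(f"unrecognized polymer: {chunk!r}")
        name, body = match.groups()
        polymers[name] = _split_monomers(body)
    return polymers


def _split_monomers(body: str) -> list[str]:
    """Split on '.' except inside [...] so bracketed monomer IDs stay whole."""
    monomers, current, depth = [], [], 0
    for ch in body:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "." and depth == 0:
            monomers.append("".join(current))
            current = []
        else:
            current.append(ch)
    monomers.append("".join(current))
    # Drop the brackets that mark multi-character monomer IDs.
    return [m[1:-1] if m.startswith("[") and m.endswith("]") else m for m in monomers]


def positional_differences(a: list[str], b: list[str]) -> list[tuple[int, str, str]]:
    """1-based positions where two equal-length monomer lists differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
```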
Subjects
Peptides , Proteins , Amino Acid Sequence , Cheminformatics , Peptides/chemistry , Sequence Alignment
ABSTRACT
Mono-ubiquitylation of histone H2B (H2Bub1) and phosphorylation of elongation factor Spt5 by cyclin-dependent kinase 9 (Cdk9) occur during transcription by RNA polymerase II (RNAPII), and are mutually dependent in fission yeast. It remained unclear whether Cdk9 and H2Bub1 cooperate to regulate the expression of individual genes. Here, we show that Cdk9 inhibition or H2Bub1 loss induces intragenic antisense transcription of ∼10% of fission yeast genes, with each perturbation affecting largely distinct subsets; ablation of both pathways de-represses antisense transcription of over half the genome. H2Bub1 and phospho-Spt5 have similar genome-wide distributions; both modifications are enriched, and directly proportional to each other, in coding regions, and decrease abruptly around the cleavage and polyadenylation signal (CPS). Cdk9-dependence of antisense suppression at specific genes correlates with high H2Bub1 occupancy, and with promoter-proximal RNAPII pausing. Genetic interactions link Cdk9, H2Bub1 and the histone deacetylase Clr6-CII, while combined Cdk9 inhibition and H2Bub1 loss impair Clr6-CII recruitment to chromatin and lead to decreased occupancy and increased acetylation of histones within gene coding regions. These results uncover novel interactions between co-transcriptional histone modification pathways, which link regulation of RNAPII transcription elongation to suppression of aberrant initiation.
Subjects
Cell Cycle Proteins/metabolism , Cyclin-Dependent Kinase 9/metabolism , Histones/metabolism , RNA Polymerase II/metabolism , Schizosaccharomyces pombe Proteins/metabolism , Schizosaccharomyces/genetics , Transcription Elongation, Genetic , Phosphorylation , Transcriptional Elongation Factors/metabolism , Ubiquitination
ABSTRACT
Covering: up to the end of 2020. The machine learning field can be defined as the study and application of algorithms that perform classification and prediction tasks through pattern recognition instead of explicitly defined rules. Among other areas, machine learning has excelled in natural language processing. As such methods have excelled at understanding written languages (e.g. English), they are also being applied to biological problems to better understand the "genomic language". In this review we focus on recent advances in applying machine learning to natural products and genomics, and how those advances are improving our understanding of natural product biology, chemistry, and drug discovery. We discuss machine learning applications in genome mining (identifying biosynthetic signatures in genomic data), predictions of what structures will be created from those genomic signatures, and the types of activity we might expect from those molecules. We further explore the application of these approaches to data derived from complex microbiomes, with a focus on the human microbiome. We also review challenges in leveraging machine learning approaches in the field, and how the availability of other "omics" data layers provides value. Finally, we provide insights into the challenges of interpreting machine learning models and the underlying biology, and into the promise of applying machine learning to natural product drug discovery. We believe that the application of machine learning methods to natural product research is poised to accelerate the identification of new molecular entities that may be used to treat a variety of disease indications.
Subjects
Biological Products , Genomics , Machine Learning , Biological Products/chemistry , Biological Products/pharmacology , Biosynthetic Pathways/genetics , Drug Discovery , Humans , Microbiota
ABSTRACT
Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represents a major addition to in-silico BGC identification.
Subjects
Biosynthetic Pathways/genetics , Computational Biology/methods , Data Mining/methods , Multigene Family/genetics , Deep Learning , Genome , Genome, Bacterial/genetics
ABSTRACT
The relationship between DNA sequence, biochemical function, and molecular evolution is relatively well-described for protein-coding regions of genomes, but far less clear in noncoding regions, particularly in eukaryote genomes. In part, this is because we lack a complete description of the essential noncoding elements in a eukaryote genome. To contribute to this challenge, we used saturating transposon mutagenesis to interrogate the Schizosaccharomyces pombe genome. We generated 31 million transposon insertions, a theoretical coverage of 2.4 insertions per genomic site. We applied a five-state hidden Markov model (HMM) to distinguish insertion-depleted regions from insertion biases. Both raw insertion-density and HMM-defined fitness estimates showed significant quantitative relationships to gene knockout fitness, genetic diversity, divergence, and expected functional regions based on transcription and gene annotations. Through several analyses, we conclude that transposon insertions produced fitness effects in 66-90% of the genome, including substantial portions of the noncoding regions. Based on the HMM, we estimate that 10% of the insertion-depleted sites in the genome showed no signal of conservation between species and were weakly transcribed, demonstrating limitations of comparative genomics and transcriptomics to detect functional units. In this species, 3'- and 5'-untranslated regions were the most prominent insertion-depleted regions that were not represented in measures of constraint from comparative genomics. We conclude that the combination of transposon mutagenesis, evolutionary, and biochemical data can provide new insights into the relationship between genome function and molecular evolution.
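The abstract does not specify the model's emission distributions or parameters. Under that caveat, the sketch below shows how an HMM can label each genomic site from its insertion count. It uses a generic N-state hidden Markov model with Poisson emissions, decoded by the Viterbi algorithm in log space. A state with a low mean rate corresponds to "insertion-depleted" sites. The study used five states. The example in the test uses two states only for brevity, and its rates and transition probability are illustrative assumptions, not fitted values.

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of k insertions under a Poisson with mean lam (> 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi(counts: list[int], means: list[float], stay: float = 0.9) -> list[int]:
    """Most likely hidden-state path for per-site insertion counts.

    State s emits Poisson(means[s]). A state persists with probability `stay`,
    otherwise it moves to any other state with equal probability (needs at
    least two states). Initial states are uniform.
    """
    n_states = len(means)
    log_stay = math.log(stay)
    log_move = math.log((1 - stay) / (n_states - 1))
    # score[s] = best log-probability of any path ending in state s
    score = [-math.log(n_states) + poisson_logpmf(counts[0], m) for m in means]
    back = []
    for k in counts[1:]:
        new_score, pointers = [], []
        for s, m in enumerate(means):
            candidates = [score[p] + (log_stay if p == s else log_move) for p in range(n_states)]
            best = max(range(n_states), key=candidates.__getitem__)
            new_score.append(candidates[best] + poisson_logpmf(k, m))
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Raising `stay` makes the decoded segments longer. This is one way such a model can separate genuinely depleted regions from isolated zero-count sites caused by insertion bias.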
Subjects
Genetic Fitness , Genome, Fungal , Schizosaccharomyces/genetics , Models, Genetic , Mutagenesis, Insertional
ABSTRACT
Long noncoding RNAs (lncRNAs), which are longer than 200 nucleotides but often unstable, contribute a substantial and diverse portion to pervasive noncoding transcriptomes. Most lncRNAs are poorly annotated and understood, although several play important roles in gene regulation and diseases. Here we systematically uncover and analyze lncRNAs in Schizosaccharomyces pombe. Based on RNA-seq data from twelve RNA-processing mutants and nine physiological conditions, we identify 5775 novel lncRNAs, nearly 4× the previously annotated lncRNAs. The expression of most lncRNAs becomes strongly induced under the genetic and physiological perturbations, most notably during late meiosis. Most lncRNAs are cryptic and suppressed by three RNA-processing pathways: the nuclear exosome, cytoplasmic exonuclease, and RNAi. Double-mutant analyses reveal substantial coordination and redundancy among these pathways. We classify lncRNAs by their dominant pathway into cryptic unstable transcripts (CUTs), Xrn1-sensitive unstable transcripts (XUTs), and Dicer-sensitive unstable transcripts (DUTs). XUTs and DUTs are enriched for antisense lncRNAs, while CUTs are often bidirectional and actively translated. The cytoplasmic exonuclease, along with RNAi, dampens the expression of thousands of lncRNAs and mRNAs that become induced during meiosis. Antisense lncRNA expression mostly negatively correlates with sense mRNA expression in the physiological, but not the genetic conditions. Intergenic and bidirectional lncRNAs emerge from nucleosome-depleted regions, upstream of positioned nucleosomes. Our results highlight both similarities and differences to lncRNA regulation in budding yeast. This broad survey of the lncRNA repertoire and characteristics in S. pombe, and the interwoven regulatory pathways that target lncRNAs, provides a rich framework for their further functional analyses.
Subjects
Exonucleases/metabolism , Exosomes/metabolism , RNA, Long Noncoding/genetics , Schizosaccharomyces/genetics , Sequence Analysis, RNA/methods , Cell Nucleus/metabolism , Cytoplasm/enzymology , Fungal Proteins/metabolism , Gene Expression Profiling/methods , Gene Expression Regulation, Fungal , Meiosis , Molecular Sequence Annotation , Mutation , RNA Interference , RNA Stability , RNA, Fungal/genetics , RNA, Long Noncoding/chemistry , Schizosaccharomyces/chemistry , Schizosaccharomyces/enzymology
ABSTRACT
Exon skipping is considered a principal mechanism by which eukaryotic cells expand their transcriptome and proteome repertoires, creating different splice variants with distinct cellular functions. Here we analyze RNA-seq data from 116 transcriptomes in fission yeast (Schizosaccharomyces pombe), covering multiple physiological conditions as well as transcriptional and RNA processing mutants. We applied brute-force algorithms to detect all possible exon-skipping events, which were widespread but rare compared to normal splicing events. Exon-skipping events increased in cells deficient for the nuclear exosome or the 5'-3' exonuclease Dhp1, and also at late stages of meiotic differentiation when nuclear-exosome transcripts decreased. The pervasive exon-skipping transcripts were stochastic, did not increase in specific physiological conditions, and were mostly present at less than one copy per cell, even in the absence of nuclear RNA surveillance and during late meiosis. These exon-skipping transcripts are therefore unlikely to be functional and may reflect splicing errors that are actively removed by nuclear RNA surveillance. The average splicing rate by exon skipping was ∼0.24% in wild type and ∼1.75% in nuclear exonuclease mutants. We also detected approximately 250 circular RNAs derived from single or multiple exons. These circular RNAs were rare and stochastic, although a few became stabilized during quiescence and in splicing mutants. Using an exhaustive search algorithm, we also uncovered thousands of previously unknown splice sites, indicating pervasive splicing; yet most of these splicing variants were cryptic and increased in nuclear degradation mutants. This study highlights widespread but low frequency alternative or aberrant splicing events that are targeted by nuclear RNA surveillance.
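For a single transcript, the brute-force idea can be sketched by listing every possible exon-skipping junction. This is any junction that joins the end of exon i to the start of exon j, where j > i + 1. Observed junction read counts are then classified against that list. The definition of the skipping rate used here, skipping reads divided by skipping plus canonical junction reads, is an assumption for illustration. The study's exact formula and coordinate conventions are not given here, and the function names are hypothetical.

```python
def skipping_junctions(exons: list[tuple[int, int]]) -> dict[tuple[int, int], int]:
    """All exon-skipping junctions of one transcript.

    `exons` is an ordered list of (start, end) coordinates. The value is the
    number of exons skipped by the junction (end of exon i, start of exon j).
    """
    junctions = {}
    for i in range(len(exons)):
        for j in range(i + 2, len(exons)):
            junctions[(exons[i][1], exons[j][0])] = j - i - 1
    return junctions


def skipping_rate(junction_counts: dict[tuple[int, int], int],
                  exons: list[tuple[int, int]]) -> float:
    """Fraction of spliced reads that support an exon-skipping junction."""
    skips = skipping_junctions(exons)
    canonical = {(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)}
    n_skip = sum(c for j, c in junction_counts.items() if j in skips)
    n_canonical = sum(c for j, c in junction_counts.items() if j in canonical)
    total = n_skip + n_canonical
    return n_skip / total if total else 0.0
```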
Subjects
Exons , Genome, Fungal , RNA, Nuclear/genetics , Schizosaccharomyces/genetics , Alternative Splicing , Exoribonucleases/genetics , Exoribonucleases/metabolism , Meiosis , RNA/genetics , RNA/metabolism , RNA, Circular , RNA, Nuclear/metabolism , Schizosaccharomyces/metabolism , Schizosaccharomyces pombe Proteins/genetics , Schizosaccharomyces pombe Proteins/metabolism , Sequence Alignment , Sequence Analysis, RNA , Transcriptome
ABSTRACT
Both canonical and alternative splicing of RNAs are governed by intronic sequence elements and produce transient lariat structures fastened by branch points within introns. To map precisely the location of branch points on a genomic scale, we developed LaSSO (Lariat Sequence Site Origin), a data-driven algorithm which utilizes RNA-seq data. Using fission yeast cells lacking the debranching enzyme Dbr1, LaSSO not only accurately identified canonical splicing events, but also pinpointed novel, but rare, exon-skipping events, which may reflect aberrantly spliced transcripts. Compromised intron turnover perturbed gene regulation at multiple levels, including splicing and protein translation. Notably, Dbr1 function was also critical for the expression of mitochondrial genes and for the processing of self-spliced mitochondrial introns. LaSSO showed better sensitivity and accuracy than algorithms used for computational branch-point prediction or for empirical branch-point determination. Even when applied to a human data set acquired in the presence of debranching activity, LaSSO identified both canonical and exon-skipping branch points. LaSSO thus provides an effective approach for defining high-resolution maps of branch-site sequences and intronic elements on a genomic scale. LaSSO should be useful to validate introns and uncover branch-point sequences in any eukaryote, and it could be integrated into RNA-seq pipelines.
Subjects
Algorithms , Chromosome Mapping , Introns , Nucleotide Motifs , RNA Splicing , Regulatory Sequences, Nucleic Acid , Base Sequence , Computational Biology/methods , Databases, Nucleic Acid , Exons , Gene Deletion , Gene Expression Profiling , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Position-Specific Scoring Matrices , RNA Precursors/genetics , RNA, Fungal/genetics , Schizosaccharomyces/genetics , Transcription, Genetic , Transcriptome
ABSTRACT
The primary structure and phosphorylation pattern of the tandem Y1S2P3T4S5P6S7 repeats of the RNA polymerase II carboxyl-terminal domain (CTD) comprise an informational code that coordinates transcription, chromatin modification, and RNA processing. To gauge the contributions of individual CTD coding "letters" to gene expression, we analyzed the poly(A)+ transcriptomes of fission yeast mutants that lack each of the four inessential CTD phosphoacceptors: Tyr1, Ser2, Thr4, and Ser7. There was a hierarchy of CTD mutational effects with respect to the number of dysregulated protein-coding RNAs, with S2A (n = 227) >> Y1F (n = 71) > S7A (n = 58) >> T4A (n = 7). The majority of the protein-coding RNAs affected in Y1F cells were coordinately affected by S2A, suggesting that Tyr1-Ser2 constitutes a two-letter code "word." Y1F and S2A elicited increased expression of genes encoding proteins involved in iron uptake (Frp1, Fip1, Fio1, Str3, Str1, Sib1), without affecting the expression of the genes that repress the iron regulon, implying that Tyr1-Ser2 transduces a repressive signal. Y1F and S2A cells had increased levels of ferric reductase activity and were hypersensitive to phleomycin, indicative of elevated intracellular iron. The T4A and S7A mutations had opposing effects on the phosphate response pathway. T4A reduced the expression of two genes encoding proteins involved in phosphate acquisition (the Pho1 acid phosphatase and the phosphate transporter SPBC8E4.01c), without affecting the expression of known genes that regulate the phosphate response pathway, whereas S7A increased pho1+ expression. These results highlight specific cellular gene expression programs that are responsive to distinct CTD cues.
Subjects
Gene Expression Regulation, Fungal/genetics , Homeostasis/physiology , Iron/metabolism , Oligopeptides/genetics , RNA Polymerase II/genetics , Schizosaccharomyces/genetics , Base Sequence , DNA Primers/genetics , Gene Library , Molecular Sequence Data , Mutation/genetics , Schizosaccharomyces/physiology , Sequence Alignment , Sequence Analysis, DNA , Tandem Repeat Sequences/genetics
ABSTRACT
The spliceosome is a dynamic macromolecular machine that catalyzes the removal of introns from pre-mRNA, yielding mature message. Schizosaccharomyces pombe Cwf10 (homolog of Saccharomyces cerevisiae Snu114 and human U5-116K), an integral member of the U5 snRNP, is a GTPase that has multiple roles within the splicing cycle. Cwf10/Snu114 family members are highly homologous to eukaryotic translation elongation factor EF2, and they contain a conserved N-terminal extension (NTE) to the EF2-like portion, predicted to be an intrinsically unfolded domain. Using S. pombe as a model system, we show that the NTE is not essential, but cells lacking this domain are defective in pre-mRNA splicing. Genetic interactions between cwf10-ΔNTE and other pre-mRNA splicing mutants are consistent with a role for the NTE in spliceosome activation and second-step catalysis. Characterization of Cwf10-NTE by various biophysical techniques shows that in solution the NTE contains regions of both structure and disorder. The first 23 highly conserved amino acids of the NTE are essential for its role in splicing but when overexpressed are not sufficient to restore pre-mRNA splicing to wild-type levels in cwf10-ΔNTE cells. When the entire NTE is overexpressed in the cwf10-ΔNTE background, it can complement the truncated Cwf10 protein in trans, and it immunoprecipitates a complex similar in composition to the late-stage U5.U2/U6 spliceosome. These data show that the structurally flexible NTE is capable of independently incorporating into the spliceosome and improving splicing function, possibly indicating a role for the NTE in stabilizing conformational rearrangements during a splice cycle.
Subjects
GTP Phosphohydrolases/metabolism , Ribonucleoprotein, U5 Small Nuclear/metabolism , Schizosaccharomyces pombe Proteins/metabolism , Schizosaccharomyces/enzymology , Amino Acid Motifs , Amino Acid Sequence , Binding Sites , GTP Phosphohydrolases/genetics , Molecular Sequence Data , Mutation , Protein Binding , Protein Structure, Tertiary , RNA Splicing , Ribonucleoprotein, U5 Small Nuclear/chemistry , Ribonucleoprotein, U5 Small Nuclear/genetics , Schizosaccharomyces/chemistry , Schizosaccharomyces/genetics , Schizosaccharomyces pombe Proteins/chemistry , Schizosaccharomyces pombe Proteins/genetics , Spliceosomes/metabolism
ABSTRACT
Annotation of multiple regions of interest across the whole mouse brain is an indispensable process for quantitative evaluation of a multitude of study endpoints in neuroscience digital pathology. Prior experience and domain expert knowledge are the key aspects for image annotation quality and consistency. At present, image annotation is often achieved manually by certified pathologists or trained technicians, limiting the total throughput of studies performed at neuroscience digital pathology labs. It may also mean that simpler and quicker methods of examining tissue samples are used by non-pathologists, especially in the early stages of research and preclinical studies. To address these limitations and to meet the growing demand for image analysis in a pharmaceutical setting, we developed AnNoBrainer, an open-source software tool that leverages deep learning, image registration, and standard cortical brain templates to automatically annotate individual brain regions on 2D pathology slides. Application of AnNoBrainer to a published set of pathology slides from transgenic mouse models of synucleinopathy revealed comparable accuracy, increased reproducibility, and a significant reduction (~50%) in time spent on brain annotation, quality control and labelling compared to trained scientists in pathology. Taken together, AnNoBrainer offers rapid, accurate, and reproducible automated annotation of mouse brain images that largely meets the experts' histopathological assessment standards (>85% of cases) and enables high-throughput image analysis workflows in digital pathology labs.
ABSTRACT
Recent COVID-19 vaccines unleashed the potential of mRNA-based therapeutics. A common bottleneck across mRNA-based therapeutic approaches is the rapid design of mRNA sequences that are translationally efficient, long-lived and non-immunogenic. Currently, an accessible software tool to aid in the design of such high-quality mRNA is lacking. Here, we present mRNAid, an open-source platform for therapeutic mRNA optimization, design and visualization that offers a variety of optimization strategies for sequence and structural features, allowing one to customize desired properties into their mRNA sequence. We experimentally demonstrate that transcripts optimized by mRNAid have characteristics comparable with commercially available sequences. To encompass additional aspects of mRNA design, we experimentally show that incorporation of certain uridine analogs and untranslated regions can further enhance stability, boost protein output and mitigate undesired immunogenicity effects. Finally, this study provides a roadmap for rational design of therapeutic mRNA transcripts.
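The abstract does not describe mRNAid's optimization strategies in detail. As a deliberately simplified, hypothetical illustration of one sequence feature such tools can tune, the sketch below back-translates a protein by greedily choosing, for each residue, the synonymous codon whose GC fraction is closest to a target. The codon table is a small subset of the standard genetic code. Real design tools also weigh codon usage, secondary structure, and other features that this sketch ignores.

```python
# Synonymous codons from the standard genetic code (subset, for illustration).
SYNONYMOUS_CODONS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "*": ["TAA", "TAG", "TGA"],
}


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def back_translate_for_gc(protein: str, target_gc: float = 0.5) -> str:
    """Greedy per-residue codon choice steering each codon's GC toward target_gc.

    Ties are broken by the order of the codon list. Raises KeyError for
    residues missing from the illustrative table.
    """
    codons = []
    for aa in protein:
        options = SYNONYMOUS_CODONS[aa]
        codons.append(min(options, key=lambda c: abs(gc_fraction(c) - target_gc)))
    return "".join(codons)
```

A per-codon greedy choice is the simplest possible baseline. Controlling GC over sliding windows, or jointly with other objectives, requires a search over whole sequences.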
ABSTRACT
The volume of nucleic acid sequence data has exploded recently, amplifying the challenge of transforming data into meaningful information. Processing data can require an increasingly complex ecosystem of customized tools, which makes it harder to communicate analyses in a way that is understandable yet detailed enough to support informed decisions or repetition of the analysis. This can be of particular interest to institutions and companies communicating computations in a regulatory environment. BioCompute Objects (BCOs; an instance of pipeline documentation that conforms to the IEEE 2791-2020 standard) were developed as a standardized mechanism for analysis reporting. A suite of BCOs is presented, representing interconnected elements of a computation modeled after those that might be found in a regulatory submission but shared publicly: in this case, a pipeline designed to identify viral contaminants in biological manufacturing, such as for vaccines.
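A BCO is a JSON document organized into named domains. The Python sketch below assembles a skeleton from top-level domain names that we understand IEEE 2791-2020 to define. It then derives an `etag` as a SHA-256 digest of a canonical JSON serialization that excludes the identification fields (`object_id`, `spec_version`, `etag`). That canonicalization, and the placeholder identifiers in the test, are assumptions. Consult the official schema for the exact required fields and the exact etag rule.

```python
import hashlib
import json

# Top-level domains as we understand IEEE 2791-2020 (verify against the schema).
DOMAINS = (
    "provenance_domain", "usability_domain", "extension_domain",
    "description_domain", "execution_domain", "parametric_domain",
    "io_domain", "error_domain",
)


def compute_etag(bco: dict) -> str:
    """SHA-256 over canonical JSON without the identification fields (assumed rule)."""
    body = {k: v for k, v in bco.items() if k not in ("object_id", "spec_version", "etag")}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_bco(object_id: str, spec_version: str, **domains) -> dict:
    """Assemble a BCO-like dict, rejecting unknown domain names."""
    unknown = set(domains) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domain(s): {sorted(unknown)}")
    bco = {"object_id": object_id, "spec_version": spec_version}
    bco.update(domains)
    bco["etag"] = compute_etag(bco)
    return bco
```

Excluding `object_id` from the digest means that two copies of the same computation, stored under different identifiers, share an etag, while any change to the domain content changes it.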
Subjects
Computational Biology , Vaccines , High-Throughput Nucleotide Sequencing , Workflow
ABSTRACT
We developed a murine model of CNS disease to obtain a better understanding of the pathogenesis of CNS involvement in pre-B-cell acute lymphoblastic leukemia (ALL). Semiquantitative proteomic discovery-based approaches identified unique expression of asparaginyl endopeptidase (AEP), intercellular adhesion molecule 1 (ICAM1), and ras-related C3 botulinum toxin substrate 2 (RAC2), among others, in an invasive pre-B-cell line that produced CNS leukemia in NOD-SCID mice. Targeting RAC2 significantly inhibited in vitro invasion and delayed disease onset in mice. Induced expression of RAC2 in cell lines with low/absent expression of AEP and ICAM1 did not result in an invasive phenotype or murine CNS disease. Flow cytometric analysis identified an enriched population of blast cells expressing ICAM1/lymphocyte function-associated antigen-1 (LFA-1)/CD70 in the CD10+/CD19+ fraction of bone marrow aspirates obtained from patients with relapsed disease, compared with normal controls and patients with primary disease. CD10+/CD19+ fractions obtained from relapsed patients also express RAC2 and give rise to CNS disease in mice. Our data suggest that combinations of processes are involved in the pathogenesis of CNS disease in pre-B-cell ALL, support a model in which CNS disease occurs as a result of external invasion, and suggest that targeting the processes of adhesion and invasion unique to pre-B cells may prevent recurrences within the CNS.
Subjects
Central Nervous System Neoplasms/physiopathology , Cysteine Endopeptidases/genetics , Intercellular Adhesion Molecule-1/genetics , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/physiopathology , rac GTP-Binding Proteins/genetics , Animals , Cell Adhesion/physiology , Cell Line, Tumor , Cell Membrane/physiology , Central Nervous System Neoplasms/genetics , Central Nervous System Neoplasms/pathology , Child , Cysteine Endopeptidases/metabolism , Disease Models, Animal , Gene Expression Regulation, Leukemic/physiology , Humans , Intercellular Adhesion Molecule-1/metabolism , Mice , Mice, Inbred NOD , Mice, SCID , Neoplasm Invasiveness , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/pathology , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/pathology , Precursor Cell Lymphoblastic Leukemia-Lymphoma/physiopathology , Proteomics , rac GTP-Binding Proteins/metabolism , RAC2 GTP-Binding Protein
ABSTRACT
Motivation: Despite the advent of next-generation sequencing technology and its widespread applications, Sanger sequencing remains instrumental for molecular biology subcloning work in biological and medical research and indispensable for drug discovery campaigns. Although Sanger sequencing technology has been long established, existing software for processing and visualization of trace file chromatograms is limited in terms of functionality, scalability and availability for commercial use. Results: To fill this gap, we developed TraceTrack, an open-source web application tool for batch alignment, analysis and visualization of Sanger trace files. TraceTrack offers high-throughput matching of trace files to reference sequences, rapid identification of mutations and intuitive chromatogram analysis. Comparative analysis between TraceTrack and existing software tools highlights the advantages of TraceTrack with regard to batch processing, visualization and export functionalities. Availability and implementation: TraceTrack is available at https://github.com/MSDLLCpapers/TraceTrack and as a web application at https://tracetrack.dichlab.org. TraceTrack is a web application for batch processing and visualization of Sanger trace file chromatograms that meets the increasing demand of industrial sequence validation workflows in pharmaceutical settings. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
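The abstract does not say which aligner TraceTrack uses. As a generic illustration of matching a read to a reference and listing mutations, the sketch below runs a plain Needleman–Wunsch global alignment on base calls and reports differences in reference coordinates. It ignores per-base quality values and chromatogram peaks, and the scoring parameters are arbitrary assumptions. Insertions are reported at the position of the preceding reference base.

```python
def global_align(ref: str, query: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(ref), len(query)

    def sub(i: int, j: int) -> int:
        return match if ref[i - 1] == query[j - 1] else mismatch

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring the diagonal.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            a.append(ref[i - 1])
            b.append(query[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(ref[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(query[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def call_differences(ref_aln: str, query_aln: str) -> list[tuple[int, str, str]]:
    """(1-based reference position, ref base, read base) for each difference."""
    diffs, pos = [], 0
    for r, q in zip(ref_aln, query_aln):
        if r != "-":
            pos += 1
        if r != q:
            diffs.append((pos, r, q))
    return diffs
```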
ABSTRACT
Identification of favorable biophysical properties for protein therapeutics as part of developability assessment is a crucial part of the preclinical development process. Successful prediction of such properties and bioassay results from calculated in silico features has the potential to reduce the time and cost of delivering clinical-grade material to patients, but has remained an ongoing challenge for the field. Here, we demonstrate an automated and flexible machine learning workflow designed to compare and identify the most powerful features from computationally derived physicochemical feature sets generated by popular commercial software packages. We implement this workflow with medium-sized datasets of human and humanized IgG molecules to generate predictive regression models for two key developability endpoints, hydrophobicity and poly-specificity. The most important features discovered through the automated workflow corroborate several previous literature reports, and newly discovered features suggest directions for further research and potential model improvement.
Subjects
Antibodies, Monoclonal , Immunoglobulin G , Humans , Antibodies, Monoclonal/chemistry , Machine Learning
ABSTRACT
Strand-specific RNA sequencing of S. pombe revealed a highly structured programme of ncRNA expression at over 600 loci. Waves of antisense transcription accompanied sexual differentiation. A substantial proportion of ncRNA arose from mechanisms previously considered to be largely artefactual, including improper 3' termination and bidirectional transcription. Constitutive induction of the entire spk1+, spo4+, dis1+ and spo6+ antisense transcripts from an integrated, ectopic, locus disrupted their respective meiotic functions. This ability of antisense transcripts to disrupt gene function when expressed in trans suggests that cis production at native loci during sexual differentiation may also control gene function. Consistently, insertion of a marker gene adjacent to the dis1+ antisense start site mimicked ectopic antisense expression in reducing the levels of this microtubule regulator and abolishing the microtubule-dependent 'horsetail' stage of meiosis. Antisense production had no impact at any of these loci when the RNA interference (RNAi) machinery was removed. Thus, far from being simply 'genome chatter', this extensive ncRNA landscape constitutes a fundamental component in the controls that drive the complex programme of sexual differentiation in S. pombe.
Subjects
Gene Expression Regulation, Fungal , Meiosis/genetics , RNA, Antisense/genetics , RNA, Untranslated/genetics , Schizosaccharomyces/physiology , Databases, Nucleic Acid , Genes, Fungal , Microbiological Phenomena , RNA, Antisense/metabolism , RNA, Fungal , RNA, Small Interfering , RNA, Untranslated/metabolism , Schizosaccharomyces/genetics , Systems Biology , Transcription, Genetic
ABSTRACT
Despite recent advances in transgenic animal models and display technologies, humanization of mouse sequences remains one of the main routes for therapeutic antibody development. Traditionally, humanization is manual, laborious, and requires expert knowledge. Although automation efforts are advancing, existing methods are either demonstrated on a small scale or are entirely proprietary. To predict the immunogenicity risk, the human-likeness of sequences can be evaluated using existing humanness scores, but these lack diversity, granularity or interpretability. Meanwhile, immune repertoire sequencing has generated rich antibody libraries such as the Observed Antibody Space (OAS) that offer augmented diversity not yet exploited for antibody engineering. Here we present BioPhi, an open-source platform featuring novel methods for humanization (Sapiens) and humanness evaluation (OASis). Sapiens is a deep learning humanization method trained on the OAS using language modeling. Based on an in silico humanization benchmark of 177 antibodies, Sapiens produced sequences at scale while achieving results comparable to those of human experts. OASis is a granular, interpretable and diverse humanness score based on 9-mer peptide search in the OAS. OASis separated human and non-human sequences with high accuracy, and correlated with clinical immunogenicity. BioPhi thus offers an antibody design interface with automated methods that capture the richness of natural antibody repertoires to produce therapeutics with desired properties and accelerate antibody discovery campaigns. The BioPhi platform is accessible at https://biophi.dichlab.org and https://github.com/Merck/BioPhi.
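A 9-mer peptide search can be illustrated as follows. The sketch splits a sequence into overlapping 9-mers and reports the fraction found in a reference set of "human" peptides. In OASis, whether a peptide counts as human depends on how prevalent it is across OAS subjects. Here the reference set is simply a hypothetical Python `set`, and the function names are not BioPhi's API.

```python
def kmers(sequence: str, k: int = 9) -> list[str]:
    """Overlapping k-mers of a sequence (empty if shorter than k)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def humanness_fraction(sequence: str, human_peptides: set[str], k: int = 9) -> float:
    """Fraction of the sequence's overlapping k-mers present in the reference set."""
    peptides = kmers(sequence, k)
    if not peptides:
        return 0.0
    return sum(p in human_peptides for p in peptides) / len(peptides)
```

Because every k-mer is scored separately, the non-human peptides can also be listed and mapped back to their positions. This is what makes a peptide-level score interpretable.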
Subjects
Deep Learning , Animals , Antibodies , Mice
ABSTRACT
Protein engineering is the discipline of developing useful proteins for research, therapeutic, and industrial applications by modification of naturally occurring proteins or by invention of de novo proteins. Modern protein engineering relies on the ability to rapidly generate and screen diverse libraries of mutant proteins. However, design of mutant libraries is typically hampered by scale and complexity, necessitating development of advanced automation and optimization tools that can improve efficiency and accuracy. At present, automated library design tools are functionally limited or not freely available. To address these issues, we developed Mutation Maker, an open source mutagenic oligo design software for large-scale protein engineering experiments. Mutation Maker is not only specifically tailored to multisite random and directed mutagenesis protocols, but also pioneers bespoke mutagenic oligo design for de novo gene synthesis workflows. Enabled by a novel bundle of orchestrated heuristics, optimization, constraint-satisfaction and backtracking algorithms, Mutation Maker offers a versatile toolbox for gene diversification design at industrial scale. Supported by in silico simulations and compelling experimental validation data, Mutation Maker oligos produce diverse gene libraries at high success rates irrespective of genes or vectors used. Finally, Mutation Maker was created as an extensible platform, on the premise that directed evolution techniques will continue to evolve and revolutionize current and future-oriented applications.
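Mutation Maker's orchestrated heuristics, constraint satisfaction, and backtracking are not reproduced here. The sketch below shows only the basic layout of a single-site mutagenic oligo: template homology on each side of a target codon, with the codon replaced by a degenerate codon such as NNK (N = A/C/G/T, K = G/T). The function names and the flank length are hypothetical. A practical design would also balance melting temperature, oligo overlaps, and the placement of multiple sites.

```python
from itertools import product

# IUPAC nucleotide codes used by common degenerate codons (subset).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "S": "CG"}


def mutagenic_oligo(gene: str, codon_index: int, new_codon: str = "NNK",
                    flank: int = 15) -> str:
    """Replace codon `codon_index` (0-based) and keep `flank` nt of homology per side."""
    start = codon_index * 3
    end = start + 3
    if codon_index < 0 or end > len(gene):
        raise ValueError("codon index outside gene")
    left = gene[max(0, start - flank):start]
    right = gene[end:end + flank]
    return left + new_codon + right


def expand_degenerate(codon: str) -> list[str]:
    """All concrete codons encoded by a degenerate codon."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in codon))]
```

NNK encodes 32 codons, including all 20 amino acids and a single stop codon (TAG). This is why it is a common choice for saturation mutagenesis libraries.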