Results 1 - 18 of 18
1.
Cell ; 184(4): 1047-1063.e23, 2021 02 18.
Article in English | MEDLINE | ID: mdl-33539780

ABSTRACT

DNA has not been utilized to record temporal information, although DNA has been used to record biological information and to compute mathematical problems. Here, we found that indel generation by Cas9 and guide RNA can occur at steady rates, in contrast to typical dynamic biological reactions, and the accumulated indel frequency can be a function of time. By measuring indel frequencies, we developed a method for recording and measuring absolute time periods over hours to weeks in mammalian cells. These time-recordings were conducted in several cell types, with different promoters and delivery vectors for Cas9, and in both cultured cells and cells of living mice. As applications, we recorded the duration of chemical exposure and the lengths of elapsed time since the onset of biological events (e.g., heat exposure and inflammation). We propose that our systems could serve as synthetic "DNA clocks."


Subject(s)
CRISPR-Associated Protein 9/metabolism; Animals; Base Sequence; Cellular Microenvironment; Computer Simulation; HEK293 Cells; Half-Life; Humans; INDEL Mutation/genetics; Inflammation/pathology; Integrases/metabolism; Male; Mice, Nude; Promoter Regions, Genetic/genetics; RNA, Guide, Kinetoplastida/genetics; Reproducibility of Results; Time Factors
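
The abstract above states that the accumulated indel frequency can be a function of time. The following is a minimal sketch of how such a readout could be inverted to estimate elapsed time. It assumes, purely for illustration, that unedited target sites are converted at a constant per-hour rate (first-order kinetics); the abstract does not specify the model, and the function names and the rate value are hypothetical.

```python
import math


def indel_fraction(hours, rate_per_hour):
    """Fraction of target sites carrying an indel after `hours`.

    Assumed model (not taken from the paper): each unedited site is
    converted with a constant rate, so the unedited fraction decays
    exponentially.
    """
    return 1.0 - math.exp(-rate_per_hour * hours)


def elapsed_hours(indel_frac, rate_per_hour):
    """Invert the assumed model to estimate elapsed time from an indel fraction."""
    if not 0.0 <= indel_frac < 1.0:
        raise ValueError("indel fraction must be in [0, 1)")
    return -math.log(1.0 - indel_frac) / rate_per_hour


k = 0.01  # hypothetical rate constant (per hour), for illustration only
measured = indel_fraction(72.0, k)
print(round(elapsed_hours(measured, k), 6))
```

Under this assumed model the readout saturates as the indel fraction approaches 1, so the usable recording window depends on the rate constant.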
2.
Nat Methods ; 20(7): 999-1009, 2023 07.
Article in English | MEDLINE | ID: mdl-37188955

ABSTRACT

Recently, various small Cas9 orthologs and variants have been reported for use in in vivo delivery applications. Although small Cas9s are particularly suited for this purpose, selecting the optimal small Cas9 for use at a specific target sequence continues to be challenging. Here, to this end, we have systematically compared the activities of 17 small Cas9s for thousands of target sequences. For each small Cas9, we have characterized the protospacer adjacent motif and determined optimal single guide RNA expression formats and scaffold sequences. High-throughput comparative analyses revealed distinct high- and low-activity groups of small Cas9s. We also developed DeepSmallCas9, a set of computational models predicting the activities of the small Cas9s at matched and mismatched target sequences. Together, this analysis and these computational models provide a useful guide for researchers to select the most suitable small Cas9 for specific applications.


Subject(s)
CRISPR-Associated Protein 9; CRISPR-Cas Systems; Gene Editing
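
The abstract above distinguishes matched from mismatched target sequences. A minimal sketch of how mismatches between a guide and a candidate target could be located is shown below. The function name and the example sequences are hypothetical, and this is not part of the DeepSmallCas9 models.

```python
def mismatch_positions(guide, target):
    """Return the 1-based positions at which guide and target differ."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    return [
        i + 1
        for i, (g, t) in enumerate(zip(guide.upper(), target.upper()))
        if g != t
    ]


# Hypothetical 8-nt sequences, shortened for readability.
print(mismatch_positions("ACGTACGT", "ACCTACGA"))
```

An empty list corresponds to a fully matched target.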
3.
Nat Chem Biol ; 19(8): 972-980, 2023 08.
Article in English | MEDLINE | ID: mdl-36894722

ABSTRACT

Although several high-fidelity SpCas9 variants have been reported, it has been observed that this increased specificity is associated with reduced on-target activity, limiting the applications of the high-fidelity variants when efficient genome editing is required. Here, we developed an improved version of Sniper-Cas9, Sniper2L, which represents an exception to this trade-off trend as it showed higher specificity with retained high activity. We evaluated Sniper2L activities at a large number of target sequences and developed DeepSniper, a deep learning model that can predict the activity of Sniper2L. We also confirmed that Sniper2L can induce highly efficient and specific editing at a large number of target sequences when it is delivered as a ribonucleoprotein complex. Mechanistically, the high specificity of Sniper2L originates from its superior ability to avoid unwinding a target DNA containing even a single mismatch. We envision that Sniper2L will be useful when efficient and specific genome editing is required.


Subject(s)
CRISPR-Cas Systems; Gene Editing; DNA/genetics
4.
Bioinformatics ; 38(3): 671-677, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34677573

ABSTRACT

MOTIVATION: MicroRNAs (miRNAs) play pivotal roles in gene expression regulation by binding to target sites of messenger RNAs (mRNAs). While identifying functional targets of miRNAs is of utmost importance, their prediction remains a great challenge. Previous computational algorithms have major limitations. They use conservative candidate target site (CTS) selection criteria mainly focusing on canonical site types, rely on laborious and time-consuming manual feature extraction, and do not fully capitalize on the information underlying miRNA-CTS interactions.
RESULTS: In this article, we introduce TargetNet, a novel deep learning-based algorithm for functional miRNA target prediction. To address the limitations of previous approaches, TargetNet has three key components: (i) relaxed CTS selection criteria accommodating irregularities in the seed region, (ii) a novel miRNA-CTS sequence encoding scheme incorporating extended seed region alignments and (iii) a deep residual network-based prediction model. The proposed model was trained with miRNA-CTS pair datasets and evaluated with miRNA-mRNA pair datasets. TargetNet advances the previous state-of-the-art algorithms used in functional miRNA target classification. Furthermore, it demonstrates great potential for distinguishing high-functional miRNA targets.
AVAILABILITY AND IMPLEMENTATION: The codes and pre-trained models are available at https://github.com/mswzeus/TargetNet.


Subject(s)
MicroRNAs; MicroRNAs/genetics; MicroRNAs/metabolism; Neural Networks, Computer; Algorithms; RNA, Messenger/genetics; Gene Expression Regulation; Computational Biology
5.
Brief Bioinform ; 18(5): 851-869, 2017 09 01.
Article in English | MEDLINE | ID: mdl-27473064

ABSTRACT

In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies.


Subject(s)
Machine Learning; Computational Biology; Humans; Neural Networks, Computer
6.
Nat Biotechnol ; 42(3): 484-497, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37188916

ABSTRACT

Applications of base editing are frequently restricted by the requirement for a protospacer adjacent motif (PAM), and selecting the optimal base editor (BE) and single-guide RNA (sgRNA) pair for a given target can be difficult. To select for BEs and sgRNAs without extensive experimental work, we systematically compared the editing windows, outcomes and preferred motifs for seven BEs, including two cytosine BEs, two adenine BEs and three C•G to G•C BEs at thousands of target sequences. We also evaluated nine Cas9 variants that recognize different PAM sequences and developed a deep learning model, DeepCas9variants, for predicting which variants function most efficiently at sites with a given target sequence. We then developed a computational model, DeepBE, that predicts editing efficiencies and outcomes of 63 BEs that were generated by incorporating nine Cas9 variants as nickase domains into the seven BE variants. The predicted median efficiencies of BEs with DeepBE-based design were 2.9- to 20-fold higher than those of rationally designed SpCas9-containing BEs.


Subject(s)
Alkanesulfonic Acids; CRISPR-Cas Systems; Deep Learning; CRISPR-Cas Systems/genetics; Gene Editing; CRISPR-Associated Protein 9/genetics; CRISPR-Associated Protein 9/metabolism; RNA, Guide, CRISPR-Cas Systems
7.
Physiol Meas ; 44(5), 2023 05 10.
Article in English | MEDLINE | ID: mdl-36638544

ABSTRACT

Objective. Recently, many electrocardiogram (ECG) classification algorithms using deep learning have been proposed. Because the ECG characteristics vary across datasets owing to variations in factors such as recorded hospitals and the race of participants, the model needs to have a consistently high generalization performance across datasets. In this study, as part of the PhysioNet/Computing in Cardiology Challenge (PhysioNet Challenge) 2021, we present a model to classify cardiac abnormalities from the 12- and the reduced-lead ECGs. Approach. To improve the generalization performance of our earlier proposed model, we adopted a practical suite of techniques, i.e. constant-weighted cross-entropy loss, additional features, mixup augmentation, squeeze/excitation block, and OneCycle learning rate scheduler. We evaluated its generalization performance using the leave-one-dataset-out cross-validation setting. Furthermore, we demonstrate that the knowledge distillation from the 12-lead and large-teacher models improved the performance of the reduced-lead and small-student models. Main results. With the proposed model, our DSAIL SNU team has received Challenge scores of 0.55, 0.58, 0.58, 0.57, and 0.57 (ranked 2nd, 1st, 1st, 2nd, and 2nd of 39 teams) for the 12-, 6-, 4-, 3-, and 2-lead versions of the hidden test set, respectively. Significance. The proposed model achieved a higher generalization performance over six different hidden test datasets than the one we submitted to the PhysioNet Challenge 2020.


Subject(s)
Atrial Fibrillation; Humans; Algorithms; Electrocardiography/methods; Entropy
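
Among the techniques listed in the abstract above is mixup augmentation. A minimal sketch of the commonly described formulation follows: two training samples and their label vectors are blended with the same weight. The function names, the `alpha` value, and the toy inputs are assumptions for illustration; the abstract does not give the authors' settings.

```python
import random


def sample_lambda(alpha, rng):
    """Draw the mixing weight from Beta(alpha, alpha), as mixup is usually described."""
    return rng.betavariate(alpha, alpha)


def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their label vectors with weight lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


rng = random.Random(0)
lam = sample_lambda(0.4, rng)  # alpha = 0.4 is a hypothetical choice
mixed_x, mixed_y = mixup([1.0, 2.0], [1.0, 0.0], [3.0, 6.0], [0.0, 1.0], lam)
print(lam, mixed_x, mixed_y)
```

Because the labels are blended with the same weight as the inputs, the training target becomes a soft label rather than a single class.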
8.
Article in English | MEDLINE | ID: mdl-32809941

ABSTRACT

Recent advances in next-generation sequencing technologies have led to the successful insertion of video information into DNA using synthesized oligonucleotides. Several attempts have been made to embed larger data into living organisms. This process of embedding messages is called steganography and it is used for hiding and watermarking data to protect intellectual property. In contrast, steganalysis is a group of algorithms that serves to detect hidden information from covert media. Various methods have been developed to detect messages embedded in conventional covert channels. However, conventional steganalysis algorithms are mostly limited to common covert media. Most common detection approaches, such as frequency analysis-based methods, often overlook important signals when directly applied to DNA steganography and are easily bypassed by recently developed steganography techniques. To address the limitations of conventional approaches, a sequence-learning-based malicious DNA sequence analysis method based on neural networks has been proposed. The proposed method learns intrinsic distributions and identifies distribution variations using a classification score to predict whether a sequence is a coding or non-coding sequence. Based on our experiments and results, we have developed a framework to safeguard security against DNA steganography.


Subject(s)
Neural Networks, Computer; Privacy; Algorithms; Base Sequence; DNA/genetics
9.
PLoS One ; 16(5): e0251865, 2021.
Article in English | MEDLINE | ID: mdl-34003870

ABSTRACT

Heat shock proteins (HSPs) play a pivotal role as molecular chaperones against unfavorable conditions. Although HSPs are of great importance, their computational identification remains a significant challenge. Previous studies have two major limitations. First, they relied heavily on amino acid composition features, which inevitably limited their prediction performance. Second, their prediction performance was overestimated because of the independent two-stage evaluations and train-test data redundancy. To overcome these limitations, we introduce two novel deep learning algorithms: (1) time-efficient DeepHSP and (2) high-performance DeeperHSP. We propose a convolutional neural network (CNN)-based DeepHSP that classifies both non-HSPs and six HSP families simultaneously. It outperforms state-of-the-art algorithms, despite taking 14-15 times less time for both training and inference. We further improve the performance of DeepHSP by taking advantage of protein transfer learning. While DeepHSP is trained on raw protein sequences, DeeperHSP is trained on top of pre-trained protein representations. As a result, DeeperHSP remarkably outperforms state-of-the-art algorithms, increasing F1 scores in cross-validation and independent test experiments by 20% and 10%, respectively. We envision that the proposed algorithms can provide a proteome-wide prediction of HSPs and help in various downstream analyses for pathology and clinical research.


Subject(s)
Heat-Shock Proteins/genetics; Machine Learning; Molecular Chaperones/genetics; Neural Networks, Computer; Algorithms; Amino Acid Sequence/genetics; Computational Biology/trends; Deep Learning; Heat-Shock Proteins/isolation & purification; Humans; Protein Transport/genetics
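
The abstract above reports improvements in F1 scores for a multi-class problem (non-HSPs plus HSP families). The sketch below shows how per-class and macro-averaged F1 can be computed. Whether the authors used macro averaging is not stated in the abstract, so the averaging choice, the function names, and the toy labels are assumptions.

```python
def f1_per_class(y_true, y_pred):
    """Compute F1 = 2*TP / (2*TP + FP + FN) for every label that occurs."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels for illustration only.
truth = ["HSP70", "HSP70", "non-HSP", "HSP90"]
preds = ["HSP70", "non-HSP", "non-HSP", "HSP90"]
print(f1_per_class(truth, preds), macro_f1(truth, preds))
```

Macro averaging weights every class equally, which matters when some families have far fewer members than others.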
10.
Nat Biotechnol ; 39(2): 198-206, 2021 02.
Article in English | MEDLINE | ID: mdl-32958957

ABSTRACT

Prime editing enables the introduction of virtually any small-sized genetic change without requiring donor DNA or double-strand breaks. However, evaluation of prime editing efficiency requires time-consuming experiments, and the factors that affect efficiency have not been extensively investigated. In this study, we performed high-throughput evaluation of prime editor 2 (PE2) activities in human cells using 54,836 pairs of prime editing guide RNAs (pegRNAs) and their target sequences. The resulting data sets allowed us to identify factors affecting PE2 efficiency and to develop three computational models to predict pegRNA efficiency. For a given target sequence, the computational models predict efficiencies of pegRNAs with different lengths of primer binding sites and reverse transcriptase templates for edits of various types and positions. Testing the accuracy of the predictions using test data sets that were not used for training, we found Spearman's correlations between 0.47 and 0.81. Our computational models and information about factors affecting PE2 efficiency will facilitate practical application of prime editing.


Subject(s)
Gene Editing; RNA, Guide, Kinetoplastida/genetics; Algorithms; CRISPR-Associated Protein 9/metabolism; Cell Line, Tumor; Computer Simulation; HEK293 Cells; Humans; Machine Learning
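
The abstract above evaluates its predictions with Spearman's correlation. As a minimal sketch of that metric (not the authors' evaluation code), the example below ranks both series, averaging ranks across ties, and then computes the Pearson correlation of the ranks. The function names and the toy efficiencies are hypothetical.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed series."""
    return pearson(average_ranks(x), average_ranks(y))


measured = [1.0, 5.0, 10.0, 20.0]   # hypothetical measured efficiencies (%)
predicted = [0.2, 0.1, 0.5, 0.9]    # hypothetical model scores
print(spearman(measured, predicted))
```

Because only ranks are compared, the metric rewards a model that orders candidates correctly even if its scores are not calibrated to the measured efficiencies.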
11.
Curr Protoc ; 1(5): e113, 2021 May.
Article in English | MEDLINE | ID: mdl-33961736

ABSTRACT

Models from machine learning (ML) or artificial intelligence (AI) increasingly assist in guiding experimental design and decision making in molecular biology and medicine. Recently, Language Models (LMs) have been adapted from Natural Language Processing (NLP) to encode the implicit language written in protein sequences. Protein LMs show enormous potential in generating descriptive representations (embeddings) for proteins from just their sequences, in a fraction of the time with respect to previous approaches, yet with comparable or improved predictive ability. Researchers have trained a variety of protein LMs that are likely to illuminate different angles of the protein language. By leveraging the bio_embeddings pipeline and modules, simple and reproducible workflows can be laid out to generate protein embeddings and rich visualizations. Embeddings can then be leveraged as input features through machine learning libraries to develop methods predicting particular aspects of protein function and structure. Beyond the workflows included here, embeddings have been leveraged as proxies to traditional homology-based inference and even to align similar protein sequences. A wealth of possibilities remain for researchers to harness through the tools provided in the following protocols. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
The following protocols are included in this manuscript:
Basic Protocol 1: Generic use of the bio_embeddings pipeline to plot protein sequences and annotations
Basic Protocol 2: Generate embeddings from protein sequences using the bio_embeddings pipeline
Basic Protocol 3: Overlay sequence annotations onto a protein space visualization
Basic Protocol 4: Train a machine learning classifier on protein embeddings
Alternate Protocol 1: Generate 3D instead of 2D visualizations
Alternate Protocol 2: Visualize protein solubility instead of protein subcellular localization
Support Protocol: Join embedding generation and sequence space visualization in a pipeline


Subject(s)
Artificial Intelligence; Deep Learning; Machine Learning; Natural Language Processing; Proteins
12.
Nat Commun ; 12(1): 5617, 2021 09 23.
Article in English | MEDLINE | ID: mdl-34556671

ABSTRACT

Although prime editing is a promising genome editing method, the efficiency of prime editor 2 (PE2) is often insufficient. Here we generate a more efficient variant of PE2, named hyPE2, by adding the Rad51 DNA-binding domain. When tested at endogenous sites, hyPE2 shows a median of 1.5- or 1.4-fold (range, 0.99- to 2.6-fold) higher efficiencies than PE2; furthermore, at sites where PE2-induced prime editing is very inefficient (efficiency < 1%), hyPE2 enables prime editing with efficiencies ranging from 1.1% to 2.9% at up to 34% of target sequences, potentially facilitating prime editing applications.


Subject(s)
Algorithms; CRISPR-Cas Systems; DNA/metabolism; Gene Editing/methods; Models, Genetic; Rad51 Recombinase/metabolism; Amino Acid Sequence; Binding Sites/genetics; DNA/genetics; HCT116 Cells; HEK293 Cells; Humans; Rad51 Recombinase/genetics; Reproducibility of Results
13.
Nat Biotechnol ; 38(11): 1328-1336, 2020 11.
Article in English | MEDLINE | ID: mdl-32514125

ABSTRACT

Several Streptococcus pyogenes Cas9 (SpCas9) variants have been developed to improve an enzyme's specificity or to alter or broaden its protospacer-adjacent motif (PAM) compatibility, but selecting the optimal variant for a given target sequence and application remains difficult. To build computational models to predict the sequence-specific activity of 13 SpCas9 variants, we first assessed their cleavage efficiency at 26,891 target sequences. We found that, of the 256 possible four-nucleotide NNNN sequences, 156 can be used as a PAM by at least one of the SpCas9 variants. For the high-fidelity variants, overall activity could be ranked as SpCas9 ≥ Sniper-Cas9 > eSpCas9(1.1) > SpCas9-HF1 > HypaCas9 ≈ xCas9 >> evoCas9, whereas their overall specificities could be ranked as evoCas9 >> HypaCas9 ≥ SpCas9-HF1 ≈ eSpCas9(1.1) > xCas9 > Sniper-Cas9 > SpCas9. Using these data, we developed 16 deep-learning-based computational models that accurately predict the activity of these variants at any target sequence.


Subject(s)
CRISPR-Associated Protein 9/genetics; Mutation/genetics; Base Sequence; Deep Learning; Gene Library; HEK293 Cells; Humans; INDEL Mutation/genetics; Lentivirus/genetics; Models, Genetic; RNA, Guide, Kinetoplastida/genetics
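
The abstract above refers to the 256 possible four-nucleotide NNNN sequences. The sketch below enumerates them (4^4 = 256) and tests candidate sequences against a PAM pattern written with IUPAC degenerate codes. The helper names are hypothetical, only a subset of IUPAC codes is included, and the `NGGN` pattern is used purely as an example.

```python
from itertools import product

# Subset of IUPAC nucleotide codes: each code maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def all_kmers(k):
    """Enumerate every DNA sequence of length k."""
    return ["".join(bases) for bases in product("ACGT", repeat=k)]


def matches_pam(pattern, seq):
    """True if seq is compatible with the degenerate PAM pattern."""
    return len(pattern) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(pattern, seq)
    )


four_mers = all_kmers(4)
ngg_like = [s for s in four_mers if matches_pam("NGGN", s)]
print(len(four_mers), len(ngg_like))
```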
14.
Nat Biomed Eng ; 4(1): 111-124, 2020 01.
Article in English | MEDLINE | ID: mdl-31937939

ABSTRACT

The applications of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing can be limited by a lack of compatible protospacer adjacent motifs (PAMs), insufficient on-target activity and off-target effects. Here, we report an extensive comparison of the PAM-sequence compatibilities and the on-target and off-target activities of Cas9 from Streptococcus pyogenes (SpCas9) and the SpCas9 variants xCas9 and SpCas9-NG (which are known to have broader PAM compatibility than SpCas9) at 26,478 lentivirally integrated target sequences and 78 endogenous target sites in human cells. We found that xCas9 has the lowest tolerance for mismatched target sequences and that SpCas9-NG has the broadest PAM compatibility. We also show, on the basis of newly identified non-NGG PAM sequences, that SpCas9-NG and SpCas9 can edit six previously unedited endogenous sites associated with genetic diseases. Moreover, we provide deep-learning models that predict the activities of xCas9 and SpCas9-NG at the target sequences. The resulting deeper understanding of the activities of xCas9, SpCas9-NG and SpCas9 in human cells should facilitate their use.


Subject(s)
CRISPR-Associated Protein 9/genetics; CRISPR-Cas Systems/genetics; Gene Editing/methods; Deep Learning; Genetic Vectors/genetics; HEK293 Cells; Humans; Lentivirus/physiology; Streptococcus pyogenes/genetics
15.
Nat Biotechnol ; 38(9): 1037-1043, 2020 09.
Article in English | MEDLINE | ID: mdl-32632303

ABSTRACT

Base editors, including adenine base editors (ABEs) and cytosine base editors (CBEs), are widely used to induce point mutations. However, determining whether a specific nucleotide in its genomic context can be edited requires time-consuming experiments. Furthermore, when the editable window contains multiple target nucleotides, various genotypic products can be generated. To develop computational tools to predict base-editing efficiency and outcome product frequencies, we first evaluated the efficiencies of an ABE and a CBE and the outcome product frequencies at 13,504 and 14,157 target sequences, respectively, in human cells. We found that there were only modest asymmetric correlations between the activities of the base editors and Cas9 at the same targets. Using deep-learning-based computational modeling, we built tools to predict the efficiencies and outcome frequencies of ABE- and CBE-directed editing at any target sequence, with Pearson correlations ranging from 0.50 to 0.95. These tools and results will facilitate modeling and therapeutic correction of genetic diseases by base editing.


Subject(s)
Adenine; Cytosine; Gene Editing/methods; Targeted Gene Repair/methods; Aminohydrolases/metabolism; CRISPR-Associated Protein 9/metabolism; CRISPR-Cas Systems; Cytosine Deaminase/metabolism; Genetic Engineering; Genome, Human/genetics; HEK293 Cells; Humans; Point Mutation; RNA, Guide, Kinetoplastida/genetics
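
The abstract above notes that several genotypic products can arise when the editable window contains more than one target nucleotide. The sketch below enumerates those products for an adenine base editor by converting every non-empty subset of the adenines in the window to guanine. The window of positions 4 to 8, the function name, and the example protospacer are assumptions for illustration and are not taken from the paper.

```python
from itertools import combinations


def possible_products(protospacer, target_base="A", edited_base="G", window=(4, 8)):
    """Enumerate edited sequences for every non-empty subset of target bases.

    `window` holds 1-based, inclusive protospacer positions; the default
    is a hypothetical value chosen only for this example.
    """
    start, end = window
    positions = [
        i for i in range(start - 1, min(end, len(protospacer)))
        if protospacer[i] == target_base
    ]
    products = []
    for size in range(1, len(positions) + 1):
        for subset in combinations(positions, size):
            seq = list(protospacer)
            for i in subset:
                seq[i] = edited_base
            products.append("".join(seq))
    return products


# Hypothetical protospacer with adenines at positions 4 and 6.
for product_seq in possible_products("CCTAGATCCGGGTTTCCCGG"):
    print(product_seq)
```

With n editable bases in the window there are 2^n - 1 possible edited products, which is why outcome frequencies need to be predicted in addition to overall efficiency.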
17.
Sci Adv ; 5(11): eaax9249, 2019 11.
Article in English | MEDLINE | ID: mdl-31723604

ABSTRACT

We evaluated SpCas9 activities at 12,832 target sequences using a high-throughput approach based on a human cell library containing single-guide RNA-encoding and target sequence pairs. Deep learning-based training on this large dataset of SpCas9-induced indel frequencies led to the development of a SpCas9 activity-predicting model named DeepSpCas9. When tested against independently generated datasets (our own and those published by other groups), DeepSpCas9 showed high generalization performance. DeepSpCas9 is available at http://deepcrispr.info/DeepSpCas9.


Subject(s)
CRISPR-Associated Protein 9/metabolism; CRISPR-Cas Systems; Deep Learning; RNA, Guide, Kinetoplastida/metabolism; Gene Editing/methods; Humans; Internet; Mutation; RNA, Guide, Kinetoplastida/genetics; Reproducibility of Results
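
The abstract above describes deep learning-based training on target sequences. Sequence models of this kind typically need a numeric representation of the DNA; one common choice is one-hot encoding, sketched below. The abstract does not state which encoding DeepSpCas9 uses, so this is an assumption, and the function name and example sequence are hypothetical.

```python
def one_hot(seq):
    """Encode a DNA sequence as a list of 4-element rows in A, C, G, T order.

    Bases outside A/C/G/T (for example N) are encoded as an all-zero row.
    """
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix


for row in one_hot("ACGTN"):
    print(row)
```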
18.
Nat Biotechnol ; 36(3): 239-241, 2018 03.
Article in English | MEDLINE | ID: mdl-29431740

ABSTRACT

We present two algorithms to predict the activity of AsCpf1 guide RNAs. Indel frequencies for 15,000 target sequences were used in a deep-learning framework based on a convolutional neural network to train Seq-deepCpf1. We then incorporated chromatin accessibility information to create the better-performing DeepCpf1 algorithm for cell lines for which such information is available and show that both algorithms outperform previous machine learning algorithms on our own and published data sets.


Subject(s)
CRISPR-Cas Systems/genetics; Endonucleases/genetics; RNA, Guide, Kinetoplastida/genetics; Algorithms; Cell Line; Deep Learning; Neural Networks, Computer