ABSTRACT
The ability to sequence single protein molecules in their native, full-length form would enable a more comprehensive understanding of proteomic diversity. Current technologies, however, are limited in achieving this goal [1,2]. Here, we establish a method for the long-range, single-molecule reading of intact protein strands on a commercial nanopore sensor array. By using the ClpX unfoldase to ratchet proteins through a CsgG nanopore [3,4], we provide single-molecule evidence that ClpX translocates substrates in two-residue steps. This mechanism achieves sensitivity to single amino acids on synthetic protein strands hundreds of amino acids in length, enabling the sequencing of combinations of single-amino-acid substitutions and the mapping of post-translational modifications, such as phosphorylation. To enhance classification accuracy further, we demonstrate the ability to reread individual protein molecules multiple times, and we explore the potential for highly accurate protein barcode sequencing. Furthermore, we develop a biophysical model that can simulate raw nanopore signals a priori on the basis of residue volume and charge, enhancing the interpretation of raw signal data. Finally, we apply these methods to examine full-length, folded protein domains for complete end-to-end analysis. These results provide proof of concept for a platform that has the potential to identify and characterize full-length proteoforms at single-molecule resolution.
Subjects
Endopeptidase Clp , Nanopores , Endopeptidase Clp/chemistry , Endopeptidase Clp/metabolism , Single Molecule Imaging/methods , Post-Translational Protein Processing , Protein Domains , Proteins/chemistry , Proteins/metabolism , Phosphorylation , Amino Acid Substitution , Protein Sequence Analysis/methods
ABSTRACT
As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches lack the ability to perform more complex computations over the stored data, such as similarity search: e.g., finding images that look similar to an image of interest without prior knowledge of their file names. Here we demonstrate a technique for executing similarity search over a DNA-based database of 1.6 million images. Queries are implemented as hybridization probes, and a key step in our approach was to learn an image-to-sequence encoding ensuring that queries preferentially bind to targets representing visually similar images. Experimental results show that our molecular implementation performs comparably to state-of-the-art in silico algorithms for similarity search.
Subjects
Computational Biology/methods , DNA/chemistry , Genetic Databases , Information Storage and Retrieval , Algorithms , Base Sequence , Computer Simulation , DNA/genetics , DNA Probes , Factual Databases , Computer Neural Networks
ABSTRACT
Molecular tagging is an approach to labeling physical objects using DNA or other molecules that can be used when methods such as RFID tags and QR codes are unsuitable. No molecular tagging method exists that is inexpensive, fast and reliable to decode, and usable in minimal-resource environments to create or read tags. To address this, we present Porcupine, an end-user molecular tagging system featuring DNA-based tags readable within seconds using a portable nanopore device. Porcupine's digital bits are represented by the presence or absence of distinct DNA strands, called molecular bits (molbits). We classify molbits directly from raw nanopore signal, avoiding basecalling. To extend shelf life, decrease readout time, and make tags robust to environmental contamination, molbits are prepared for readout during tag assembly and can be stabilized by dehydration. The result is an extensible, real-time, high-accuracy tagging system that includes an approach to developing highly separable barcodes.
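
The presence-or-absence encoding described in this abstract can be illustrated with a minimal sketch. It is not the Porcupine implementation: the function names `encode_tag` and `decode_tag`, the integer strand labels, and the `min_reads` threshold are all hypothetical, and the per-read labels are assumed to come from some upstream classifier operating on raw nanopore signal, which is not modeled here.

```python
from typing import Iterable, Set


def encode_tag(bits: str) -> Set[int]:
    """Choose which strands go into the tag: strand i is present iff bit i is '1'.

    Hypothetical helper for illustration; strand identities are plain indices.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    return {i for i, b in enumerate(bits) if b == "1"}


def decode_tag(read_labels: Iterable[int], n_bits: int, min_reads: int = 3) -> str:
    """Recover the bit string from per-read strand labels.

    A bit is called '1' when at least `min_reads` reads were assigned to that
    strand. The threshold is an assumed way to tolerate a few misclassified
    reads, not a value taken from the source.
    """
    counts = [0] * n_bits
    for label in read_labels:
        if 0 <= label < n_bits:  # ignore labels outside the tag's strand set
            counts[label] += 1
    return "".join("1" if c >= min_reads else "0" for c in counts)


if __name__ == "__main__":
    strands = encode_tag("1011")
    print(sorted(strands))
    # One stray read labeled 1 stays below the threshold and is treated as noise.
    print(decode_tag([0, 0, 0, 2, 2, 2, 2, 3, 3, 3, 1], n_bits=4))
```

Thresholding on read counts rather than on a single observation is one simple way to make the absent-strand bits robust to occasional classification errors; a real system would need to calibrate this against its own error rates.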