ABSTRACT
The selection of a DNA aptamer through the Systematic Evolution of Ligands by EXponential enrichment (SELEX) method involves multiple binding steps, in which a target and a library of randomized DNA sequences are mixed for selection of a single, nucleotide-specific molecule. Usually, 10 to 20 steps are required for SELEX to be completed. Throughout this process it is necessary to discriminate between true DNA aptamers and nonspecific DNA-binding sequences. Thus, a novel machine learning-based approach was developed to support and simplify the early steps of the SELEX process by helping to discriminate DNA aptamers from nonspecific DNA-binding sequences. An Artificial Intelligence (AI) approach to identify aptamers was implemented based on Natural Language Processing (NLP) and Machine Learning (ML). An NLP method (CountVectorizer) was used to extract information from the nucleotide sequences. Four ML algorithms (Logistic Regression, Decision Tree, Gaussian Naïve Bayes, Support Vector Machines) were trained using data from the NLP method along with sequence information. The best-performing model was Support Vector Machines because it had the best ability to discriminate between positive and negative classes. In our model, an Accuracy (A) of 0.995, the fraction of samples that the model correctly classified, and an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.998, the degree to which a model is capable of distinguishing between classes, were observed. The developed AI approach is useful for identifying potential DNA aptamers and thereby reducing the number of rounds in a SELEX selection. This new approach could be applied in the design of DNA libraries and result in a more efficient and faster process for choosing DNA aptamers during SELEX.
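The feature-extraction step described in this abstract can be sketched as follows. This is a minimal illustration, assuming that overlapping k-mers play the role of "words"; the abstract names CountVectorizer but does not state the tokenization, so `k=3`, the helper names, and the toy sequences are all assumptions. The sketch uses plain Python rather than scikit-learn so that it stays self-contained, and it stops at the count vectors and the accuracy definition without training any of the four classifiers.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a DNA sequence (bag-of-words style)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def vectorize(seqs, k=3):
    """Map each sequence to a fixed-length count vector over all 4**k k-mers."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    rows = []
    for s in seqs:
        counts = kmer_counts(s, k)
        rows.append([counts.get(word, 0) for word in vocab])
    return vocab, rows


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly, as defined in the abstract."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy sequences, chosen for illustration only.
toy_sequences = ["GGTTGGTGTGGTTGG", "ACGTACGTACGTACG"]
vocab, X = vectorize(toy_sequences, k=3)
```

Each row of `X` could then be passed to any standard classifier; the choice of k trades vocabulary size (4**k columns) against how much local sequence context each feature captures.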
Subjects
Aptamers, Nucleotide/metabolism , Artificial Intelligence , SELEX Aptamer Technique/methods , Algorithms , Aptamers, Nucleotide/chemistry , Bayes Theorem , Computational Biology , Decision Trees , Gene Library , Humans , Ligands , Logistic Models , Machine Learning , Natural Language Processing , Protein Binding , SELEX Aptamer Technique/statistics & numerical data , Support Vector Machine
ABSTRACT
The search for high-affinity aptamers for targets such as proteins, small molecules, or cancer cells remains a formidable endeavor. Systematic Evolution of Ligands by EXponential Enrichment (SELEX) offers an iterative process to discover these aptamers through evolutionary selection of high-affinity candidates from a highly diverse random pool. This randomness dictates an unknown population distribution of fitness parameters, encoded by the binding affinities, toward SELEX targets. Adding to this uncertainty, repeating SELEX under identical conditions may lead to variable outcomes. These uncertainties pose a challenge when tuning selection pressures to isolate high-affinity ligands. Here, we present a stochastic hybrid model that describes the evolutionary selection of aptamers to explore the impact of these unknowns. To our surprise, we find that even single copies of high-affinity ligands in a pool of billions can strongly influence population dynamics, yet their survival is highly dependent on chance. We perform Monte Carlo simulations to explore the impact of environmental parameters, such as the target concentration, on selection efficiency in SELEX and identify strategies to control these uncertainties to ultimately improve the outcome and speed of this time- and resource-intensive process.
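The claim that a single high-affinity copy can be lost by chance can be illustrated with a toy Monte Carlo simulation. This is only a sketch under strong assumptions and is not the stochastic hybrid model of the abstract: each copy is retained independently with the equilibrium bound fraction T/(T + Kd) (target in excess), amplification is an idealized fixed multiplier, and every function name and parameter value is hypothetical.

```python
import random


def bound_fraction(target_conc, kd):
    """Equilibrium probability that one ligand copy is bound (target in excess)."""
    return target_conc / (target_conc + kd)


def survives(rounds, target_conc, kd, amplification, rng, cap=200):
    """Follow one lineage that starts as a single copy; True if never lost."""
    copies = 1
    p = bound_fraction(target_conc, kd)
    for _ in range(rounds):
        # Partitioning: each copy is retained independently with probability p.
        copies = sum(1 for _ in range(copies) if rng.random() < p)
        if copies == 0:
            return False
        # Amplification: idealized PCR, capped to keep the loop cheap.
        copies = min(copies * amplification, cap)
    return True


def survival_probability(trials, seed=0, **kwargs):
    """Estimate the survival probability over many independent lineages."""
    rng = random.Random(seed)
    return sum(survives(rng=rng, **kwargs) for _ in range(trials)) / trials


# Concentrations are in arbitrary units relative to Kd (placeholder values).
low = survival_probability(2000, rounds=5, target_conc=0.1, kd=1.0, amplification=10)
high = survival_probability(2000, rounds=5, target_conc=10.0, kd=1.0, amplification=10)
```

In this toy setting, most losses happen in the first round, when only one copy exists; lowering the target concentration raises stringency but also raises the chance of losing the best binder before it is amplified.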
Subjects
Aptamers, Nucleotide/chemistry , Nucleic Acids/chemistry , Proteins/chemistry , SELEX Aptamer Technique/statistics & numerical data , Small Molecule Libraries/chemistry , Binding Sites , Binding, Competitive , Humans , Kinetics , Ligands , Monte Carlo Method , Stochastic Processes , Uncertainty
ABSTRACT
Long non-coding RNAs (lncRNAs) are associated with a plethora of cellular functions, most of which require interaction with one or more RNA-binding proteins (RBPs); similarly, RBPs are often able to bind a large number of different RNAs. The currently available knowledge is already drawing an intricate network of interactions, whose deregulation is frequently associated with pathological states. Several different techniques have been developed in recent years to obtain protein-RNA binding data in a high-throughput fashion. In parallel, in silico inference methods have been developed for the accurate computational prediction of the interaction of RBP-lncRNA pairs. The field is growing rapidly, and it is foreseeable that in the near future the protein-lncRNA interaction network will expand, offering essential clues for a better understanding of lncRNA cellular mechanisms and their disease-associated perturbations.
Subjects
RNA, Long Noncoding/metabolism , RNA-Binding Proteins/metabolism , Computational Biology/methods , Computer Simulation , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Models, Molecular , Nucleic Acid Conformation , Protein Conformation , Protein Interaction Maps/genetics , RNA, Long Noncoding/chemistry , RNA, Long Noncoding/genetics , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics , SELEX Aptamer Technique/statistics & numerical data
ABSTRACT
In aptamer-facilitated biomarker discovery (AptaBiD), aptamers are selected from a library of random DNA (or RNA) sequences for their ability to specifically bind cell-surface biomarkers. The library is incubated with intact cells, and cell-bound DNA molecules are separated from those unbound and amplified by the polymerase chain reaction (PCR). The partitioning/amplification cycle is repeated multiple times while alternating target cells and control cells. Efficient aptamer selection in AptaBiD relies on the inclusion of masking DNA within the cell and library mixture. Masking DNA lacks primer regions for PCR amplification and is typically used in excess over the library. The role of masking DNA within the selection mixture is to outcompete any nonspecific binding sequences within the initial library, thus allowing specific DNA sequences (i.e., aptamers) to be selected more efficiently. Efficient AptaBiD requires an optimum ratio of masking DNA to library DNA, at which aptamers still bind specific binding sites but nonaptamers within the library do not bind nonspecific binding sites. Here, we have developed a mathematical model that describes the binding processes taking place within the equilibrium mixture of masking DNA, library DNA, and target cells. The resulting mathematical solution allows one to estimate the concentration of masking DNA that is required to outcompete the library DNA at a desired ratio of bound masking DNA to bound library DNA. The required concentration depends on concentrations of the library and cells as well as on unknown cell characteristics. These characteristics include the concentration of total binding sites on the cell surface, N, and equilibrium dissociation constants, K(nsL) and K(nsM), for nonspecific binding of the library DNA and masking DNA, respectively.
We developed a theory that allows the determination of N, K(nsL), and K(nsM) based on measurements of EC50 values for cells mixed separately with the library and masking DNA (EC50 is the concentration of fluorescently labeled DNA at which half of the maximum fluorescence signal from DNA-bound cells is reached). We also obtained expressions for signals from bound DNA (measured by flow cytometry) in terms of N, K(nsL), and K(nsM). These expressions can be used for the verification of N, K(nsL), and K(nsM) values found from EC50 measurements. The developed procedure was applied to MCF-7 breast cancer cells, and corresponding values of N, K(nsL), and K(nsM) were established for the first time. The concentration of masking DNA required for AptaBiD with MCF-7 breast cancer cells was also estimated.
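The dependence of the required masking DNA concentration on K(nsL) and K(nsM) can be illustrated with a textbook competitive-binding sketch. It is not the solution derived in the paper: it assumes both DNA species are in large excess over the binding sites, so free concentrations equal total concentrations, and it therefore ignores the dependence on cell concentration that the abstract mentions. The function names and all numeric values are placeholders, not the MCF-7 results.

```python
def bound_concentrations(N, L, M, K_nsL, K_nsM):
    """Bound library and masking DNA competing for N nonspecific sites.

    Simplifying assumption: DNA in excess, so free DNA is about total DNA.
    """
    denom = 1.0 + L / K_nsL + M / K_nsM
    bound_library = N * (L / K_nsL) / denom
    bound_masking = N * (M / K_nsM) / denom
    return bound_library, bound_masking


def masking_dna_needed(ratio, L, K_nsL, K_nsM):
    """Masking DNA concentration giving bound_masking / bound_library == ratio."""
    return ratio * L * K_nsM / K_nsL


# Placeholder values in molar units, invented for illustration only.
M_required = masking_dna_needed(ratio=100, L=1e-6, K_nsL=1e-6, K_nsM=2e-6)
bound_L, bound_M = bound_concentrations(1e-9, 1e-6, M_required, 1e-6, 2e-6)
```

Under this assumption N cancels out of the bound ratio, which is why the sketch needs only the two dissociation constants; once site depletion matters, N and the cell concentration enter as the abstract describes.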
Subjects
Aptamers, Nucleotide/genetics , DNA, Neoplasm/analysis , Flow Cytometry/statistics & numerical data , Models, Chemical , SELEX Aptamer Technique/statistics & numerical data , Binding Sites , Binding, Competitive , Biomarkers/analysis , Cell Line, Tumor , DNA Primers/genetics , DNA, Neoplasm/genetics , Female , Gene Library , Humans , Kinetics , Polymerase Chain Reaction , SELEX Aptamer Technique/methods
ABSTRACT
Systematic evolution of ligands by exponential enrichment (SELEX) is a revolutionary technology that integrates combinatorial chemistry with high-throughput screening to generate, from synthesized nucleic acid ligand libraries, high-affinity nucleic acid ligands (aptamers) for targets of interest. Recently, SELEX experiments have advanced from screening ligand libraries against a single purified target to screening against multiple heterogeneous target samples. Although it could bring enormous technical and economic advantages to drug discovery, the new application suffers from unpredictable performance. To gain insight into the new method, we develop a computer model to numerically analyze subtractive SELEX performed alternately against two distinct heterogeneous samples of unknown targets. The model features the discretization of the ligand library, the ligand-target binding equilibrium equations, and the separation efficiency of bound and unbound ligands in experiments. By computer simulations, we investigate how aptamers for desired targets embedded in undefined target mixtures are generated under different experimental conditions. We find that the iterative screening scheme is fundamentally capable of developing the desired aptamers. On the other hand, target sample configuration and separation efficiency may together significantly diversify the screening dynamics and results.
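The three ingredients listed in this abstract (a discretized library, binding equilibrium, and imperfect separation) can be combined in a small deterministic sketch. It is an illustration under stated assumptions rather than the published model: targets are taken to be in excess, amplification is replaced by renormalization, separation efficiency is modeled by a single hypothetical `leak` parameter, and the three ligand classes and all concentrations are invented.

```python
def bound_fraction(sites):
    """Equilibrium fraction of a ligand bound to any target.

    sites: list of (target_concentration, Kd) pairs, targets assumed in excess.
    """
    s = sum(conc / kd for conc, kd in sites)
    return s / (1.0 + s)


def partition(pool, sample, keep_bound, leak):
    """One partitioning step, then renormalization (stands in for PCR).

    pool:   {class_name: (fraction, {target_name: Kd})}
    sample: {target_name: concentration}
    leak:   fraction of the unwanted ligands that is carried over anyway
    """
    kept = {}
    for name, (frac, kds) in pool.items():
        f = bound_fraction([(conc, kds[t]) for t, conc in sample.items()])
        wanted = f if keep_bound else 1.0 - f
        kept[name] = frac * (wanted + leak * (1.0 - wanted))
    total = sum(kept.values())
    return {name: (kept[name] / total, pool[name][1]) for name in pool}


def run(cycles, leak):
    """Alternate positive selection on sample A and subtraction on sample B."""
    # Hypothetical three-class library; all numbers are placeholders (molar).
    pool = {
        "desired_binder": (1e-6, {"desired": 1e-9, "shared": 1e-5}),
        "shared_binder": (1e-6, {"desired": 1e-5, "shared": 1e-9}),
        "non_binder": (1.0 - 2e-6, {"desired": 1e-5, "shared": 1e-5}),
    }
    sample_a = {"desired": 1e-8, "shared": 1e-8}  # contains the desired target
    sample_b = {"shared": 1e-8}                   # control sample
    for _ in range(cycles):
        pool = partition(pool, sample_a, keep_bound=True, leak=leak)
        pool = partition(pool, sample_b, keep_bound=False, leak=leak)
    return {name: frac for name, (frac, _) in pool.items()}


clean = run(cycles=5, leak=0.01)
leaky = run(cycles=5, leak=0.5)
```

Comparing `clean` with `leaky` shows, within this toy setup, how poorer separation slows enrichment of the desired class even though the binding constants are unchanged.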
Subjects
SELEX Aptamer Technique/statistics & numerical data , Aptamers, Nucleotide/chemical synthesis , Aptamers, Nucleotide/metabolism , Computer Simulation , Kinetics , Ligands , Models, Statistical
ABSTRACT
Disrupted or abnormal biological processes responsible for cancers often quantitatively manifest as disrupted additive and multiplicative interactions of gene/protein expressions correlating with cancer progression. However, the examination of all possible combinatorial interactions between gene features in most case-control studies with limited training data is computationally infeasible. In this paper, we propose a practically feasible data integration approach, QUIRE (QUadratic Interactions among infoRmative fEatures), to identify discriminative complex interactions among informative gene features for cancer diagnosis and biomarker discovery directly based on patient blood samples. QUIRE works in two stages, where it first identifies functionally relevant gene groups for the disease with the help of gene functional annotations and available physical protein interactions, then it explores the combinatorial relationships among the genes from the selected informative groups. Based on our private experimentally generated data from patient blood samples using a novel SOMAmer (Slow Off-rate Modified Aptamer) technology, we apply QUIRE to cancer diagnosis and biomarker discovery for Renal Cell Carcinoma (RCC) and Ovarian Cancer (OVC). To further demonstrate the general applicability of our approach, we also apply QUIRE to a publicly available Colorectal Cancer (CRC) dataset that can be used to prioritize our SOMAmer design. Our experimental results show that QUIRE identifies gene-gene interactions that can better identify the different cancer stages of samples, as compared to other state-of-the-art feature selection methods. A literature survey shows that many of the interactions identified by QUIRE play important roles in the development of cancer.
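The second stage described in this abstract, exploring pairwise combinations only within the selected gene groups, can be sketched as a feature-construction step. This is a rough illustration and not the QUIRE algorithm itself: group selection and model fitting are omitted, and the gene names and expression values are hypothetical.

```python
from itertools import combinations


def pair_count(n_genes):
    """Number of distinct gene pairs among n_genes features."""
    return n_genes * (n_genes - 1) // 2


def quadratic_features(sample, genes):
    """Additive terms plus pairwise products for the selected genes only.

    sample: {gene_name: expression_value}
    genes:  names of genes in the selected informative groups
    """
    feats = {g: sample[g] for g in genes}
    for a, b in combinations(sorted(genes), 2):
        feats[f"{a}*{b}"] = sample[a] * sample[b]
    return feats


# Hypothetical expression values for one sample.
sample = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_C": 3.0, "GENE_D": 1.0}
selected = ["GENE_A", "GENE_B", "GENE_C"]
features = quadratic_features(sample, selected)
all_pairs_for_20000_genes = pair_count(20000)
```

Restricting products to a few selected genes keeps the feature count small, whereas `pair_count(20000)` gives a sense of how quickly an unrestricted all-pairs search grows.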
Subjects
Biomarkers/blood , Neoplasms/blood , Neoplasms/diagnosis , Artificial Intelligence , Carcinoma, Renal Cell/blood , Carcinoma, Renal Cell/diagnosis , Carcinoma, Renal Cell/genetics , Colorectal Neoplasms/blood , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/genetics , Computational Biology , Disease Progression , Epistasis, Genetic , Female , Genetic Markers , Genome-Wide Association Study/statistics & numerical data , Humans , Kidney Neoplasms/blood , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics , Models, Genetic , Neoplasms/genetics , Ovarian Neoplasms/blood , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/genetics , SELEX Aptamer Technique/statistics & numerical data
ABSTRACT
Finding a highly sensitive diagnostic technique for malaria has challenged scientists for the last century. In the present study, we identified versatile single-strand DNA aptamers for Plasmodium lactate dehydrogenase (pLDH), a biomarker for malaria, via the Systematic Evolution of Ligands by EXponential enrichment (SELEX). The pLDH aptamers selectively bound to the target proteins with high sensitivity (K(d)=16.8-49.6 nM). The selected aptamers were characterized using an electrophoretic mobility shift assay, a quartz crystal microbalance, a fluorescence assay, and circular dichroism spectroscopy. We also designed a simple aptasensor using electrochemical impedance spectroscopy; both Plasmodium vivax LDH and Plasmodium falciparum LDH were selectively detected with a detection limit of 1 pM. Furthermore, the pLDH aptasensor clearly distinguished between malaria-positive blood samples of two major species (P. vivax and P. falciparum) and a negative control, indicating that it may be a useful tool for the diagnosis, monitoring, and surveillance of malaria.
Subjects
Aptamers, Nucleotide , Biosensing Techniques/methods , L-Lactate Dehydrogenase/blood , Malaria/diagnosis , Plasmodium/enzymology , SELEX Aptamer Technique/methods , Aptamers, Nucleotide/chemistry , Base Sequence , Biomarkers/blood , Biosensing Techniques/statistics & numerical data , Circular Dichroism , Dielectric Spectroscopy , Electrophoretic Mobility Shift Assay , Humans , Limit of Detection , Malaria/enzymology , Malaria/parasitology , Malaria, Falciparum/diagnosis , Malaria, Vivax/diagnosis , Nucleic Acid Conformation , Quartz Crystal Microbalance Techniques , SELEX Aptamer Technique/statistics & numerical data
ABSTRACT
The possibility of introducing a computationally assisted method to study aptamer-protein interaction was evaluated with the aim of streamlining the screening and selection of new aptamers. Starting from information on the 15-mer (5'-GGTTGGTGTGGTTGG-3') thrombin binding aptamer (TBA), a library of mutated DNA sequences (994 elements) was generated and screened using Shapegauss, a shape-based scoring function from OpenEye software, to generate computationally derived binding scores. The TBA and three other mutated oligonucleotides, selected on the basis of their binding score (best, medium, worst), were incorporated into surface plasmon resonance (SPR) biosensors. By reducing the ionic strength (binding buffer: 50 mM Tris-HCl pH 7.4, 140 mM NaCl, 1 mM MgCl2, diluted 1:50) in order to match the simulated conditions, the analytical performance of the four oligonucleotide sequences was compared using signal amplitude, sensitivity (slope), linearity (R²) and reproducibility (CVav %). The experimental results were in agreement with the simulation findings.
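Building a library of mutated sequences from the TBA 15-mer can be sketched by enumerating point substitutions. The abstract does not say how its 994 sequences were constructed, so this is only one plausible scheme: it lists all single and double substitutions and makes no attempt to reproduce the published library. The function name is hypothetical.

```python
from itertools import combinations, product

TBA = "GGTTGGTGTGGTTGG"  # 15-mer thrombin binding aptamer from the abstract


def point_mutants(seq, n_sites, alphabet="ACGT"):
    """All sequences differing from seq by substitutions at exactly n_sites."""
    mutants = set()
    for sites in combinations(range(len(seq)), n_sites):
        # At each chosen site, allow every base except the original one.
        choices = [[b for b in alphabet if b != seq[i]] for i in sites]
        for subs in product(*choices):
            bases = list(seq)
            for i, b in zip(sites, subs):
                bases[i] = b
            mutants.add("".join(bases))
    return mutants


singles = point_mutants(TBA, 1)
doubles = point_mutants(TBA, 2)
```

Each sequence in `singles` or `doubles` would then need a 3D model before a shape-based score could be computed; that step depends on the docking software and is outside this sketch.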
Subjects
Aptamers, Nucleotide , Biosensing Techniques/methods , Aptamers, Nucleotide/chemistry , Aptamers, Nucleotide/genetics , Base Sequence , Binding Sites , Biosensing Techniques/statistics & numerical data , Computational Biology , Gene Library , Morpholines , Nuclear Magnetic Resonance, Biomolecular , Nucleic Acid Conformation , Protein Conformation , SELEX Aptamer Technique/statistics & numerical data , Surface Plasmon Resonance , Thrombin/chemistry
ABSTRACT
An aptamer-based chromatographic strip assay method for rapid toxin detection was developed. The aptamer-based strip assay was based on the competition for the aptamer between ochratoxin A and DNA probes. The sensing results indicated that the sensitivity of the aptamer-based strip was better than that of conventional antibody-based strips. The visual limit of detection (LOD) of the strip for qualitative detection was 1 ng/mL, while the LOD for semi-quantitative detection could reach 0.18 ng/mL using a scanning reader. The recoveries of test samples ranged from 96% to 110%. All detections could be achieved in less than 10 min, indicating that the aptamer-based strip could be a potentially useful tool for rapid on-site detection.