Results 1 - 13 of 13
1.
BMC Bioinformatics ; 24(1): 111, 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36959531

ABSTRACT

Synchronization (insertions-deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy in the stored information, multiple sequence alignment (MSA) solves this problem by searching the conserved subsequences. In this paper, we conduct a comprehensive simulation study on the error correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition when there are around 20% errors. Below this critical value, increasing sequencing depth can eventually allow it to approach complete recovery. Otherwise, its performance plateaus at some poor levels. Given a reasonable sequencing depth (≤ 70), MSA could achieve complete recovery in the low error regime, and effectively correct 90% of the errors in the medium error regime. In addition, MSA is robust to imperfect clustering. It could also be combined with other means such as ECC, repeated markers, or any other code constraints. Furthermore, by selecting an appropriate sequencing depth, this strategy could achieve an optimal trade-off between cost and reading speed. MSA could be a competitive alternative for future DNA storage.
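The recovery step described above can be illustrated with a minimal sketch, assuming the noisy reads have already been aligned (for example by MAFFT) into equal-length rows with `-` marking gaps. The function name `consensus` and the toy reads are hypothetical and only show the column-wise majority vote that turns an alignment into a recovered sequence; the sketch does not reproduce the paper's simulation setup.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Column-wise majority vote over reads that are already aligned."""
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")
    recovered = []
    for column in zip(*aligned_reads):
        symbol, _ = Counter(column).most_common(1)[0]
        # A column dominated by gaps is treated as an insertion and dropped.
        if symbol != gap:
            recovered.append(symbol)
    return "".join(recovered)


# Hypothetical alignment of five noisy copies of "ACGTACGT".
reads = [
    "ACGT-ACGT",
    "ACGTTACGT",  # one inserted base
    "AC-T-ACGT",  # one deleted base
    "ACGT-ACGT",
    "ACGA-ACGT",  # one substituted base
]
print(consensus(reads))  # ACGTACGT
```

More reads per cluster (a higher sequencing depth) give each column more votes, which is why depth matters for this kind of recovery.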


Subjects
Algorithms , DNA , Sequence Alignment , DNA/genetics , Computer Simulation , Sequence Analysis, DNA
2.
J Chem Inf Model ; 63(12): 3967-3976, 2023 06 26.
Article in English | MEDLINE | ID: mdl-37289182

ABSTRACT

Synthetic DNA has been widely considered an attractive medium for digital data storage. However, the random insertion-deletion-substitution (IDS) errors in the sequenced reads remain a critical challenge to reliable data recovery. Motivated by the modulation technique in the communication field, we propose a new DNA storage architecture to solve this problem. The main idea is that all binary data are modulated into DNA sequences with the same AT/GC patterns, which facilitates the detection of indels in noisy reads. The modulation signal not only satisfies the encoding constraints but also serves as prior information to detect the potential positions of errors. Experiments on simulated and real data sets demonstrate that modulation encoding provides a simple way to comply with biological constraints for sequence encoding (i.e., balanced GC content and avoiding homopolymers). Furthermore, modulation decoding is highly efficient and extremely robust, and can correct up to ∼40% of errors. In addition, it is robust to imperfect clustering reconstruction, which is very common in practice. Although our method has a relatively low logical density of 1.0 bits/nt, its high robustness may provide a wide space for developing low-cost synthetic technologies. We believe this new architecture may hasten the arrival of large-scale DNA storage applications.
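A minimal sketch of the modulation idea, under stated assumptions: the carrier is a fixed binary pattern in which `0` selects the weak pair {A, T} and `1` selects the strong pair {G, C}, and each data bit picks one base within the pair. The mapping table and the function names are hypothetical choices for illustration; the abstract does not specify them. One data bit per nucleotide matches the stated logical density of 1.0 bits/nt.

```python
# Assumed mapping: (carrier bit, data bit) -> base. Not taken from the paper.
TABLE = {("0", "0"): "A", ("0", "1"): "T", ("1", "0"): "G", ("1", "1"): "C"}
INVERSE = {base: key for key, base in TABLE.items()}


def modulate(bits, carrier):
    """Encode one data bit per nucleotide under a fixed AT/GC carrier."""
    if len(bits) != len(carrier):
        raise ValueError("bits and carrier must have the same length")
    return "".join(TABLE[(c, b)] for c, b in zip(carrier, bits))


def demodulate(seq):
    """Return (data bits, observed AT/GC pattern) for an error-free read."""
    carrier = "".join(INVERSE[base][0] for base in seq)
    bits = "".join(INVERSE[base][1] for base in seq)
    return bits, carrier


def pattern_mismatches(read, carrier):
    """Positions where the read's AT/GC pattern departs from the carrier."""
    observed = demodulate(read)[1]
    return [i for i, (o, c) in enumerate(zip(observed, carrier)) if o != c]


carrier = "01010101"  # alternating pattern: 50% GC and no homopolymers
encoded = modulate("10110010", carrier)
print(encoded)                                 # TGTCAGTG
print(demodulate(encoded)[0])                  # 10110010
print(pattern_mismatches("TGCAGTG", carrier))  # [2, 3, 4, 5, 6]
```

In the last call the read has lost the base at index 2, so its AT/GC pattern falls out of step with the carrier from that index onward. That run of mismatches is the kind of prior information a decoder could use to locate an indel.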


Subjects
DNA , Information Storage and Retrieval , Sequence Analysis, DNA/methods , DNA/genetics , Computer Simulation
3.
iScience ; 27(7): 110025, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-38974972

ABSTRACT

Drug repurposing is a promising approach to find new therapeutic indications for approved drugs. Many computational approaches have been proposed to prioritize candidate anticancer drugs at the gene or pathway level. However, these methods neglect the changes in gene interactions at the edge level. To address this limitation, we develop a computational drug repurposing method (iEdgePathDDA) based on edge information and pathway topology. First, we identify drug-induced and disease-related edges (the changes in gene interactions) within pathways by using the Pearson correlation coefficient. Next, we calculate the inhibition score between drug-induced edges and disease-related edges. Finally, we prioritize drug candidates according to the inhibition score on all disease-related edges. Case studies show that our approach successfully identifies new drug-disease pairs based on the CTD database. Compared with state-of-the-art approaches, our method shows superior performance in terms of five metrics across colorectal, breast, and lung cancer datasets.
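The first step can be sketched as follows, assuming an edge is scored by how much the Pearson correlation between two genes changes from a reference condition to a case condition (disease or drug treatment). The function `edge_change` and the toy expression values are hypothetical; the abstract does not give the exact definition of an edge or of the inhibition score, so the latter is not attempted here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def edge_change(a_ref, b_ref, a_case, b_case):
    """Assumed edge score: correlation in the case minus the reference."""
    return pearson(a_case, b_case) - pearson(a_ref, b_ref)


# Hypothetical expression of two genes across four samples per condition.
gene_a_ref, gene_b_ref = [1, 2, 3, 4], [2, 4, 6, 8]    # co-expressed
gene_a_case, gene_b_case = [1, 2, 3, 4], [8, 6, 4, 2]  # anti-correlated
print(edge_change(gene_a_ref, gene_b_ref, gene_a_case, gene_b_case))  # -2.0
```

A gene-level comparison would see no change in gene A here, which is the kind of interaction-level signal an edge-based method is meant to capture.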

4.
Interdiscip Sci ; 15(3): 419-432, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37016040

ABSTRACT

With the rapid development of DNA (deoxyribonucleic acid) storage technologies, storing digital images in DNA has become feasible. However, information security in DNA storage systems remains an open problem. In this paper, we therefore propose a DNA storage-oriented image encryption algorithm that uses the information processing mechanisms of molecular biology. The basic idea is to perform pixel replacement by gene hybridization and to implement dual diffusion through pixel diffusion and gene mutation. After encryption, the ciphertext DNA image can be synthesized and stored in a DNA storage system. Experimental results demonstrate that the algorithm resists common attacks and is robust against sequence loss and base substitution errors in the DNA storage channel. The proposed algorithm, based on gene hybridization and gene mutation, works in three steps. First, we scramble the rows and columns of the plaintext image by dynamic Josephus traversal. Second, we replace the pixels by gene hybridization. Finally, we diffuse the image matrix in the binary domain and encode pixels into 8-base strands, which are then further diffused by gene mutation. The ciphertext image can be synthesized according to the mutant gene codes and stored in any DNA storage system.
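The scrambling step can be illustrated with a minimal sketch of the classic Josephus traversal, in which every `step`-th remaining index of a circle is removed and the removal order becomes the permutation. The paper's "dynamic" variant is not specified in the abstract, so the fixed-step version below is a swapped-in simplification, and the names `josephus_order`, `scramble` and `unscramble` are hypothetical.

```python
def josephus_order(n, step):
    """Order in which indices 0..n-1 leave a circle, removing every step-th."""
    if n < 0 or step < 1:
        raise ValueError("n must be >= 0 and step must be >= 1")
    remaining = list(range(n))
    order = []
    idx = 0
    while remaining:
        idx = (idx + step - 1) % len(remaining)
        order.append(remaining.pop(idx))
    return order


def scramble(rows, step):
    """Permute rows (or columns) by the Josephus removal order."""
    return [rows[i] for i in josephus_order(len(rows), step)]


def unscramble(scrambled, step):
    """Invert scramble() given the same step, which acts as the key."""
    rows = [None] * len(scrambled)
    for position, original_index in enumerate(josephus_order(len(scrambled), step)):
        rows[original_index] = scrambled[position]
    return rows


print(josephus_order(7, 3))  # [2, 5, 1, 6, 4, 0, 3]
image_rows = ["row0", "row1", "row2", "row3", "row4"]
mixed = scramble(image_rows, step=2)
print(unscramble(mixed, step=2) == image_rows)  # True
```

Applying the same routine to column indices scrambles columns. A permutation of this kind only moves pixels around, so the replacement and diffusion steps in the abstract are still needed to change pixel values.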


Subjects
Algorithms , Computer Security , Mutation/genetics , Diffusion , DNA/genetics
5.
Front Bioeng Biotechnol ; 11: 1173763, 2023.
Article in English | MEDLINE | ID: mdl-37152655

ABSTRACT

Introduction: Rapid development in synthetic technologies has boosted DNA as a potential medium for large-scale data storage. Meanwhile, how to implement data security in the DNA storage system is still an unsolved problem. Methods: In this article, we propose an image encryption method based on the modulation-based storage architecture. The key idea is to take advantage of the unpredictable modulation signals to encrypt images in highly error-prone DNA storage channels. Results and Discussion: Numerical results have demonstrated that our image encryption method is feasible and effective with excellent security against various attacks (statistical, differential, noise, and data loss). When compared with other methods such as the hybridization reactions of DNA molecules, the proposed method is more reliable and feasible for large-scale applications.

6.
Comput Biol Med ; 166: 107548, 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37801922

ABSTRACT

BACKGROUND: In single-stranded DNAs/RNAs, secondary structures are very common, especially in long sequences. It has been recognized that a high degree of secondary structure in DNA sequences can interfere with the correct writing and reading of information in DNA storage. However, how to circumvent this side effect has seldom been studied. METHOD: As the degree of secondary structure of a DNA sequence is closely related to the magnitude of the free energy released in the complicated folding process, we first investigate the free-energy distribution at different encoding lengths based on randomly generated DNA sequences. Then, we construct a bidirectional long short-term memory (BiLSTM)-attention deep learning model to predict the free energy of sequences. RESULTS: Our simulation results indicate that the free energy of DNA sequences at a specific length follows a right-skewed distribution and that the mean increases as the length increases. Given a tolerable free energy threshold of 20 kcal/mol, we could control the ratio of serious secondary structures in the encoding sequences to within 1% of the significant level through selecting a feasible encoding length of 100 nt. Compared with traditional deep learning models, the proposed model achieves better prediction performance in both the mean relative error (MRE) and the coefficient of determination (R²). It achieved MRE = 0.109 and R² = 0.918 in the simulation experiment. The combination of the BiLSTM and attention module can handle long-term dependencies and capture the features of base pairing. Further, the prediction has linear time complexity, which makes it suitable for detecting sequences with severe secondary structures in future large-scale applications. Finally, 70 of 94 predicted free energy can be screened out on a real dataset. This demonstrates that the proposed model can screen out some highly suspicious sequences that are prone to produce more errors and low sequencing copies.

7.
Comput Struct Biotechnol J ; 21: 4446-4455, 2023.
Article in English | MEDLINE | ID: mdl-37731599

ABSTRACT

Numerous computational drug repurposing methods have emerged as efficient alternatives to costly and time-consuming traditional drug discovery approaches. Some of these methods are based on the assumption that the candidate drug should have a reversal effect on disease-associated genes. However, such methods are not applicable when there is limited overlap between disease-related genes and drug-perturbed genes. In this study, we proposed a novel Drug Repurposing method based on the Inhibition Effect on gene regulatory network (DRIE) to identify potential drugs for cancer treatment. DRIE integrated gene expression profiles and the gene regulatory network to calculate an inhibition score by using the shortest path in the disease-specific network. The results on eleven datasets indicated the superior performance of DRIE when compared to other state-of-the-art methods. Case studies showed that our method effectively discovered novel drug-disease associations. Our findings demonstrated that the top-ranked drug candidates had already been validated by the CTD database. Additionally, it clearly identified potential agents for three cancers (colorectal, breast, and lung cancer), which was beneficial when annotating drug-disease relationships in the CTD. This study proposed a novel framework for drug repurposing, which would be helpful for drug discovery and development.

8.
Interdiscip Sci ; 14(1): 141-150, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34463928

ABSTRACT

DNA storage has been a thriving interdisciplinary research area because of its high density, low maintenance cost, and long durability for information storage. However, the complexity of errors in DNA sequences, including substitutions, insertions and deletions, hinders its application for massive data storage. Motivated by the divide-and-conquer algorithm, we propose a hierarchical error correction strategy for text DNA storage. The basic idea is to design robust codes for common characters which have one-base error correction ability, including insertion and/or deletion. The errors are gradually corrected by the codes in DNA reads, multiple alignment of character lines, and finally word spelling. On one hand, the proposed encoding method provides a systematic way to design storage-friendly codes with properties such as 50% GC content, no more than 2-base homopolymers, and robustness against secondary structures. On the other hand, the proposed error correction method not only corrects single insertions or deletions, but also deals with multiple insertions or deletions. Simulation results demonstrate that the proposed method can correct more than 98% of errors when the error rate is less than or equal to 0.05. Thus, it is more powerful and adaptable to complicated DNA storage applications.
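Two of the code constraints named above, 50% GC content and homopolymer runs of at most 2 bases, are simple to check. The sketch below is a hypothetical filter for candidate codewords (the function names and the example sequences are illustrative), and it does not attempt the third property, robustness against secondary structures.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq):
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def is_storage_friendly(seq, gc_target=0.5, gc_tol=0.0, max_run=2):
    """Check GC content and homopolymer length against the stated limits."""
    gc_ok = abs(gc_content(seq) - gc_target) <= gc_tol
    return gc_ok and max_homopolymer(seq) <= max_run


print(is_storage_friendly("ACGTTGCA"))  # True: 50% GC, longest run is 2
print(is_storage_friendly("AAAGCGCT"))  # False: run of three A
print(is_storage_friendly("ATATATGC"))  # False: only 25% GC
```

A codebook for characters could be built by enumerating candidate strands of a fixed length and keeping only those that pass this filter, before the error-correction properties are considered.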


Subjects
Algorithms , DNA , Base Sequence , Computer Simulation , DNA/chemistry , Sequence Analysis, DNA/methods
9.
IET Syst Biol ; 12(6): 273-278, 2018 12.
Article in English | MEDLINE | ID: mdl-30472691

ABSTRACT

MicroRNAs (miRNAs) are a class of small endogenous non-coding genes that play important roles in post-transcriptional regulation as well as other important biological processes. Accumulating evidence indicated that miRNAs were extensively involved in the pathology of cancer. However, determining which miRNAs are related to a specific cancer is problematic because one miRNA may target multiple genes and one gene may be targeted by multiple miRNAs. The authors proposed a new approach, named miR_SubPath, to identify cancer-associated miRNAs in three steps. First, the targeted genes were determined based on differentially expressed genes in significant dysfunctional subpathways. Then the candidate miRNAs were determined according to miRNA-gene associations. Finally, these candidate miRNAs were ranked based on their relations with some seed miRNAs in a functional similarity network. Results on real-world datasets showed that the proposed miR_SubPath method was more robust and could identify more cancer-related miRNAs than the prior approaches miR_Path, miR_Clust and Zhang's method.


Subjects
Computational Biology/methods , MicroRNAs/genetics , Neoplasms/genetics , Gene Expression Profiling , Gene Regulatory Networks , Humans
10.
IET Syst Biol ; 12(4): 148-153, 2018 Aug.
Article in English | MEDLINE | ID: mdl-33451179

ABSTRACT

Boolean networks are widely used to model gene regulatory networks and to design therapeutic intervention strategies to affect the long-term behavior of systems. Here, the authors investigate the 1 bit perturbation, which falls under the category of structural intervention. The authors' idea is that, if and only if a perturbed state evolves from a desirable attractor to an undesirable attractor or from an undesirable attractor to a desirable attractor, then the size of the basin of attraction of a desirable attractor may decrease or increase. In this case, if the authors obtain the net BOS of the perturbed states, they can quickly obtain the optimal 1 bit perturbation by finding the maximum value of perturbation gain. Results from both synthetic and real biological networks show that the proposed algorithm is not only simpler but also performs better than the previous basin-of-states (BOS)-based algorithm by Hu et al.

11.
Comput Biol Chem ; 71: 236-244, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28988640

ABSTRACT

[...] underlying biology of differentially expressed genes and proteins. Although various approaches have been proposed to identify cancer-related pathways, most of them only partially consider the influence of those differentially expressed genes, such as the gene numbers, their perturbation in the signaling transduction, and the interaction between genes. Signaling-pathway impact analysis (SPIA) provides a convenient framework which considers both the classical enrichment analysis and the actual perturbation on a given pathway. In this study, we extended the previously proposed SPIA by incorporating the importance and specificity of genes (SPIA-IS). We applied this approach to six datasets for colorectal cancer, lung cancer, and pancreatic cancer. Results from these datasets showed that the proposed SPIA-IS could effectively improve the performance of the original SPIA in identifying cancer-related pathways.


Subjects
Colorectal Neoplasms/genetics , Computational Biology , Lung Neoplasms/genetics , Pancreatic Neoplasms/genetics , Signal Transduction/genetics , Colorectal Neoplasms/metabolism , Databases, Genetic , Gene Regulatory Networks , Humans , Lung Neoplasms/metabolism , Pancreatic Neoplasms/metabolism
12.
IET Syst Biol ; 10(4): 147-52, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27444024

ABSTRACT

Signalling pathway analysis is a popular approach that is used to identify significant cancer-related pathways based on differentially expressed genes (DEGs) from biological experiments. The main advantage of signalling pathway analysis lies in the fact that it assesses both the number of DEGs and the propagation of signal perturbation in signalling pathways. However, this method simplifies the interactions between genes by categorising them only as activation (+1) and suppression (-1), which does not encompass the range of interactions in real pathways, where interaction strength between genes may vary. In this study, the authors used newly developed signalling pathway impact analysis (SPIA) methods, SPIA based on Pearson correlation coefficient (PSPIA), and mutual information (MSPIA), to measure the interaction strength between pairs of genes. In analyses of a colorectal cancer dataset, a lung cancer dataset, and a pancreatic cancer dataset, PSPIA and MSPIA identified more candidate cancer-related pathways than were identified by SPIA. Generally, MSPIA performed better than PSPIA.


Subjects
Colorectal Neoplasms/genetics , Gene Regulatory Networks , Lung Neoplasms/genetics , Pancreatic Neoplasms/genetics , Signal Transduction , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans
13.
Sci Rep ; 6: 26247, 2016 05 19.
Article in English | MEDLINE | ID: mdl-27196530

ABSTRACT

Boolean networks are widely used to model gene regulatory networks and to design therapeutic intervention strategies to affect the long-term behavior of systems. In this paper, we investigate the less-studied one-bit perturbation, which falls under the category of structural intervention. Previous works focused on finding the optimal one-bit perturbation to maximally alter the steady-state distribution (SSD) of undesirable states through matrix perturbation theory. However, the application of the SSD is limited to Boolean networks with about ten genes. In 2007, Xiao et al. proposed to search for the optimal one-bit perturbation by altering the sizes of the basins of attraction (BOAs). However, their algorithm requires close observation of the state-transition diagram. In this paper, we propose an algorithm that efficiently determines the BOA size after a perturbation. Our idea is that, if we construct the basin of states for all states, then the size of the BOA of perturbed networks can be obtained just by updating the paths of the states whose transitions have been affected. Results from both synthetic and real biological networks show that the proposed algorithm performs better than the exhaustive SSD-based algorithm and can be applied to networks with about 25 genes.
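To make the quantities concrete, the sketch below computes attractors and BOA sizes for a small synchronous Boolean network by exhaustive enumeration, then applies a one-bit perturbation and recomputes everything from scratch. This is a naive baseline, not the paper's algorithm, which avoids full recomputation by updating only the affected paths. The three-gene update rule and all function names are hypothetical, and the basin size here counts the attractor's own states.

```python
from itertools import product


def basin_sizes(update, n_genes):
    """Map each attractor (frozenset of states) to the size of its basin."""
    attractor_of = {}
    for start in product((0, 1), repeat=n_genes):
        path, index = [], {}
        state = start
        # Follow the trajectory until it hits a known state or closes a cycle.
        while state not in attractor_of and state not in index:
            index[state] = len(path)
            path.append(state)
            state = update(state)
        if state in attractor_of:
            attractor = attractor_of[state]
        else:
            attractor = frozenset(path[index[state]:])
        for visited in path:
            attractor_of[visited] = attractor
    sizes = {}
    for attractor in attractor_of.values():
        sizes[attractor] = sizes.get(attractor, 0) + 1
    return sizes


def flip_one_bit(update, state, gene):
    """One-bit perturbation: flip one gene's next value for one input state."""
    def perturbed(current):
        nxt = list(update(current))
        if current == state:
            nxt[gene] ^= 1
        return tuple(nxt)
    return perturbed


def update(state):
    """Hypothetical 3-gene network: a' = b AND c, b' = a OR c, c' = c."""
    a, b, c = state
    return (b & c, a | c, c)


print(basin_sizes(update, 3))
# two fixed points, (0, 0, 0) and (1, 1, 1), each with a basin of 4 states
print(basin_sizes(flip_one_bit(update, (0, 0, 0), 2), 3))
# after the perturbation every state flows to (1, 1, 1): basin of 8 states
```

The enumeration visits all 2^n states, so the cost doubles with each added gene. Redoing it for every candidate perturbation is the expense that an update restricted to affected paths is meant to avoid.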
