Search | Secretaria de Estado da Saúde

SLHSD: hybrid scaffolding method based on short and long reads.

Luo, Junwei; Guan, Ting; Chen, Guolin; Yu, Zhonghua; Zhai, Haixia; Yan, Chaokun; Luo, Huimin.

Brief Bioinform; 24(3), 2023 May 19.

Article in English | MEDLINE | ID: mdl-37141142

ABSTRACT

In genome assembly, scaffolding can obtain more complete and continuous scaffolds. Current scaffolding methods usually adopt one type of read to construct a scaffold graph and then orient and order contigs. However, scaffolding with the strengths of two or more types of reads seems to be a better solution to some tricky problems. Combining the advantages of different types of data is significant for scaffolding. Here, a hybrid scaffolding method (SLHSD) is presented that simultaneously leverages the precision of short reads and the length advantage of long reads. Building an optimal scaffold graph is an important foundation for getting scaffolds. SLHSD uses a new algorithm that combines long and short read alignment information to determine whether to add an edge and how to calculate the edge weight in a scaffold graph. In addition, SLHSD develops a strategy to ensure that edges with high confidence can be added to the graph with priority. Then, a linear programming model is used to detect and remove remaining false edges in the graph. We compared SLHSD with other scaffolding methods on five datasets. Experimental results show that SLHSD outperforms other methods. The open-source code of SLHSD is available at https://github.com/luojunwei/SLHSD.
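The idea of weighting candidate edges from both read types and adding the most confident edges first can be illustrated with a minimal sketch. It is not the SLHSD algorithm itself: the abstract does not give the weighting formula, so the `Link` record, the linear weight in `edge_weight`, the `min_weight` threshold, and the rule that each contig end joins at most one edge are all assumptions made for illustration.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """Candidate join between two contig ends (hypothetical record layout)."""
    a: str              # contig end, e.g. "c1:R" for the right end of contig c1
    b: str
    short_support: int  # number of short-read pairs supporting the join
    long_support: int   # number of long reads spanning the join


def edge_weight(link, short_w=1.0, long_w=2.0):
    # Assumed linear combination of the two evidence types; the real
    # SLHSD weighting is not described in the abstract.
    return short_w * link.short_support + long_w * link.long_support


def build_scaffold_graph(links, min_weight=3.0):
    """Greedily add edges in order of decreasing weight.

    An edge is kept only if neither contig end is already used, so
    high-confidence edges take priority over conflicting weaker ones.
    """
    used_ends = set()
    edges = []
    for link in sorted(links, key=edge_weight, reverse=True):
        w = edge_weight(link)
        if w < min_weight:
            break  # all remaining candidates are weaker still
        if link.a in used_ends or link.b in used_ends:
            continue  # conflicts with a stronger edge already in the graph
        used_ends.update((link.a, link.b))
        edges.append((link.a, link.b, w))
    return edges


if __name__ == "__main__":
    candidates = [
        Link("c1:R", "c2:L", 4, 3),
        Link("c1:R", "c3:L", 2, 1),
        Link("c2:R", "c3:L", 0, 2),
        Link("c3:R", "c4:L", 1, 0),
    ]
    for edge in build_scaffold_graph(candidates):
        print(edge)
```

In this toy input the weaker `c1:R`/`c3:L` candidate is rejected because `c1:R` is already joined to `c2:L`. A greedy pass like this can still leave false edges, which is the role the abstract assigns to the later linear programming step.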

Subjects

Algorithms, High-Throughput Nucleotide Sequencing, Sequence Analysis, DNA/methods, High-Throughput Nucleotide Sequencing/methods, Software, Linear Models

INSnet: a method for detecting insertions based on deep learning network.

Gao, Runtian; Luo, Junwei; Ding, Hongyu; Zhai, Haixia.

BMC Bioinformatics; 24(1): 80, 2023 Mar 06.

Article in English | MEDLINE | ID: mdl-36879189

ABSTRACT

BACKGROUND: Many studies have shown that structural variations (SVs) strongly impact human disease. As a common type of SV, insertions are usually associated with genetic diseases. Therefore, accurately detecting insertions is of great significance. Although many methods for detecting insertions have been proposed, these methods often generate some errors and miss some variants. Hence, accurately detecting insertions remains a challenging task. RESULTS: In this paper, we propose a method named INSnet to detect insertions using a deep learning network. First, INSnet divides the reference genome into continuous sub-regions and takes five features for each locus through alignments between long reads and the reference genome. Next, INSnet uses a depthwise separable convolutional network. The convolution operation extracts informative features through spatial information and channel information. INSnet uses two attention mechanisms, the convolutional block attention module (CBAM) and efficient channel attention (ECA), to extract key alignment features in each sub-region. To capture the relationship between adjacent sub-regions, INSnet uses a gated recurrent unit (GRU) network to further extract more important SV signatures. After predicting whether a sub-region contains an insertion through the previous steps, INSnet determines the precise site and length of the insertion. The source code is available from GitHub at https://github.com/eioyuou/INSnet. CONCLUSION: Experimental results show that INSnet can achieve better performance than other methods in terms of F1 score on real datasets.
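A depthwise separable convolution splits an ordinary convolution into a per-channel (spatial) step and a 1x1 channel-mixing step, which is how it handles spatial and channel information separately. The sketch below shows the one-dimensional case in plain Python. It is a generic illustration, not code from INSnet: the function names, the valid padding, and the omission of bias terms are assumptions.

```python
def depthwise_conv1d(x, kernels):
    """Spatial step: each input channel is filtered by its own kernel.

    x is a list of channels (each a list of numbers); kernels holds one
    kernel per channel. No padding is applied, so the output is shorter.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([
            sum(channel[i + j] * kernel[j] for j in range(k))
            for i in range(len(channel) - k + 1)
        ])
    return out


def pointwise_conv1d(x, weights):
    """Channel step: a 1x1 convolution that mixes channels at each position.

    weights holds one row per output channel, with one weight per input channel.
    """
    length = len(x[0])
    return [
        [sum(w[c] * x[c][i] for c in range(len(x))) for i in range(length)]
        for w in weights
    ]


def separable_param_count(k, c_in, c_out):
    # Depthwise kernels plus pointwise weights (biases ignored).
    return k * c_in + c_in * c_out


def standard_param_count(k, c_in, c_out):
    # An ordinary 1D convolution with the same shape (biases ignored).
    return k * c_in * c_out


if __name__ == "__main__":
    features = [[1, 2, 3, 4], [0, 1, 0, 1]]         # 2 channels, 4 positions
    spatial = depthwise_conv1d(features, [[1, 1], [1, -1]])
    mixed = pointwise_conv1d(spatial, [[1, 1]])     # 1 output channel
    print(spatial, mixed)
    print(separable_param_count(3, 5, 16), standard_param_count(3, 5, 16))
```

With a kernel of width 3, five input channels (matching the five per-locus features) and a hypothetical 16 output channels, the separable form needs 95 weights against 240 for an ordinary convolution.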

Subjects

Deep Learning, Humans, Software

DGDTA: dynamic graph attention network for predicting drug-target binding affinity.

Zhai, Haixia; Hou, Hongli; Luo, Junwei; Liu, Xiaoyan; Wu, Zhengjiang; Wang, Junfeng.

BMC Bioinformatics; 24(1): 367, 2023 Sep 30.

Article in English | MEDLINE | ID: mdl-37777712

ABSTRACT

BACKGROUND: Obtaining accurate drug-target binding affinity (DTA) information is significant for drug discovery and drug repositioning. Although some methods have been proposed for predicting DTA, the features of proteins and drugs still need to be further analyzed. Recently, deep learning has been successfully used in many fields. Hence, designing a more effective deep learning method for predicting DTA remains attractive. RESULTS: This paper proposes dynamic graph DTA (DGDTA), which uses a dynamic graph attention network combined with a bidirectional long short-term memory (Bi-LSTM) network to predict DTA. DGDTA takes as input a drug compound, represented by its simplified molecular input line entry system (SMILES) string, and a protein amino acid sequence. First, each drug is treated as a graph of atoms and edges, and dynamic attention scores are used to determine which atoms and edges in the drug are most important for predicting DTA. Then, Bi-LSTM is used to better extract the contextual information features of protein amino acid sequences. Finally, after combining the obtained drug and protein feature vectors, the DTA is predicted by a fully connected layer. The source code is available from GitHub at https://github.com/luojunwei/DGDTA. CONCLUSIONS: The experimental results show that DGDTA can predict DTA more accurately than some other methods.
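Attention over a molecular graph assigns each neighbouring atom a normalised importance weight before their features are combined. The sketch below shows one common way to compute such weights, using a GATv2-style score followed by a softmax. The abstract does not state the scoring formula DGDTA uses, so the score function, the tiny weight matrices, and the simplification of aggregating raw neighbour features are assumptions for illustration only.

```python
import math


def leaky_relu(v, slope=0.2):
    return [x if x > 0 else slope * x for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attention_weights(h, neighbors, i, W, a):
    """Softmax-normalised attention of atom i over its neighbours.

    Assumed GATv2-style score: e_ij = a . LeakyReLU(W [h_i || h_j]).
    h maps atom index to feature list; neighbors maps atom index to a
    list of bonded atom indices.
    """
    scores = []
    for j in neighbors[i]:
        z = leaky_relu(matvec(W, h[i] + h[j]))  # list concatenation = [h_i || h_j]
        scores.append(sum(ak * zk for ak, zk in zip(a, z)))
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(h, neighbors, i, W, a):
    """Attention-weighted sum of neighbour features (simplified: no value projection)."""
    alphas = attention_weights(h, neighbors, i, W, a)
    dim = len(h[i])
    return [
        sum(alpha * h[j][d] for alpha, j in zip(alphas, neighbors[i]))
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Toy three-atom fragment: atom 0 bonded to atoms 1 and 2.
    h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
    neighbors = {0: [1, 2]}
    W = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]  # hypothetical parameters
    a = [1.0, -1.0]
    print(attention_weights(h, neighbors, 0, W, a))
    print(aggregate(h, neighbors, 0, W, a))
```

In a trained model `W` and `a` would be learned, and the resulting weights indicate which atoms contribute most to the drug representation.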

Subjects

Drug Delivery Systems, Drug Discovery, Amino Acid Sequence, Drug Repositioning, Protein Domains

Deep learning approach for cancer subtype classification using high-dimensional gene expression data.

Shen, Jiquan; Shi, Jiawei; Luo, Junwei; Zhai, Haixia; Liu, Xiaoyan; Wu, Zhengjiang; Yan, Chaokun; Luo, Huimin.

BMC Bioinformatics; 23(1): 430, 2022 Oct 17.

Article in English | MEDLINE | ID: mdl-36253710

ABSTRACT

MOTIVATION: Studies have shown that classifying cancer subtypes can provide valuable information for a range of cancer research, from aetiology and tumour biology to prognosis and personalized treatment. Current methods usually adopt gene expression data to perform cancer subtype classification. However, cancer samples are scarce, and the high-dimensional features of their gene expression data are too sparse to allow most methods to achieve desirable classification results. RESULTS: In this paper, we propose a deep learning approach by combining a convolutional neural network (CNN) and bidirectional gated recurrent unit (BiGRU): our approach, DCGN, aims to achieve nonlinear dimensionality reduction and learn features to eliminate irrelevant factors in gene expression data. Specifically, DCGN first uses the synthetic minority oversampling technique algorithm to equalize data. The CNN can handle high-dimensional data without stress and extract important local features, and the BiGRU can analyse deep features and retain their important information; the DCGN captures key features by combining both neural networks to overcome the challenges of small sample sizes and sparse, high-dimensional features. In the experiments, we compared the DCGN to seven other cancer subtype classification methods using breast and bladder cancer gene expression datasets. The experimental results show that the DCGN performs better than the other seven methods and can provide more satisfactory classification results.
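The synthetic minority oversampling technique (SMOTE) balances classes by creating new minority samples on the line segment between an existing minority sample and one of its nearest minority neighbours. The following is a minimal plain-Python sketch of that idea, not the DCGN implementation: the function name `smote`, the choice of `k`, the fixed seed, and the toy expression vectors are assumptions.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples from a list of minority-class vectors.

    Each synthetic sample is x + gap * (neighbour - x), where neighbour is
    one of the k nearest minority samples to x and gap is drawn from [0, 1).
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy two-gene expression vectors for an under-represented subtype.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for sample in smote(minority, n_new=3):
        print(sample)
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the range spanned by them, which is why the technique suits small datasets where plain duplication would add no new information.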

Subjects

Deep Learning, Neoplasms, Algorithms, Gene Expression, Neoplasms/genetics, Neural Networks, Computer

BreakNet: detecting deletions using long reads and a deep learning approach.

Luo, Junwei; Ding, Hongyu; Shen, Jiquan; Zhai, Haixia; Wu, Zhengjiang; Yan, Chaokun; Luo, Huimin.

BMC Bioinformatics; 22(1): 577, 2021 Dec 02.

Article in English | MEDLINE | ID: mdl-34856923

ABSTRACT

BACKGROUND: Structural variations (SVs) occupy a prominent position in human genetic diversity, and deletions form an important type of SV that has been suggested to be associated with genetic diseases. Although various deletion calling methods based on long reads have been proposed, a new approach is still needed to mine features in long-read alignment information. Recently, deep learning has attracted much attention in genome analysis, and it is a promising technique for calling SVs. RESULTS: In this paper, we propose BreakNet, a deep learning method that detects deletions by using long reads. BreakNet first extracts feature matrices from long-read alignments. Second, it uses a time-distributed convolutional neural network (CNN) to integrate and map the feature matrices to feature vectors. Third, BreakNet employs a bidirectional long short-term memory (BLSTM) model to analyse the produced set of continuous feature vectors in both the forward and backward directions. Finally, a classification module determines whether a region contains a deletion. On real long-read sequencing datasets, we demonstrate that BreakNet outperforms Sniffles, SVIM and cuteSV in terms of F1 score. The source code for the proposed method is available from GitHub at https://github.com/luojunwei/BreakNet. CONCLUSIONS: Our work shows that deep learning can be combined with long reads to call deletions more effectively than existing methods.
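One kind of signal that can be read from long-read alignments is the number of reads whose CIGAR string reports a large deletion over each reference position. The sketch below computes that per-base count for a region. The abstract does not list the features BreakNet actually uses, so this single feature, the `(ref_start, cigar)` input format, and the 50 bp `min_len` cutoff are assumptions for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference.
REF_CONSUMING = "MDN=X"


def deletion_signal(alignments, region_start, region_end, min_len=50):
    """Count, per reference base, the reads with a deletion of at least min_len.

    alignments is an iterable of (ref_start, cigar) pairs with 0-based
    start positions; the region is the half-open interval
    [region_start, region_end).
    """
    counts = [0] * (region_end - region_start)
    for ref_start, cigar in alignments:
        pos = ref_start
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op == "D" and length >= min_len:
                lo = max(pos, region_start)
                hi = min(pos + length, region_end)
                for p in range(lo, hi):
                    counts[p - region_start] += 1
            if op in REF_CONSUMING:
                pos += length
    return counts


if __name__ == "__main__":
    reads = [
        (100, "50M60D40M"),   # deletion over reference 150..209
        (120, "30M60D10M"),   # same deletion seen by a second read
        (100, "200M"),        # read with no deletion
    ]
    signal = deletion_signal(reads, 140, 220)
    print(signal[:12], sum(signal))
```

A vector like this, computed over consecutive windows, is the sort of per-region input that a convolutional and recurrent model could then classify.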

Subjects

Deep Learning, Genome, High-Throughput Nucleotide Sequencing, Humans, Sequence Analysis, DNA, Software
