Results 1 - 20 of 83
1.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37130580

ABSTRACT

Combination therapy is widely used to treat complex diseases, particularly in patients who respond poorly to monotherapy. For example, compared with the use of a single drug, drug combinations can reduce drug resistance and improve the efficacy of cancer treatment. Thus, it is vital for researchers and society to help develop effective combination therapies through clinical trials. However, high-throughput synergistic drug combination screening remains challenging and expensive in the large combinational space, where an array of compounds are used. To solve this problem, various computational approaches have been proposed to effectively identify drug combinations by utilizing drug-related biomedical information. In this study, considering the implications of various types of neighbor information of drug entities, we propose a novel end-to-end Knowledge Graph Attention Network to predict drug synergy (KGANSynergy), which utilizes neighbor information of known drugs/cell lines effectively. KGANSynergy uses knowledge graph (KG) hierarchical propagation to find multi-source neighbor nodes for drugs and cell lines. The knowledge graph attention network is designed to distinguish the importance of neighbors in a KG through a multi-attention mechanism and then aggregate the entity's neighbor node information to enrich the entity. Finally, the learned drug and cell line embeddings can be utilized to predict the synergy of drug combinations. Experiments demonstrated that our method outperformed several other competing methods, indicating that our method is effective in identifying drug combinations.


Subjects
High-Throughput Screening Assays, Automated Pattern Recognition, Humans, Cell Line, Combined Modality Therapy, Learning
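
The neighbor-aggregation step described in the KGANSynergy abstract can be illustrated with a minimal sketch. It assumes single-head dot-product attention and toy 2-dimensional embeddings; the abstract mentions a multi-attention mechanism but gives no formulas, so the function names, the scoring rule, and the data below are all hypothetical and are not the authors' implementation.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(entity: List[float], neighbors: List[List[float]]) -> List[float]:
    """Weight each neighbor by softmax(dot(entity, neighbor)) and sum the results."""
    # Assumed scoring rule: plain dot product between entity and neighbor.
    scores = [sum(a * b for a, b in zip(entity, n)) for n in neighbors]
    weights = softmax(scores)
    dim = len(entity)
    return [sum(w * n[d] for w, n in zip(weights, neighbors)) for d in range(dim)]


# Hypothetical embeddings for one drug and two knowledge-graph neighbors.
drug = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0]]
enriched = attention_aggregate(drug, neighbors)
```

In this toy case the neighbor that is more similar to the drug receives the larger weight, so the first component of `enriched` dominates the second.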
2.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37141142

ABSTRACT

In genome assembly, scaffolding can obtain more complete and continuous scaffolds. Current scaffolding methods usually adopt one type of read to construct a scaffold graph and then orient and order contigs. However, scaffolding with the strengths of two or more types of reads seems to be a better solution to some tricky problems. Combining the advantages of different types of data is significant for scaffolding. Here, a hybrid scaffolding method (SLHSD) is presented that simultaneously leverages the precision of short reads and the length advantage of long reads. Building an optimal scaffold graph is an important foundation for getting scaffolds. SLHSD uses a new algorithm that combines long and short read alignment information to determine whether to add an edge and how to calculate the edge weight in a scaffold graph. In addition, SLHSD develops a strategy to ensure that edges with high confidence can be added to the graph with priority. Then, a linear programming model is used to detect and remove remaining false edges in the graph. We compared SLHSD with other scaffolding methods on five datasets. Experimental results show that SLHSD outperforms other methods. The open-source code of SLHSD is available at https://github.com/luojunwei/SLHSD.


Subjects
Algorithms, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods, Software, Linear Models
3.
Proc Natl Acad Sci U S A ; 119(28): e2122534119, 2022 Jul 12.
Article in English | MEDLINE | ID: mdl-35867737

ABSTRACT

Photoinduced phase transition (PIPT) is always treated as a coherent process, but ultrafast disordering in PIPT has been observed in recent experiments. Utilizing the real-time time-dependent density functional theory method, here we track the motion of individual vanadium (V) ions during PIPT in VO2 and uncover that their coherent or disordered dynamics can be manipulated by tuning the laser fluence. We find that the photoexcited holes generate a force on each V-V dimer to drive their collective coherent motion, in competition with thermally induced vibrations. If the laser fluence is so weak that the photoexcited hole density is too low to drive the phase transition alone, the PIPT is a disordered process due to the interference of thermal phonons. We also reveal that the photoexcited holes populated by the V-V dimerized bonding states will become saturated if the laser fluence is too strong, limiting the timescale of photoinduced phase transition.

4.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35580841

ABSTRACT

Structural variations (SVs) play important roles in human genetic diversity; deletions and insertions are two common types of SVs that have been proven to be associated with genetic diseases. Hence, accurately detecting and genotyping SVs is significant for disease research. Although long-read sequencing technologies have improved the field of SV detection and genotyping, there are still some challenges that prevent satisfactory results from being obtained. In this paper, we propose MAMnet, a fast and scalable SV detection and genotyping method based on long reads and a combination of a convolutional neural network and a long short-term memory network. MAMnet uses a deep neural network to implement sensitive SV detection with a novel prediction strategy. On real long-read sequencing datasets, we demonstrate that MAMnet outperforms Sniffles, SVIM, cuteSV and PBSV in terms of their F1 scores while achieving better scaling performance. The source code is available from https://github.com/micahvista/MAMnet.


Subjects
Deep Learning, High-Throughput Nucleotide Sequencing, Human Genome, Genotype, Genotyping Techniques, High-Throughput Nucleotide Sequencing/methods, Humans, DNA Sequence Analysis/methods, Software
5.
BMC Bioinformatics ; 24(1): 80, 2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36879189

ABSTRACT

BACKGROUND: Many studies have shown that structural variations (SVs) strongly impact human disease. As a common type of SV, insertions are usually associated with genetic diseases. Therefore, accurately detecting insertions is of great significance. Although many methods for detecting insertions have been proposed, these methods often generate some errors and miss some variants. Hence, accurately detecting insertions remains a challenging task. RESULTS: In this paper, we propose a method named INSnet to detect insertions using a deep learning network. First, INSnet divides the reference genome into continuous sub-regions and takes five features for each locus through alignments between long reads and the reference genome. Next, INSnet uses a depthwise separable convolutional network. The convolution operation extracts informative features through spatial information and channel information. INSnet uses two attention mechanisms, the convolutional block attention module (CBAM) and efficient channel attention (ECA) to extract key alignment features in each sub-region. In order to capture the relationship between adjacent subregions, INSnet uses a gated recurrent unit (GRU) network to further extract more important SV signatures. After predicting whether a sub-region contains an insertion through the previous steps, INSnet determines the precise site and length of the insertion. The source code is available from GitHub at https://github.com/eioyuou/INSnet . CONCLUSION: Experimental results show that INSnet can achieve better performance than other methods in terms of F1 score on real datasets.


Subjects
Deep Learning, Humans, Software
6.
BMC Bioinformatics ; 24(1): 289, 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37468832

ABSTRACT

BACKGROUND: Cancer subtype classification is helpful for personalized cancer treatment. Although some approaches have been developed to classify cancer subtypes based on high-dimensional gene expression data, it is difficult to obtain satisfactory classification results. Meanwhile, some cancers have been well studied and classified into subtypes that are adopted by most researchers. Hence, this prior knowledge is significant for further identifying new meaningful subtypes. RESULTS: In this paper, we present ForestSubtype, a combined parallel random forest and autoencoder approach for cancer subtype identification based on high-dimensional gene expression data. ForestSubtype first adopts the parallel random forests (RFs) and the prior knowledge of cancer subtypes to train a module and extract significant candidate features. Second, ForestSubtype uses a random forest as the base module and ten parallel random forests to compute each feature weight and rank them separately. Then, the intersection of the features with the larger weights output by the ten parallel random forests is taken as the subsequent candidate features. Third, ForestSubtype uses an autoencoder to condense the selected features into two-dimensional data. Fourth, ForestSubtype utilizes k-means++ to obtain new cancer subtype identification results. In this paper, the breast cancer gene expression data obtained from The Cancer Genome Atlas are used for training and validation, and an independent breast cancer dataset from the Molecular Taxonomy of Breast Cancer International Consortium is used for testing. Additionally, we use two other cancer datasets for validating the generalizability of ForestSubtype. ForestSubtype outperforms the other two methods in terms of the distribution of clusters and internal and external metric results. The open-source code is available at https://github.com/lffyd/ForestSubtype . CONCLUSIONS: Our work shows that the combination of high-dimensional gene expression data, parallel random forests and an autoencoder, guided by a priori knowledge, can identify new subtypes more effectively than existing methods of cancer subtype classification.


Subjects
Breast Neoplasms, Random Forest, Humans, Female, Genomics, Software
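
The feature-intersection step in the ForestSubtype abstract (keep only features that receive large weights in every parallel forest) can be sketched as follows. This is an illustration under assumptions: the cut-off rule (top `k` per run), the function names, and the gene weights are hypothetical, and three runs stand in for the ten forests the abstract mentions.

```python
from typing import Dict, List, Set


def top_k_features(weights: Dict[str, float], k: int) -> Set[str]:
    """Return the k feature names with the largest weights in one run."""
    ranked = sorted(weights, key=lambda name: weights[name], reverse=True)
    return set(ranked[:k])


def intersect_top_features(runs: List[Dict[str, float]], k: int) -> Set[str]:
    """Keep only features that rank in the top k of every run."""
    if not runs:
        return set()
    selected = top_k_features(runs[0], k)
    for weights in runs[1:]:
        selected &= top_k_features(weights, k)
    return selected


# Hypothetical feature weights from three forests.
runs = [
    {"geneA": 0.40, "geneB": 0.30, "geneC": 0.20, "geneD": 0.10},
    {"geneA": 0.35, "geneC": 0.33, "geneB": 0.22, "geneD": 0.10},
    {"geneB": 0.45, "geneA": 0.30, "geneD": 0.15, "geneC": 0.10},
]
candidates = intersect_top_features(runs, k=2)
```

Taking the intersection rather than the union is the conservative choice: a feature survives only if every forest agrees it is important, which trades recall for stability across runs.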
7.
BMC Bioinformatics ; 24(1): 367, 2023 Sep 30.
Article in English | MEDLINE | ID: mdl-37777712

ABSTRACT

BACKGROUND: Obtaining accurate drug-target binding affinity (DTA) information is significant for drug discovery and drug repositioning. Although some methods have been proposed for predicting DTA, the features of proteins and drugs still need to be further analyzed. Recently, deep learning has been successfully used in many fields. Hence, designing a more effective deep learning method for predicting DTA remains attractive. RESULTS: Dynamic graph DTA (DGDTA), which uses a dynamic graph attention network combined with a bidirectional long short-term memory (Bi-LSTM) network to predict DTA, is proposed in this paper. DGDTA takes as input a drug compound, given by its simplified molecular input line entry system (SMILES) string, and a protein amino acid sequence. First, each drug is considered a graph of interactions between atoms and edges, and dynamic attention scores are used to consider which atoms and edges in the drug are most important for predicting DTA. Then, Bi-LSTM is used to better extract the contextual information features of protein amino acid sequences. Finally, after combining the obtained drug and protein feature vectors, the DTA is predicted by a fully connected layer. The source code is available from GitHub at https://github.com/luojunwei/DGDTA . CONCLUSIONS: The experimental results show that DGDTA can predict DTA more accurately than some other methods.


Subjects
Drug Delivery Systems, Drug Discovery, Amino Acid Sequence, Drug Repositioning, Protein Domains
8.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33634311

ABSTRACT

In the field of genome assembly, scaffolding methods make it possible to obtain a more complete and contiguous reference genome, which is the cornerstone of genomic research. Scaffolding methods typically utilize the alignments between contigs and sequencing data (reads) to determine the orientation and order among contigs and to produce longer scaffolds, which are helpful for genomic downstream analysis. With the rapid development of high-throughput sequencing technologies, diverse types of reads have emerged over the past decade, especially in long-range sequencing, which have greatly enhanced the assembly quality of scaffolding methods. As the number of scaffolding methods increases, biology and bioinformatics researchers need to perform in-depth analyses of state-of-the-art scaffolding methods. In this article, we focus on the difficulties in scaffolding, the differences in characteristics among various kinds of reads, the methods by which current scaffolding methods address these difficulties, and future research opportunities. We hope this work will benefit the design of new scaffolding methods and the selection of appropriate scaffolding methods for specific biological studies.


Subjects
Computational Biology/methods, Contig Mapping/methods, Genome, Software, Animals, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis
9.
Opt Express ; 31(11): 17921-17929, 2023 May 22.
Article in English | MEDLINE | ID: mdl-37381513

ABSTRACT

Germanium-on-insulator (GOI) has emerged as a novel platform for Ge-based electronic and photonic applications. Discrete photonic devices, such as waveguides, photodetectors, modulators, and optical pumping lasers, have been successfully demonstrated on this platform. However, there is almost no report on the electrically injected Ge light source on the GOI platform. In this study, we present the first fabrication of vertical Ge p-i-n light-emitting diodes (LEDs) on a 150 mm GOI substrate. The high-quality Ge LED on a 150-mm diameter GOI substrate was fabricated via direct wafer bonding followed by ion implantations. As a tensile strain of 0.19% has been introduced during the GOI fabrication process resulting from the thermal mismatch, the LED devices exhibit a dominant direct bandgap transition peak near 0.785 eV (∼1580 nm) at room temperature. In sharp contrast to conventional III-V LEDs, we found that the electroluminescence (EL)/photoluminescence (PL) spectra show enhanced intensities as the temperature is raised from 300 to 450 K as a consequence of the higher occupation of the direct bandgap. The maximum enhancement in EL intensity is a factor of 140% near 1635 nm due to the improved optical confinement offered by the bottom insulator layer. This work potentially broadens the GOI's functional variety for applications in near-infrared sensing, electronics, and photonics.
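
The abstract pairs a direct bandgap peak near 0.785 eV with a wavelength of about 1580 nm. The conversion can be checked with the standard photon relation, wavelength = hc / E. The short sketch below uses hc ≈ 1239.84 eV·nm; the function name is illustrative only.

```python
# Product of Planck's constant and the speed of light, in eV*nm.
HC_EV_NM = 1239.841984


def photon_wavelength_nm(energy_ev: float) -> float:
    """Convert a photon energy in eV to a vacuum wavelength in nm."""
    if energy_ev <= 0:
        raise ValueError("photon energy must be positive")
    return HC_EV_NM / energy_ev


# Peak energy quoted in the abstract.
peak_nm = photon_wavelength_nm(0.785)
```

The result lies within 1 nm of the ∼1580 nm quoted in the abstract.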

10.
Phys Rev Lett ; 130(14): 146901, 2023 Apr 07.
Article in English | MEDLINE | ID: mdl-37084436

ABSTRACT

In stark contrast to the conventional charge density wave (CDW) materials, the one-dimensional CDW on the In/Si(111) surface exhibits immediate damping of the CDW oscillation during the photoinduced phase transition. Here, we successfully reproduce the experimental observation of the photoinduced CDW transition on the In/Si(111) surface by performing real-time time-dependent density functional theory (rt-TDDFT) simulations. We show that photoexcitation promotes valence electrons from the Si substrate to the empty surface bands composed primarily of the covalent p-p bonding states of the long In-In bonds. Such photoexcitation generates interatomic forces to shorten the long In-In bonds and thus drives the structural transition. After the structural transition, these surface bands undergo a switch among different In-In bonds, causing a rotation of the interatomic forces by about π/6 and thus quickly damping the oscillations in feature CDW modes. These findings provide a deeper understanding of photoinduced phase transitions.

11.
BMC Bioinformatics ; 23(1): 430, 2022 Oct 17.
Article in English | MEDLINE | ID: mdl-36253710

ABSTRACT

MOTIVATION: Studies have shown that classifying cancer subtypes can provide valuable information for a range of cancer research, from aetiology and tumour biology to prognosis and personalized treatment. Current methods usually adopt gene expression data to perform cancer subtype classification. However, cancer samples are scarce, and the high-dimensional features of their gene expression data are too sparse to allow most methods to achieve desirable classification results. RESULTS: In this paper, we propose a deep learning approach by combining a convolutional neural network (CNN) and bidirectional gated recurrent unit (BiGRU): our approach, DCGN, aims to achieve nonlinear dimensionality reduction and learn features to eliminate irrelevant factors in gene expression data. Specifically, DCGN first uses the synthetic minority oversampling technique algorithm to equalize data. The CNN can handle high-dimensional data without stress and extract important local features, and the BiGRU can analyse deep features and retain their important information; the DCGN captures key features by combining both neural networks to overcome the challenges of small sample sizes and sparse, high-dimensional features. In the experiments, we compared the DCGN to seven other cancer subtype classification methods using breast and bladder cancer gene expression datasets. The experimental results show that the DCGN performs better than the other seven methods and can provide more satisfactory classification results.


Subjects
Deep Learning, Neoplasms, Algorithms, Gene Expression, Neoplasms/genetics, Computer Neural Networks
12.
Proc Natl Acad Sci U S A ; 116(39): 19258-19263, 2019 Sep 24.
Article in English | MEDLINE | ID: mdl-31501328

ABSTRACT

Ultrafast control of magnetic order by light provides a promising realization for spintronic devices beyond Moore's Law and has stimulated intense research interest in recent years. Yet, despite 2 decades of debates, the key question of how the spin angular momentum flows on the femtosecond timescale remains open. The lack of direct first-principle methods and pictures for such process exacerbates the issue. Here, we unravel the laser-induced demagnetization mechanism of ferromagnetic semiconductor GaMnAs, using an efficient time-dependent density functional theory approach that enables the direct real-time snapshot of the demagnetization process. Our results show a clear spin-transfer trajectory from the localized Mn-d electrons to itinerant carriers within 20 fs, illustrating the dominant role of [Formula: see text] interaction. We find that the total spin of localized electrons and itinerant carriers is not conserved in the presence of spin-orbit coupling (SOC). Immediately after laser excitation, a growing percentage of spin-angular momentum is quickly transferred to the electron orbital via SOC in about 1 ps, then slowly to the lattice via electron-phonon coupling in a few picoseconds, responsible for the 2-stage process observed experimentally. The spin-relaxation time via SOC is about 300 fs for itinerant carriers and about 700 fs for Mn-d electrons. These results provide a quantum-mechanical microscopic picture for the long-standing questions regarding the channels and timescales of spin transfer, as well as the roles of different interactions underlying the GaMnAs demagnetization process.

13.
BMC Bioinformatics ; 22(1): 577, 2021 Dec 02.
Article in English | MEDLINE | ID: mdl-34856923

ABSTRACT

BACKGROUND: Structural variations (SVs) occupy a prominent position in human genetic diversity, and deletions form an important type of SV that has been suggested to be associated with genetic diseases. Although various deletion calling methods based on long reads have been proposed, a new approach is still needed to mine features in long-read alignment information. Recently, deep learning has attracted much attention in genome analysis, and it is a promising technique for calling SVs. RESULTS: In this paper, we propose BreakNet, a deep learning method that detects deletions by using long reads. BreakNet first extracts feature matrices from long-read alignments. Second, it uses a time-distributed convolutional neural network (CNN) to integrate and map the feature matrices to feature vectors. Third, BreakNet employs a bidirectional long short-term memory (BLSTM) model to analyse the produced set of continuous feature vectors in both the forward and backward directions. Finally, a classification module determines whether a region refers to a deletion. On real long-read sequencing datasets, we demonstrate that BreakNet outperforms Sniffles, SVIM and cuteSV in terms of their F1 scores. The source code for the proposed method is available from GitHub at https://github.com/luojunwei/BreakNet . CONCLUSIONS: Our work shows that deep learning can be combined with long reads to call deletions more effectively than existing methods.


Subjects
Deep Learning, Genome, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis, Software
14.
Angew Chem Int Ed Engl ; 60(17): 9421-9426, 2021 Apr 19.
Article in English | MEDLINE | ID: mdl-33554464

ABSTRACT

Spin polarisation is found in the centrosymmetric nonferromagnetic crystals, chiral mesostructured NiO films (CMNFs), fabricated through the symmetry-breaking effect of a chiral molecule. Two levels of chirality were identified: primary nanoflakes with atomically twisted crystal lattices and secondary helical stacking of the nanoflakes. Spin polarisation of the CMNFs was confirmed by chirality-dependent magnetic-tip conducting atomic force microscopy (mc-AFM) and magnetic field-independent magnetic circular dichroism (MCD). Electron transfer in the symmetry-breaking electric field was speculated to create chirality-dependent effective magnetic fields. The asymmetric spin-orbit coupling (SOC) generated by effective magnetic fields selectively modifies the opposite spin motion in the antipodal CMNFs. Our findings provide fundamental insights for directional spin control in unprecedented functional inorganic materials.

15.
BMC Bioinformatics ; 21(1): 50, 2020 Feb 10.
Article in English | MEDLINE | ID: mdl-32039691

ABSTRACT

Following publication of the original article [1], the author reported that there is an error in the original article.

16.
Hum Hered ; 84(1): 34-46, 2019.
Article in English | MEDLINE | ID: mdl-31466062

ABSTRACT

In the biomedical field, large amounts of biological and clinical data have been accumulated rapidly, which can be analyzed to emphasize the assessment of at-risk patients and improve diagnosis. However, a major challenge encountered associated with biomedical data analysis is the so-called "curse of dimensionality." For this issue, a novel feature selection method based on an improved binary clonal flower pollination algorithm is proposed to eliminate unnecessary features and ensure a highly accurate classification of disease. The absolute balance group strategy and adaptive Gaussian mutation are adopted, which can increase the diversity of the population and improve the search performance. The KNN classifier is used to evaluate the classification accuracy. Extensive experimental results in six, publicly available, high-dimensional, biomedical datasets show that the proposed method can obtain high classification accuracy and outperforms other state-of-the-art methods.


Subjects
Algorithms, Flowers/physiology, Humans, Neoplasms/classification, Neoplasms/genetics, Nervous System, Pollination
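
The abstract above scores candidate feature subsets with a KNN classifier. A minimal sketch of that evaluation step follows. It covers only the scoring of a binary feature mask, not the flower pollination search itself, and it assumes leave-one-out validation with squared Euclidean distance; the function name and the toy data are hypothetical.

```python
from typing import List, Sequence


def knn_loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                     mask: Sequence[int], k: int = 1) -> float:
    """Leave-one-out accuracy of a k-NN classifier on the masked features."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        # Squared distance from sample i to every other sample,
        # computed on the selected features only.
        neighbours = sorted(
            (sum((xi[f] - xj[f]) ** 2 for f in selected), y[j])
            for j, xj in enumerate(X) if j != i
        )
        votes: List[int] = [label for _, label in neighbours[:k]]
        predicted = max(set(votes), key=votes.count)
        if predicted == y[i]:
            correct += 1
    return correct / len(X)


# Toy data: feature 0 separates the classes, feature 1 is misleading noise.
X = [[0.0, 5.0], [0.1, 0.0], [1.0, 5.1], [1.1, 0.1]]
y = [0, 0, 1, 1]
acc_informative = knn_loo_accuracy(X, y, mask=[1, 0])
acc_noise = knn_loo_accuracy(X, y, mask=[0, 1])
```

A search procedure such as the one in the abstract would propose many masks and keep those with the highest score; here the mask that drops the noisy feature scores higher than the mask that keeps only the noise.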
17.
BMC Bioinformatics ; 20(1): 539, 2019 Oct 30.
Article in English | MEDLINE | ID: mdl-31666010

ABSTRACT

BACKGROUND: Scaffolding is an important step in genome assembly that orders and orients the contigs produced by assemblers. However, repetitive regions in contigs usually prevent scaffolding from producing accurate results. How to solve the problem of repetitive regions has received a great deal of attention. In the past few years, long reads sequenced by third-generation sequencing technologies (Pacific Biosciences and Oxford Nanopore) have been demonstrated to be useful for sequencing repetitive regions in genomes. Although some stand-alone scaffolding algorithms based on long reads have been presented, scaffolding still requires a new strategy to take full advantage of the characteristics of long reads. RESULTS: Here, we present a new scaffolding algorithm based on long reads and contig classification (SLR). Through the alignment information of long reads and contigs, SLR classifies the contigs into unique contigs and ambiguous contigs for addressing the problem of repetitive regions. Next, SLR uses only unique contigs to produce draft scaffolds. Then, SLR inserts the ambiguous contigs into the draft scaffolds and produces the final scaffolds. We compare SLR to three popular scaffolding tools by using long read datasets sequenced with Pacific Biosciences and Oxford Nanopore technologies. The experimental results show that SLR can produce better results in terms of accuracy and completeness. The open-source code of SLR is available at https://github.com/luojunwei/SLR. CONCLUSION: In this paper, we describe SLR, which is designed to scaffold contigs using long reads. We conclude that SLR can improve the completeness of genome assembly.


Subjects
Algorithms, Genome, High-Throughput Nucleotide Sequencing/methods, Repetitive Nucleic Acid Sequences, Software
18.
Microvasc Res ; 124: 37-42, 2019 07.
Article in English | MEDLINE | ID: mdl-30867134

ABSTRACT

OBJECTIVE: The association between the shedding of the endothelial glycocalyx (EG) and the pathogenesis of microcirculatory perfusion disturbances has been discussed in experimental studies. This discussion, however, has limited relevance in a clinical setting. We investigated EG shedding in patients undergoing cardiopulmonary bypass (CPB) and its association with alterations in microvascular perfusion. METHODS: The plasma levels of syndecan-1, heparan sulfate, and hyaluronan were used as markers of glycocalyx degradation. Microcirculatory parameters included perfused vessel density (PVD) and De Backer Scores. Sidestream dark field imaging (SDF) was applied to visualize sublingual microcirculation during the preoperative resting state (T0), after sternum splitting, after aortic clamping, 5 min before aortal declamping, 1 h after CPB (T4), 4 h after CPB, 24 h after CPB (T6), and 48 h after CPB. RESULTS: Thirty patients undergoing cardiac surgery were recruited. The plasma levels of glycocalyx degradation markers increased after CPB. This increase indicated severe glycocalyx shedding at T4 relative to that at T0. By T6, the plasma levels of glycocalyx degradation markers had decreased to baseline levels in a stepwise manner. PVD and the De Backer Scores decreased at T4 and recovered at T6. Glycocalyx marker concentrations were correlated with microvascular alterations during cardiac surgery. CONCLUSIONS: Glycocalyx components are closely related to microcirculation perfusion disorders. Damage to the glycocalyx during surgery with CPB may play a key role in microcirculation perfusion dysfunction.


Subjects
Cardiac Surgical Procedures/adverse effects, Cardiopulmonary Bypass/adverse effects, Endothelial Cells/metabolism, Glycocalyx/metabolism, Microcirculation, Mouth Mucosa/blood supply, Aged, Biomarkers/blood, Blood Flow Velocity, Endothelial Cells/pathology, Female, Glycocalyx/pathology, Heparitin Sulfate/blood, Humans, Hyaluronic Acid/blood, Male, Middle Aged, Perioperative Period, Prospective Studies, Regional Blood Flow, Syndecan-1/blood, Time Factors
19.
Nano Lett ; 18(5): 2937-2942, 2018 05 09.
Article in English | MEDLINE | ID: mdl-29601201

ABSTRACT

The atomic structures of self-assembled silicon nanoribbons and magic clusters on Ag(110) substrate have been studied by high-resolution noncontact atomic force microscopy (nc-AFM) and tip-enhanced Raman spectroscopy (TERS). Pentagon-ring structures in Si nanoribbons and clusters have been directly visualized. Moreover, the vibrational fingerprints of individual Si nanoribbon and cluster retrieved by subnanometer resolution TERS confirm the pentagonal nature of both Si nanoribbons and clusters. This work demonstrates that Si pentagon can be an important element in building silicon nanostructures, which may find important applications for future nanoelectronic devices based on silicon.

20.
Bioinformatics ; 33(2): 169-176, 2017 01 15.
Article in English | MEDLINE | ID: mdl-27634951

ABSTRACT

MOTIVATION: While aiming to determine orientations and orders of fragmented contigs, scaffolding is an essential step of assembly pipelines and can make assembly results more complete. Most existing scaffolding tools adopt scaffold graph approaches. However, due to repetitive regions in genomes, sequencing errors and uneven sequencing depth, constructing an accurate scaffold graph is still a challenging task. RESULTS: In this paper, we present a novel algorithm (called BOSS), which employs paired reads for scaffolding. To construct a scaffold graph, BOSS utilizes the distribution of insert size to decide whether an edge between two vertices (contigs) should be added and how an edge should be weighted. Moreover, BOSS adopts an iterative strategy to detect spurious edges whose removal can guarantee no contradictions in the scaffold graph. Based on the scaffold graph constructed, BOSS employs a heuristic algorithm to sort vertices (contigs) and then generates scaffolds. The experimental results demonstrate that BOSS produces more satisfactory scaffolds, compared with other popular scaffolding tools on real sequencing data of four genomes. AVAILABILITY AND IMPLEMENTATION: BOSS is publicly available for download at https://github.com/bioinfomaticsCSU/BOSS CONTACT: jxwang@mail.csu.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome, Repetitive Nucleic Acid Sequences, DNA Sequence Analysis/methods, Software, Algorithms, Humans, Plasmodium falciparum/genetics, Rhodobacter sphaeroides/genetics, Staphylococcus aureus/genetics
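
The BOSS abstract says the insert-size distribution decides whether a scaffold-graph edge is added and how it is weighted, without giving the rule. The sketch below shows one plausible way such a decision could work and should not be read as the BOSS algorithm: each read pair that links two contigs implies a gap of (mean insert size − bases covered on the first contig − bases covered on the second), and an edge is proposed only when enough pairs imply mutually consistent gaps. The thresholds, names, and data are assumptions.

```python
from statistics import mean
from typing import List, Optional, Tuple

# A link is (left_span, right_span): bases a read pair covers on the first
# contig (read start to contig end) and on the second contig (contig start
# to mate end). This representation is an assumption for illustration.
Link = Tuple[int, int]


def gap_estimates(links: List[Link], insert_mean: float) -> List[float]:
    """Gap implied by each read pair, given the mean insert size."""
    return [insert_mean - left - right for left, right in links]


def propose_edge(links: List[Link], insert_mean: float, insert_sd: float,
                 min_links: int = 3,
                 max_sd_multiple: float = 3.0) -> Optional[Tuple[float, float]]:
    """Return (estimated_gap, weight) if the links agree, else None."""
    if len(links) < min_links:
        return None
    gaps = gap_estimates(links, insert_mean)
    center = mean(gaps)
    # Keep pairs whose implied gap is within a few standard deviations
    # of the average implied gap (hypothetical consistency rule).
    consistent = [g for g in gaps if abs(g - center) <= max_sd_multiple * insert_sd]
    if len(consistent) < min_links:
        return None
    weight = len(consistent) / len(gaps)  # fraction of supporting pairs
    return mean(consistent), weight


# Hypothetical library: mean insert 500 bp, standard deviation 20 bp.
edge = propose_edge([(200, 150), (220, 140), (180, 160)], 500, 20)
rejected = propose_edge([(200, 150), (220, 140), (10, 10)], 500, 20)
```

In the second call one pair implies a gap far from the others, which pulls every estimate outside the tolerance, so no edge is proposed.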