Results 1 - 20 of 28
1.
Nucleic Acids Res ; 52(3): e15, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38084888

ABSTRACT

Whole genome sequencing has become an essential method for studying the genetic mechanisms of antimicrobial resistance and for surveillance of drug-resistant bacterial pathogens. The majority of bacterial genomes sequenced to date have been sequenced with Illumina sequencing technology, owing to its high throughput, excellent sequence accuracy, and low cost. However, because of the short-read nature of the technology, these assemblies are fragmented into large numbers of contigs, which hinders recovery of the complete genome. We develop Pasa, a graph-based algorithm that utilizes the pangenome graph and the assembly graph information to improve scaffolding quality. By leveraging the population information of the bacterial species, Pasa is able to utilize the linkage information of the gene families of the species to resolve the contig graph of the assembly. We show that our method outperforms the current state of the art in terms of accuracy and, at the same time, is computationally efficient enough to be applied to a large number of existing draft assemblies.


Subjects
Algorithms , Bacteria , Bacterial Genome , Bacteria/classification , Bacteria/genetics , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods
2.
BMC Bioinformatics ; 25(1): 193, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755527

ABSTRACT

We have developed AMRViz, a toolkit for analyzing, visualizing, and managing bacterial genomics samples. The toolkit is bundled with the current best practice analysis pipeline allowing researchers to perform comprehensive analysis of a collection of samples directly from raw sequencing data with a single command line. The analysis results in a report showing the genome structure, genome annotations, antibiotic resistance and virulence profile for each sample. The pan-genome of all samples of the collection is analyzed to identify core- and accessory-genes. Phylogenies of the whole genome as well as all gene clusters are also generated. The toolkit provides a web-based visualization dashboard allowing researchers to interactively examine various aspects of the analysis results. Availability: AMRViz is implemented in Python and NodeJS, and is publicly available under open source MIT license at https://github.com/amromics/amrviz .


Subjects
Bacterial Genome , Genomics , Software , Genomics/methods , Bacterial Drug Resistance/genetics , Phylogeny , Bacteria/genetics , Bacteria/drug effects , Anti-Bacterial Agents/pharmacology
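The core/accessory split mentioned in the AMRViz abstract (entry 2) can be illustrated with a minimal sketch. This is not AMRViz code: the function name, the `core_fraction` threshold, and the toy gene sets are all assumptions made for illustration. It treats a gene cluster as "core" when it is present in at least a chosen fraction of samples and "accessory" otherwise.

```python
from collections import Counter


def split_core_accessory(samples, core_fraction=1.0):
    """Split gene clusters into core and accessory sets.

    samples: dict mapping a sample name to the set of gene-cluster IDs it carries.
    core_fraction: hypothetical threshold; 1.0 means "present in every sample".
    """
    n = len(samples)
    # Count in how many samples each gene cluster occurs.
    counts = Counter(gene for genes in samples.values() for gene in set(genes))
    core = {gene for gene, c in counts.items() if c >= core_fraction * n}
    accessory = set(counts) - core
    return core, accessory


# Toy presence/absence data (invented for this example).
samples = {
    "isolate_A": {"gyrA", "rpoB", "blaKPC"},
    "isolate_B": {"gyrA", "rpoB", "mcr1"},
    "isolate_C": {"gyrA", "rpoB"},
}
core, accessory = split_core_accessory(samples)
print(sorted(core), sorted(accessory))
```

Pan-genome tools often relax the threshold (for example, to tolerate missing genes in draft assemblies); here that would be a matter of lowering `core_fraction`.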
3.
BMC Cancer ; 22(1): 85, 2022 Jan 20.
Article in English | MEDLINE | ID: mdl-35057759

ABSTRACT

BACKGROUND: Circulating cell-free DNA (cfDNA) in the plasma of cancer patients contains cell-free tumour DNA (ctDNA) derived from tumour cells, and it has been widely recognized as a non-invasive source of tumour DNA for diagnosis and prognosis of cancer. Molecular profiling of ctDNA is often performed using targeted sequencing or low-coverage whole genome sequencing (WGS) to identify tumour-specific somatic mutations or somatic copy number aberrations (sCNAs). However, these approaches cannot efficiently detect all tumour-derived genomic changes in ctDNA. METHODS: We performed WGS analysis of cfDNA from 4 breast cancer patients and 2 patients with benign tumours. We sequenced matched germline DNA for all 6 patients and tumour samples from the breast cancer patients. All samples were sequenced on the Illumina HiSeqXTen sequencing platform and achieved approximately 30x, 60x and 100x coverage on germline, tumour and plasma DNA samples, respectively. RESULTS: The mutational burden of the plasma samples (1.44 somatic mutations/Mb of genome) was higher than that of the matched tumour samples. However, 90% of high-confidence somatic cfDNA variants were not detected in matched tumour samples and were found to comprise two background plasma mutational signatures. In contrast, cfDNA from the di-nucleosome fraction (300 bp-350 bp) had a much higher proportion (30%) of variants shared with tumour. Despite high-coverage sequencing, we were unable to detect sCNAs in plasma samples. CONCLUSIONS: Deep sequencing analysis of plasma samples revealed a higher fraction of unique somatic mutations in plasma samples, which were not detected in matched tumour samples. Sequencing of di-nucleosome bound cfDNA fragments may increase recovery of tumour mutations from plasma.


Subjects
Breast Neoplasms/genetics , Circulating Tumor DNA/blood , DNA Mutational Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Adult , Tumor Biomarkers/genetics , Breast Neoplasms/blood , Female , Humans , Mutation , Prognosis
4.
PLoS Comput Biol ; 17(1): e1008586, 2021 01.
Article in English | MEDLINE | ID: mdl-33471816

ABSTRACT

A streaming assembly pipeline utilising real-time Oxford Nanopore Technology (ONT) sequencing data is important for saving sequencing resources and reducing time-to-result. A previous approach implemented in npScarf provided an efficient streaming algorithm for hybrid assembly but was relatively prone to mis-assemblies compared to other graph-based methods. Here we present npGraph, a streaming hybrid assembly tool that uses the assembly graph instead of the separated pre-assembly contigs. It is able to produce a more complete genome assembly by resolving the path-finding problem on the assembly graph using long reads as the traversing guide. Application to synthetic and real data from bacterial isolate genomes shows improved accuracy while still maintaining a low computational cost. npGraph also provides a graphical user interface (GUI) that gives a real-time visualisation of the progress of assembly. The tool and source code are available at https://github.com/hsnguyen/assembly.


Subjects
Nanopores , Sequence Alignment/methods , DNA Sequence Analysis/methods , Algorithms , Computational Biology , Bacterial DNA/analysis , Bacterial DNA/genetics , Bacterial Genome/genetics , Nanotechnology , Software , User-Computer Interface
5.
J Antimicrob Chemother ; 74(3): 582-593, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30445429

ABSTRACT

BACKGROUND: Polymyxin B and E (colistin) have been pivotal in the treatment of XDR Gram-negative bacterial infections; however, resistance has emerged. A structurally related lipopeptide, octapeptin C4, has shown significant potency against XDR bacteria, including polymyxin-resistant strains, but its mode of action remains undefined. OBJECTIVES: We sought to compare and contrast the acquisition of resistance in an XDR Klebsiella pneumoniae (ST258) clinical isolate in vitro with all three lipopeptides to potentially unveil variations in their mode of action. METHODS: The isolate was exposed to increasing concentrations of polymyxins and octapeptin C4 over 20 days. Day 20 strains underwent WGS, complementation assays, antimicrobial susceptibility testing and lipid A analysis. RESULTS: Twenty days of exposure to the polymyxins resulted in a 1000-fold increase in the MIC, whereas for octapeptin C4 a 4-fold increase was observed. There was no cross-resistance observed between the polymyxin- and octapeptin-resistant strains. Sequencing of polymyxin-resistant isolates revealed mutations in previously known resistance-associated genes, including crrB, mgrB, pmrB, phoPQ and yciM, along with novel mutations in qseC. Octapeptin C4-resistant isolates had mutations in mlaDF and pqiB, genes related to phospholipid transport. These genetic variations were reflected in distinct phenotypic changes to lipid A. Polymyxin-resistant isolates increased 4-amino-4-deoxyarabinose fortification of lipid A phosphate groups, whereas the lipid A of octapeptin C4-resistant strains harboured a higher abundance of hydroxymyristate and palmitoylate. CONCLUSIONS: Octapeptin C4 has a distinct mode of action compared with the polymyxins, highlighting its potential as a future therapeutic agent to combat the increasing threat of XDR bacteria.


Subjects
Anti-Bacterial Agents/pharmacology , Colistin/pharmacology , Multiple Bacterial Drug Resistance , Klebsiella pneumoniae/drug effects , Lipopeptides/pharmacology , Cyclic Peptides/pharmacology , Polymyxin B/pharmacology , Humans , Klebsiella Infections/microbiology , Klebsiella pneumoniae/isolation & purification , Microbial Sensitivity Tests , Mutation , Whole Genome Sequencing
6.
Bioinformatics ; 34(5): 873-874, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29092025

ABSTRACT

Motivation: Targeted sequencing using capture probes has become increasingly popular in clinical applications due to its scalability and cost-effectiveness. The approach also allows for higher sequencing coverage of the targeted regions, resulting in greater statistical power for analysis. However, because of the dynamics of the hybridization process, it is difficult to evaluate the efficiency of the probe design prior to the experiments, which are time-consuming and costly. Results: We developed CapSim, a software package for simulation of targeted sequencing. Given a genome sequence and a set of probes, CapSim simulates the fragmentation, the dynamics of probe hybridization and the sequencing of the captured fragments on Illumina and PacBio sequencing platforms. The simulated data can be used for evaluating the performance of the analysis pipeline, as well as the efficiency of the probe design. Parameters of the various stages in the sequencing process can also be evaluated in order to optimize the experiments. Availability and implementation: CapSim is publicly available under BSD license at https://github.com/Devika1/capsim. Contact: l.coin@imb.uq.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Genomics/methods , Software
7.
BMC Bioinformatics ; 19(1): 267, 2018 07 16.
Article in English | MEDLINE | ID: mdl-30012093

ABSTRACT

BACKGROUND: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high-throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short-read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. RESULTS: We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with on average 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% high posterior density (HPD) intervals of 68 and 83% for capture sequence data and 200X WGS data respectively, improving to 87 and 94% with use of a PCR reference. We show that the genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25, 14, 12 and 8% of its midpoint copy number value for 30X, 200X WGS, 395X and 800X capture sequence data respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. CONCLUSIONS: The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS studies of complex diseases at the population level. Further improvements in accuracy can be obtained by improving the accuracy of the reference dataset.


Subjects
Algorithms , Gene Dosage , High-Throughput Nucleotide Sequencing/methods , Tandem Repeat Sequences/genetics , Alleles , Base Sequence , Bayes Theorem , Computer Simulation , Human Genome , Genotype , Humans , Minisatellite Repeats/genetics , Whole Genome Sequencing
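The abstract of entry 7 reports genotype resolution as the width of a 95% high posterior density (HPD) interval relative to its midpoint. As a rough illustration of that summary statistic only (not of the GtTR algorithm itself), the sketch below computes an HPD interval from posterior draws as the shortest window containing the requested mass. The function names and sample values are assumptions made for this example.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior draws."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one posterior draw")
    # Number of draws the interval must cover; the small tolerance guards
    # against floating-point round-up in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted draws and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def relative_half_width(lo, hi):
    """Half-width of the interval as a fraction of its midpoint."""
    mid = (lo + hi) / 2
    return (hi - lo) / 2 / mid


# Invented copy-number draws: 95 draws at 10 copies plus a few outliers.
draws = [10] * 95 + [1, 2, 3, 4, 50]
interval = hpd_interval(draws)
print(interval, relative_half_width(8, 12))
```

Under this reading, an interval of 8 to 12 copies has a relative half-width of 0.2, i.e. it lies within 20% of its midpoint of 10.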
8.
BMC Bioinformatics ; 19(1): 261, 2018 07 13.
Article in English | MEDLINE | ID: mdl-30001702

ABSTRACT

BACKGROUND: Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR) mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. RESULTS: We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using long read sub-alignment of long read sequencing data. We benchmark npInv against other tools on both simulated and real data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR-mediated inversions in NA12878 with flanking IR less than 7kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of these novel inversions. We show that there is a near linear relationship between the length of flanking IR and the minimum inversion size, without inverted repeats. CONCLUSION: The application of npInv shows high accuracy on both simulated and real data. The results give deeper insight into inversions.


Subjects
Chromosome Inversion/genetics , Genotype , Humans
9.
Bioinformatics ; 33(24): 3988-3990, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-28961965

ABSTRACT

MOTIVATION: The recent introduction of a barcoding protocol for Oxford Nanopore sequencing has increased the versatility of the technology. Several bioinformatics tools have been developed to demultiplex barcoded reads, but none of them supports streaming analysis. This limits the use of multiplexed sequencing in real-time applications, which is one of the main advantages of the technology. RESULTS: We introduced npBarcode, an open source and cross-platform tool for barcode demultiplexing in streaming fashion that can be used to pipe data to further real-time analyses. The tool also provides a friendly graphical user interface by integrating the module into npReader, making it possible to monitor progress while the sequencing is still under way. We show that our algorithm achieves accuracies at least as good as competing tools. AVAILABILITY AND IMPLEMENTATION: npBarcode is bundled in Japsa, a Java toolkit for genome analysis, and is freely available at https://github.com/mdcao/japsa. CONTACT: s.nguyen@uq.edu.au or l.coin@imb.uq.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Electronic Data Processing , Nanopores , DNA Sequence Analysis/methods , Software , Algorithms , Reproducibility of Results
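To make the idea of streaming demultiplexing in entry 9 concrete, here is a minimal sketch. It is not npBarcode's algorithm, which the abstract does not describe. As a stand-in, this example compares the start of each read with every barcode by Hamming distance and yields an assignment as soon as the read arrives, so downstream steps never wait for the full run. The barcode sequences, read names and `max_mismatches` threshold are invented for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex_stream(
    reads: Iterable[Tuple[str, str]],
    barcodes: Dict[str, str],
    max_mismatches: int = 2,
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (read name, barcode label or None) one read at a time."""
    for name, seq in reads:
        best_label, best_dist, tie = None, max_mismatches + 1, False
        for label, barcode in barcodes.items():
            if len(seq) < len(barcode):
                continue  # read too short to carry this barcode
            dist = hamming(seq[: len(barcode)], barcode)
            if dist < best_dist:
                best_label, best_dist, tie = label, dist, False
            elif dist == best_dist and best_label is not None:
                tie = True  # two barcodes fit equally well: leave unassigned
        yield name, (None if tie else best_label)


# Hypothetical barcodes and reads; in a real run `reads` would be a live stream.
barcodes = {"BC01": "ACGTACGT", "BC02": "TTGGCCAA"}
reads = [
    ("r1", "ACGTACGTGGGG"),  # exact BC01 prefix
    ("r2", "TTGGCCATCCCC"),  # BC02 with one mismatch
    ("r3", "GGGGGGGGGGGG"),  # matches neither barcode
]
assignments = list(demultiplex_stream(reads, barcodes))
print(assignments)
```

Because `demultiplex_stream` is a generator, it can wrap any iterator of reads, including one fed by a sequencer in progress. A production tool would likely use alignment rather than a fixed-position Hamming comparison, since nanopore reads contain insertions and deletions.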
10.
Mol Biol Evol ; 33(5): 1349-57, 2016 05.
Article in English | MEDLINE | ID: mdl-26912811

ABSTRACT

Methods for measuring genetic distances in phylogenetics are known to be sensitive to the evolutionary model assumed. However, there is a lack of established methodology to accommodate the trade-off between incorporating sufficient biological reality and avoiding model overfitting. In addition, as traditional methods measure distances based on the observed number of substitutions, they tend to underestimate distances between diverged sequences due to backward and parallel substitutions. Various techniques have been proposed to correct this, but they lack robustness against sequences that are distantly related and of unequal base frequencies. In this article, we present a novel genetic distance estimate based on information theory that overcomes the above two hurdles. Instead of examining the observed number of substitutions, this method estimates genetic distances using Shannon's mutual information. This naturally provides an effective framework for balancing model complexity and goodness of fit. Our distance estimate is shown to be approximately linear to elapsed time and hence is less sensitive to the divergence of sequence data and to compositionally biased sequences. Using extensive simulation data, we show that our method 1) consistently reconstructs more accurate phylogeny topologies than existing methods, 2) is robust in extreme conditions such as diverged phylogenies, unequal base frequencies data, and heterogeneous mutation patterns, and 3) scales well with large phylogenies.


Subjects
Biological Evolution , Genetic Models , Sequence Analysis/methods , Algorithms , Base Composition , Computer Simulation , Molecular Evolution , Genetic Variation , Information Theory , Phylogeny
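Entry 10 bases its distance estimate on Shannon's mutual information, but the abstract does not give the estimator. The sketch below therefore shows only the textbook quantity: a plug-in estimate of mutual information between the columns of two aligned sequences, I(A;B) = Σ p(a,b) log2[p(a,b) / (p(a) p(b))]. The function name and example sequences are assumptions, and the mapping from this value to the paper's distance is deliberately left out.

```python
import math
from collections import Counter


def mutual_information(seq_a: str, seq_b: str) -> float:
    """Empirical mutual information (in bits) between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    n = len(seq_a)
    joint = Counter(zip(seq_a, seq_b))  # counts of aligned symbol pairs
    count_a = Counter(seq_a)            # marginal counts for each sequence
    count_b = Counter(seq_b)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a[a] * count_b[b])
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Identical sequences share all their information; unrelated columns share none.
mi_identical = mutual_information("ACGTACGT", "ACGTACGT")
mi_independent = mutual_information("AACC", "ACAC")
print(mi_identical, mi_independent)
```

For identical sequences with uniform base composition the value equals the per-site entropy of 2 bits, and it falls towards 0 as the aligned columns become statistically independent, which is the intuition behind using shared information as a measure of relatedness.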
11.
Brief Bioinform ; 16(2): 193-204, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24504770

ABSTRACT

Short tandem repeats are highly polymorphic and associated with a wide range of phenotypic variation, some of which cause neurodegenerative disease in humans. With advances in high-throughput sequencing technologies, there are novel opportunities to study genetic variation. While available sequencing technologies and bioinformatics tools provide options for mining high-throughput sequencing data, their suitability for analysis of repeat variation is an open question, with tools for quantifying variability in repetitive sequence still in their infancy. We present here a comprehensive survey and empirical evaluation of current sequencing technologies and bioinformatics tools in all stages of an analysis pipeline. While there is not one optimal pipeline to suit all circumstances, we find that the choice of alignment and repeat genotyping tools greatly impacts the accuracy and efficiency by which short tandem repeat variation can be detected. We further note that to detect variation relevant to many repeat diseases, it is essential to choose technologies that offer either long read-lengths or paired-end sequencing, coupled with specific genotyping tools.


Subjects
High-Throughput Nucleotide Sequencing/statistics & numerical data , Microsatellite Repeats , Computational Biology/methods , Genetic Variation , Humans , Sequence Alignment/statistics & numerical data
12.
Bioinformatics ; 32(5): 764-6, 2016 03 01.
Article in English | MEDLINE | ID: mdl-26556383

ABSTRACT

MOTIVATION: The recently released Oxford Nanopore MinION sequencing platform presents many innovative features, opening up potential for a range of applications not previously possible. Among these features, the ability to sequence in real time provides a unique opportunity for many time-critical applications. While many software packages have been developed to analyze its data, there is still a lack of toolkits that support the streaming and real-time analysis of MinION sequencing data. RESULTS: We developed npReader, an open-source software package to facilitate real-time analysis of MinION sequencing data. npReader can simultaneously extract sequence reads and stream them to downstream analysis pipelines while the samples are being sequenced on the MinION device. It provides a command line interface for easy integration into a bioinformatics workflow, as well as a graphical user interface which concurrently displays the statistics of the run. It also provides an application programming interface for development of streaming algorithms in order to fully utilize the potential of nanopore sequencing. AVAILABILITY AND IMPLEMENTATION: npReader is written in Java and is freely available at https://github.com/mdcao/npReader. CONTACT: m.cao1@uq.edu.au or l.coin@imb.uq.edu.au.


Subjects
Software , Algorithms , Computational Biology , Nanopores , DNA Sequence Analysis
13.
Nucleic Acids Res ; 42(3): e16, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24353318

ABSTRACT

The advances of high-throughput sequencing offer an unprecedented opportunity to study genetic variation. This is challenged by the difficulty of resolving variant calls in repetitive DNA regions. We present a Bayesian method to estimate repeat-length variation from paired-end sequence read data. The method makes variant calls based on deviations in sequence fragment sizes, allowing the analysis of repeats at lengths of relevance to a range of phenotypes. We demonstrate the method's ability to detect and quantify changes in repeat lengths from short read genomic sequence data across genotypes. We use the method to estimate repeat variation among 12 strains of Arabidopsis thaliana and demonstrate experimentally that our method compares favourably against existing methods. Using this method, we have identified all repeats across the genome, which are likely to be polymorphic. In addition, our predicted polymorphic repeats also included the only known repeat expansion in A. thaliana, suggesting an ability to discover potential unstable repeats.


Subjects
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Tandem Repeat Sequences , Arabidopsis/genetics , Bayes Theorem , Software
14.
BMC Genomics ; 14: 76, 2013 Feb 02.
Article in English | MEDLINE | ID: mdl-23374135

ABSTRACT

BACKGROUND: Among repetitive genomic sequence, the class of tri-nucleotide repeats has received much attention due to their association with human diseases. Tri-nucleotide repeat diseases are caused by excessive sequence length variability; diseases such as Huntington's disease and Fragile X syndrome are tied to an increase in the number of repeat units in a tract. Motivated by the recent discovery of a tri-nucleotide repeat associated genetic defect in Arabidopsis thaliana, this study takes a cross-species approach to investigating these repeat tracts, with the goal of using commonalities between species to identify potential disease-related properties. RESULTS: We find that statistical enrichment in regulatory function associations for coding region repeats - previously observed in human - is consistent across multiple organisms. By distinguishing between homo-amino acid tracts that are encoded by tri-nucleotide repeats, and those encoded by varying codons, we show that amino acid repeats - not tri-nucleotide repeats - fully explain these regulatory associations. Using this same separation between repeat- and non-repeat-encoded homo-amino acid tracts, we show that poly-glutamine tracts are disproportionately encoded by tri-nucleotide repeats, and those tracts that are encoded by tri-nucleotide repeats are also significantly longer; these results are consistent across multiple species. CONCLUSION: These findings establish similarities in tri-nucleotide repeats across species at the level of protein functionality and protein sequence. The tendency of tri-nucleotide repeats to encode longer poly-glutamine tracts indicates a link with the poly-glutamine repeat diseases. The cross-species nature of this tendency suggests that unknown repeat diseases are yet to be uncovered in other species. Future discoveries of new non-human repeat associated defects may provide the breadth of information needed to unravel the mechanisms that underpin this class of human disease.


Subjects
Peptides/genetics , Animals , Arabidopsis/genetics , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Frequency , Genes , Genomic Instability , Humans , Mice , Open Reading Frames , Phenotype , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Saccharomyces cerevisiae/genetics , Trinucleotide Repeats
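Entry 14 distinguishes homo-amino acid tracts encoded by a tri-nucleotide repeat from those encoded by varying codons. A minimal sketch of that distinction for poly-glutamine tracts follows. The classification rule used here (a tract counts as repeat-encoded only if every codon in it is identical), the `min_len` cut-off and the toy coding sequence are assumptions for illustration, not the study's actual criteria.

```python
GLUTAMINE_CODONS = {"CAG", "CAA"}  # the two codons for glutamine (Q)


def glutamine_tracts(cds: str, min_len: int = 5):
    """Find poly-Q tracts in an in-frame coding sequence.

    Returns a list of (first codon index, tract length in codons, encoding kind).
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    tracts = []
    i = 0
    while i < len(codons):
        if codons[i] not in GLUTAMINE_CODONS:
            i += 1
            continue
        j = i
        while j < len(codons) and codons[j] in GLUTAMINE_CODONS:
            j += 1  # extend the run of glutamine codons
        if j - i >= min_len:
            run = codons[i:j]
            kind = "trinucleotide repeat" if len(set(run)) == 1 else "mixed codons"
            tracts.append((i, j - i, kind))
        i = j
    return tracts


# Toy coding sequence: a pure CAG run, then a CAG/CAA run of the same amino acid.
cds = "ATG" + "CAG" * 6 + "GCT" + "CAGCAACAGCAACAG" + "TAA"
tracts = glutamine_tracts(cds)
print(tracts)
```

Both tracts translate to consecutive glutamines, yet only the first is a tri-nucleotide repeat at the DNA level, which is the separation the abstract uses to compare tract lengths and functional associations.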
15.
Nat Med ; 28(8): 1619-1629, 2022 08.
Article in English | MEDLINE | ID: mdl-35970920

ABSTRACT

Checkpoint inhibitor (CPI) therapies provide limited benefit to patients with tumors of low immune reactivity. T cell-inducing vaccines hold promise to exert long-lasting disease control in combination with CPI therapy. Safety, tolerability and recommended phase 2 dose (RP2D) of an individualized, heterologous chimpanzee adenovirus (ChAd68) and self-amplifying mRNA (samRNA)-based neoantigen vaccine in combination with nivolumab and ipilimumab were assessed as primary endpoints in an ongoing phase 1/2 study in patients with advanced metastatic solid tumors (NCT03639714). The individualized vaccine regimen was safe and well tolerated, with no dose-limiting toxicities. Treatment-related adverse events (TRAEs) >10% included pyrexia, fatigue, musculoskeletal and injection site pain and diarrhea. Serious TRAEs included one count each of pyrexia, duodenitis, increased transaminases and hyperthyroidism. The RP2D was 10^12 viral particles (VP) ChAd68 and 30 µg samRNA. Secondary endpoints included immunogenicity, feasibility of manufacturing and overall survival (OS). Vaccine manufacturing was feasible, with vaccination inducing long-lasting neoantigen-specific CD8 T cell responses. Several patients with microsatellite-stable colorectal cancer (MSS-CRC) had improved OS. Exploratory biomarker analyses showed decreased circulating tumor DNA (ctDNA) in patients with prolonged OS. Although small study size limits statistical and translational analyses, the increased OS observed in MSS-CRC warrants further exploration in larger randomized studies.


Subjects
Colorectal Neoplasms , Pan troglodytes , Adenoviridae/genetics , Animals , Colorectal Neoplasms/drug therapy , Fever , Humans , Messenger RNA/therapeutic use
16.
Adv Exp Med Biol ; 696: 657-66, 2011.
Article in English | MEDLINE | ID: mdl-21431607

ABSTRACT

A biological compression model, expert model, is presented which is superior to existing compression algorithms in both compression performance and speed. The model is able to compress whole eukaryotic genomes. Most importantly, the model provides a framework for knowledge discovery from biological data. It can be used for repeat element discovery, sequence alignment and phylogenetic analysis. We demonstrate that the model can handle statistically biased sequences and distantly related sequences where conventional knowledge discovery tools often fail.


Subjects
Algorithms , Data Compression/statistics & numerical data , Computational Biology , Expert Systems , Human Genome , Genomics/statistics & numerical data , Humans , Information Theory , Knowledge Bases , Genetic Models , Statistical Models , Phylogeny , Nucleic Acid Repetitive Sequences , Sequence Alignment/statistics & numerical data
17.
BMC Bioinformatics ; 11: 599, 2010 Dec 16.
Article in English | MEDLINE | ID: mdl-21159205

ABSTRACT

BACKGROUND: Traditional genome alignment methods consider sequence alignment as a variation of the string edit distance problem, and perform alignment by matching characters of the two sequences. They are often computationally expensive and unable to deal with low information regions. Furthermore, they lack a well-principled objective function to measure the performance of sets of parameters. Since genomic sequences carry genetic information, this article proposes that the information content of each nucleotide in a position should be considered in sequence alignment. An information-theoretic approach for pairwise genome local alignment, namely XMAligner, is presented. Instead of comparing sequences at the character level, XMAligner considers a pair of nucleotides from two sequences to be related if their mutual information in context is significant. The information content of nucleotides in sequences is measured by a lossless compression technique. RESULTS: Experiments on both simulated data and real data show that XMAligner is superior to conventional methods especially on distantly related sequences and statistically biased data. XMAligner can align sequences of eukaryote genome size with only a modest hardware requirement. Importantly, the method has an objective function which can obviate the need to choose parameter values for high quality alignment. The alignment results from XMAligner can be integrated into a visualisation tool for viewing purpose. CONCLUSIONS: The information-theoretic approach for sequence alignment is shown to overcome the mentioned problems of conventional character matching alignment methods. The article shows that, as genomic sequences are meant to carry information, considering the information content of nucleotides is helpful for genomic sequence alignment. AVAILABILITY: Downloadable binaries, documentation and data can be found at ftp://ftp.infotech.monash.edu.au/software/DNAcompress-XM/XMAligner/.


Subjects
Algorithms , Data Compression , Sequence Alignment/methods , Base Sequence , Genomics/methods , Theoretical Models , Software
18.
Nat Genet ; 52(11): 1256-1264, 2020 11.
Article in English | MEDLINE | ID: mdl-33128049

ABSTRACT

Despite advances in sequencing technologies, assembly of complex plant genomes remains elusive due to polyploidy and high repeat content. Here we report PolyGembler for grouping and ordering contigs into pseudomolecules by genetic linkage analysis. Our approach also provides an accurate method with which to detect and fix assembly errors. Using simulated data, we demonstrate that our approach is of high accuracy and outperforms three existing state-of-the-art genetic mapping tools. Particularly, our approach is more robust to the presence of missing genotype data and genotyping errors. We used our method to construct pseudomolecules for allotetraploid lawn grass utilizing PacBio long reads in combination with restriction site-associated DNA sequencing, and for diploid Ipomoea trifida and autotetraploid potato utilizing contigs assembled from Illumina reads in combination with genotype data generated by single-nucleotide polymorphism arrays and genotyping by sequencing, respectively. We resolved 13 assembly errors for a published I. trifida genome assembly and anchored eight unplaced scaffolds in the published potato genome.


Subjects
Algorithms , Plant Chromosomes , Genetic Linkage , Plant Genome , Polyploidy , Computer Simulation , Genotype , Ipomoea/genetics , Plant Breeding , Poaceae/genetics , Protein Array Analysis , Solanum tuberosum/genetics
19.
Lab Chip ; 19(24): 4083-4092, 2019 12 21.
Article in English | MEDLINE | ID: mdl-31712799

ABSTRACT

Phage display methodologies offer a versatile platform for the isolation of single-chain Fv (scFv) molecules which may be rebuilt into monoclonal antibodies. Herein, we report on a complete workflow, termed PhageXpress, for rapid selection of single-chain Fv sequences by leveraging electrohydrodynamic manipulation of a solution containing phage library particles to enhance target binding whilst minimizing non-specific interactions. Our PhageXpress technique is combined with Oxford Nanopore Technologies' MinION sequencer and custom bioinformatics to achieve high-throughput screening of phage libraries. We performed 4 rounds of biopanning against Dengue virus (DENV) non-structural protein 1 (NS1) using traditional methods (4 week turnaround), which resulted in the isolation of 19 unique scFv clones. We validated the feasibility and efficiency of the PhageXpress method utilizing the same phage library and antigen target. Notably, we successfully mapped 14 of the 19 anti-NS1 scFv sequences (∼74%) with our new method, despite using ∼30-fold fewer particles during screening and conducting only a single round of biopanning. We believe this approach supersedes traditional methods for the discovery of bio-recognition molecules such as antibodies by speeding up the process for the development of therapeutic and diagnostic biologics.


Subjects
Antiviral Antibodies , Nanopore Sequencing , Peptide Library , Single-Chain Antibodies , Antiviral Antibodies/chemistry , Antiviral Antibodies/genetics , Dengue Virus/chemistry , Humans , Single-Chain Antibodies/chemistry , Single-Chain Antibodies/genetics , Viral Nonstructural Proteins/chemistry
20.
Sci Rep ; 8(1): 16616, 2018 11 09.
Article in English | MEDLINE | ID: mdl-30413723

ABSTRACT

The majority of human chromosome ends remain incompletely assembled due to their highly repetitive structure. In this study, we use BioNano data to anchor and extend chromosome ends from two European trios as well as two unrelated Asian genomes. At least 11 BioNano assembled chromosome ends are structurally divergent from the reference genome, including both missing sequence and extensions. These extensions are heritable and in some cases divergent between Asian and European samples. Six out of nine predicted extension sequences from NA12878 can be confirmed and filled by nanopore data. We identify two multi-kilobase sequence families both enriched more than 100-fold in extension sequence (p-values < 1e-5) whose origins can be traced to interstitial sequence on ancestral primate chromosome 7. Extensive sub-telomeric duplication of these families has occurred in the human lineage subsequent to divergence from chimpanzees.


Subjects
Biotechnology/methods , Human Chromosomes , Genomics/methods , Nanopores , Telomere/genetics , Factual Databases , Humans , Reference Standards