1.
Nucleic Acids Res ; 49(17): e102, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34214168

ABSTRACT

Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient's treatment plan, preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that still remains open. Here we propose CliqueSNV, based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identifying minority haplotypes with frequencies below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short- and long-read sequencing platforms.


Subjects
Algorithms , Computational Biology/methods , Haplotypes , High-Throughput Nucleotide Sequencing/methods , RNA Virus Infections/diagnosis , RNA Viruses/genetics , COVID-19/diagnosis , COVID-19/virology , Gene Frequency , HIV Infections/diagnosis , HIV Infections/virology , HIV-1/genetics , Humans , Mutation , Polymorphism, Single Nucleotide , RNA Virus Infections/virology , Reproducibility of Results , SARS-CoV-2/genetics , Sensitivity and Specificity
2.
PLoS Comput Biol ; 16(11): e1008454, 2020 11.
Article in English | MEDLINE | ID: mdl-33253159

ABSTRACT

One of the hallmarks of cancer is the extremely high mutability and genetic instability of tumor cells. Inherent heterogeneity of intra-tumor populations manifests itself in high variability of clone instability rates. Analogously to fitness landscapes, the instability rates of clonal populations form their mutability landscapes. Here, we present MULAN (MUtability LANdscape inference), a maximum-likelihood computational framework for inference of mutation rates of individual cancer subclones using single-cell sequencing data. It utilizes the partial information about the order of mutation events provided by cancer mutation trees and extends it by inferring the full evolutionary history and mutability landscape of a tumor. Evaluating mutation rates at the level of subclones rather than individual genes makes it possible to capture the effects of genomic interactions and epistasis. We estimate the accuracy of our approach and demonstrate that it can be used to study the evolution of genetic instability and infer tumor evolutionary history from experimental data. MULAN is available at https://github.com/compbel/MULAN.


Subjects
Mutation , Neoplasms/genetics , Neoplasms/pathology , Single-Cell Analysis/methods , Algorithms , Genomic Instability , Humans
3.
BMC Genomics ; 21(Suppl 6): 405, 2020 Dec 21.
Article in English | MEDLINE | ID: mdl-33349236

ABSTRACT

BACKGROUND: Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely employed for analysis of sequence data from such populations, their straightforward application is impeded by multiple challenges associated with technological limitations and biases, the difficulty of selecting relevant features and the need to compare genomic datasets of different sizes and structures. RESULTS: We propose a novel preprocessing approach to transform irregular genomic data into normalized image data. Such a representation allows us to restate the problems of classification and comparison of heterogeneous populations as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important problems in molecular epidemiology: inference of viral infection stage and detection of viral transmission clusters using next-generation sequencing data. The infection staging method has been applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering has been performed on the data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy. CONCLUSIONS: The sequence image normalization method allows for a robust conversion of genomic data into numerical data and overcomes several issues associated with applying machine learning methods to viral populations. Image data also help in the visualization of genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are equally or more accurate than other models.


Subjects
Genomics , Machine Learning , Algorithms , Cluster Analysis , Computational Biology , Humans , Quasispecies
4.
BMC Genomics ; 21(Suppl 5): 582, 2020 Dec 16.
Article in English | MEDLINE | ID: mdl-33327932

ABSTRACT

BACKGROUND: RNA viruses mutate at extremely high rates, forming an intra-host viral population of closely related variants, which allows them to evade the host's immune system and makes them particularly dangerous. Viral outbreaks pose a significant threat to public health, and to deal with them it is critical to infer transmission clusters, i.e., decide whether two viral samples belong to the same outbreak. Next-generation sequencing (NGS) can significantly help in tackling outbreak-related problems. While NGS data is first obtained as short reads, existing methods rely on assembled sequences. This requires reconstruction of the entire viral population, which is complicated, error-prone and time-consuming. RESULTS: The experimental validation using sequencing data from HCV outbreaks shows that the proposed algorithm can successfully identify genetic relatedness between viral populations, infer transmission direction, transmission clusters and outbreak sources, as well as decide whether the source is present in the sequenced outbreak sample and identify it. CONCLUSIONS: The introduced algorithm makes it possible to cluster genetically related samples, infer transmission directions and predict sources of outbreaks. Validation on experimental data demonstrated that the algorithm is able to reconstruct various transmission characteristics. An advantage of the method is its ability to bypass cumbersome read assembly, thus eliminating the chance of introducing new errors and saving processing time by allowing the use of raw NGS reads.


Subjects
Hepacivirus , RNA Viruses , Algorithms , Disease Outbreaks , Hepacivirus/genetics , High-Throughput Nucleotide Sequencing
6.
Bioinformatics ; 35(14): i398-i407, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510696

ABSTRACT

SUMMARY: Intra-tumor heterogeneity is one of the major factors influencing cancer progression and treatment outcome. However, evolutionary dynamics of cancer clone populations remain poorly understood. Quantification of clonal selection and inference of fitness landscapes of tumors is a key step to understanding evolutionary mechanisms driving cancer. These problems could be addressed using single-cell sequencing (scSeq), which provides an unprecedented insight into intra-tumor heterogeneity, making it possible to study and quantify selective advantages of individual clones. Here, we present Single Cell Inference of FItness Landscape (SCIFIL), a computational tool for inference of fitness landscapes of heterogeneous cancer clone populations from scSeq data. SCIFIL makes it possible to estimate maximum likelihood fitnesses of clone variants and to measure their selective advantages and order of appearance by fitting an evolutionary model to the tumor phylogeny. We demonstrate the accuracy of our approach, and show how it could be applied to experimental tumor data to study clonal selection and infer evolutionary history. SCIFIL can be used to provide new insight into the evolutionary dynamics of cancer. AVAILABILITY AND IMPLEMENTATION: Its source code is available at https://github.com/compbel/SCIFIL.


Subjects
Clone Cells , Neoplasms , Software , Humans , Phylogeny , Sequence Analysis, DNA
7.
Infect Immun ; 87(8)2019 08.
Article in English | MEDLINE | ID: mdl-31085705

ABSTRACT

Lyme disease (LD), the most prevalent vector-borne illness in the United States and Europe, is caused by Borreliella burgdorferi. No vaccine is available for humans. Dogmatically, B. burgdorferi can establish a persistent infection in the mammalian host (e.g., mice) due to a surface antigen, VlsE. This antigenically variable protein allows the spirochete to continually evade borreliacidal antibodies. However, our recent study has shown that the B. burgdorferi spirochete is effectively cleared by anti-B. burgdorferi antibodies of New Zealand White rabbits, despite the surface expression of VlsE. Besides homologous protection, the rabbit antibodies also cross-protect against heterologous B. burgdorferi spirochetes and significantly reduce the pathology of LD arthritis in persistently infected mice. Thus, this finding that NZW rabbits develop a unique repertoire of very potent antibodies targeting the protective surface epitopes, despite abundant VlsE, prompted us to identify the specificities of the protective rabbit antibodies and their respective targets. By applying subtractive reverse vaccinology, which involved the use of random peptide phage display libraries coupled with next-generation sequencing and our computational algorithms, repertoires of nonprotective (early) and protective (late) rabbit antibodies were identified and directly compared. Consequently, putative surface epitopes that are unique to the protective rabbit sera were mapped. Importantly, the relevance of newly identified protection-associated epitopes for their surface exposure has been strongly supported by prior empirical studies. This study is significant because it now allows us to systematically test the putative epitopes for their protective efficacy with an ultimate goal of selecting the most efficacious targets for development of a long-awaited LD vaccine.


Subjects
Antibodies, Bacterial/immunology , Bacterial Vaccines/immunology , Borrelia burgdorferi/immunology , Epitopes , Animals , Antigens, Bacterial/immunology , Bacterial Outer Membrane Proteins/immunology , Bacterial Proteins/immunology , Lipoproteins/immunology , Male , Mice , Mice, Inbred C3H , Rabbits , Vaccines, Subunit/immunology
8.
Infect Immun ; 87(7)2019 07.
Article in English | MEDLINE | ID: mdl-30988058

ABSTRACT

Borrelia burgdorferi is a tick-borne bacterium responsible for approximately 300,000 annual cases of Lyme disease (LD) in the United States, with increasing incidences in other parts of the world. The debilitating nature of LD is mainly attributed to the ability of B. burgdorferi to persist in patients for many years despite strong anti-Borrelia antibody responses. Antimicrobial treatment of persistent infection is challenging. Similar to infection of humans, B. burgdorferi establishes long-term infection in various experimental animal models except for New Zealand White (NZW) rabbits, which clear the spirochete within 4 to 12 weeks. LD spirochetes have a highly evolved antigenic variation vls system, on the lp28-1 plasmid, where gene conversion results in surface expression of the antigenically variable VlsE protein. VlsE is required for B. burgdorferi to establish persistent infection by continually evading otherwise potent antibodies. Since the clearance of B. burgdorferi is mediated by humoral immunity in NZW rabbits, the previously reported results that LD spirochetes lose lp28-1 during rabbit infection could potentially explain the failure of B. burgdorferi to persist. However, the present study unequivocally disproves that previous finding by demonstrating that LD spirochetes retain the vls system. However, despite the vls system being fully functional, the spirochete fails to evade anti-Borrelia antibodies of NZW rabbits. In addition to being protective against homologous and heterologous challenges, the rabbit antibodies significantly ameliorate LD-induced arthritis in persistently infected mice. Overall, the current data indicate that NZW rabbits develop a protective antibody repertoire, whose specificities, once defined, will identify potential candidates for a much-anticipated LD vaccine.


Subjects
Antigenic Variation/physiology , Antigens, Bacterial/immunology , Borrelia burgdorferi/genetics , Lyme Disease/immunology , Lyme Disease/microbiology , Animals , Antibodies, Bacterial/immunology , Bacterial Proteins/genetics , Lipoproteins/genetics , Plasmids , Rabbits
9.
Bioinformatics ; 34(15): 2530-2537, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29547882

ABSTRACT

Summary: Genomic sequences are assembled into a variable but large number of contigs that should be scaffolded (ordered and oriented) to facilitate comparative or functional analysis. Finding a scaffolding is computationally challenging due to misassemblies, inconsistent coverage across the genome and long repeats. An accurate assessment of scaffolding tools should take into account multiple locations of the same contig on the reference scaffolding rather than matching a repeat to a single best location. This makes mapping of inferred scaffoldings onto the reference a computationally challenging problem. This paper formulates the repeat-aware scaffolding evaluation problem, which is to find a mapping of the inferred scaffolding onto the reference that maximizes the number of correct links, and proposes a scalable algorithm capable of handling large whole-genome datasets. Our novel scaffolding validation framework has been applied to assess most state-of-the-art scaffolding tools on a representative subset of Genome Assembly Golden-Standard Evaluations (GAGE) datasets and some novel simulated datasets. Availability and implementation: The source code of this evaluation framework is available at https://github.com/mandricigor/repeat-aware. The documentation is hosted at https://mandricigor.github.io/repeat-aware. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Contig Mapping/methods , Genome , Repetitive Sequences, Nucleic Acid , Sequence Analysis, DNA/methods , Software , Algorithms , Bacteria/genetics , Eukaryota/genetics , Genomics/methods , Humans
10.
Bioinformatics ; 34(1): 163-170, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29304222

ABSTRACT

Motivation: Genomic analysis has become one of the major tools for disease outbreak investigations. However, existing computational frameworks for inference of transmission history from viral genomic data often do not consider intra-host diversity of pathogens and heavily rely on additional epidemiological data, such as sampling times and exposure intervals. This impedes genomic analysis of outbreaks of highly mutable viruses associated with chronic infections, such as human immunodeficiency virus and hepatitis C virus, whose transmissions are often carried out through minor intra-host variants, while the additional epidemiological information is often either unavailable or of limited use. Results: The proposed framework QUasispecies Evolution, Network-based Transmission INference (QUENTIN) addresses the above challenges by evolutionary analysis of intra-host viral populations sampled by deep sequencing and Bayesian inference using general properties of social networks relevant to infection dissemination. This method allows inference of transmission direction even without supporting case-specific epidemiological information, identification of transmission clusters and reconstruction of transmission history. QUENTIN was validated on experimental and simulated data, and applied to investigate HCV transmission within a community of hosts with high-risk behavior. It is available at https://github.com/skumsp/QUENTIN. Contact: pskums@gsu.edu or alexz@cs.gsu.edu or rahul@sfsu.edu or yek0@cdc.gov. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genome, Viral , High-Throughput Nucleotide Sequencing/methods , Quasispecies , Sequence Analysis, RNA/methods , Software , Bayes Theorem , Disease Outbreaks , Genomics/methods , Hepacivirus/genetics , Humans , Sequence Analysis, DNA/methods
11.
BMC Bioinformatics ; 19(Suppl 11): 360, 2018 Oct 22.
Article in English | MEDLINE | ID: mdl-30343669

ABSTRACT

BACKGROUND: Many biological analysis tasks require extraction of families of genetically similar sequences from large datasets produced by next-generation sequencing (NGS). Such tasks include detection of viral transmissions by analysis of all genetically close pairs of sequences from viral datasets sampled from infected individuals, or the study of the evolution of viruses or immune repertoires by analysis of networks of intra-host viral variants or antibody clonotypes formed by genetically close sequences. The most obvious naive algorithms to extract such sequence families are impractical in light of the massive size of modern NGS datasets. RESULTS: In this paper, we present a fast and scalable k-mer-based framework to perform such sequence similarity queries efficiently, which specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. It shows better filtering quality and time performance compared with other tools. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj. CONCLUSION: The proposed tool allows for efficient detection of genetic relatedness between genomic samples produced by deep sequencing of heterogeneous populations. It should be especially useful for analysis of relatedness of genomes of viruses with unevenly distributed variable genomic regions, such as HIV and HCV. In the future, we envision that besides applications in molecular epidemiology the tool can also be adapted to immunosequencing and metagenomics data.
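
To illustrate why filtering helps with such similarity queries, here is a minimal sketch of a generic pigeonhole filter. It is not the algorithm of the tool described above, whose details the abstract does not give. The sketch assumes equal-length, already aligned sequences and Hamming distance; the function names `hamming` and `close_pairs` are hypothetical. If two sequences differ in at most `t` positions and the positions are split into `t + 1` segments, at least one segment must match exactly, so only sequences sharing a segment need a full comparison.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def close_pairs(seqs, t):
    """Return index pairs (i, j), i < j, whose Hamming distance is <= t.

    Assumption of this sketch: all sequences have the same length.
    """
    if not seqs:
        return set()
    n = len(seqs[0])
    if any(len(s) != n for s in seqs):
        raise ValueError("sequences must have equal length")

    # Split positions 0..n into t + 1 contiguous segments.
    bounds = [n * k // (t + 1) for k in range(t + 2)]

    # Candidate pairs share at least one identical segment.
    candidates = set()
    for start, end in zip(bounds, bounds[1:]):
        buckets = defaultdict(list)
        for i, s in enumerate(seqs):
            buckets[s[start:end]].append(i)
        for members in buckets.values():
            candidates.update(combinations(members, 2))

    # Verify candidates with the exact distance.
    return {(i, j) for i, j in candidates if hamming(seqs[i], seqs[j]) <= t}
```

The filter never misses a true pair; how many false candidates it produces depends on how variable the data are, which is where a more refined k-mer scheme would matter.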


Subjects
Algorithms , Genetic Variation , Genome , Phylogeny , Base Sequence , Entropy , High-Throughput Nucleotide Sequencing , Humans , Metagenomics , Reproducibility of Results , Time Factors
12.
Infect Immun ; 86(8)2018 08.
Article in English | MEDLINE | ID: mdl-29866906

ABSTRACT

The tick-borne pathogen Borrelia burgdorferi is responsible for approximately 300,000 Lyme disease (LD) cases per year in the United States. Recent increases in the number of LD cases, in addition to the spread of the tick vector and a lack of a vaccine, highlight an urgent need for designing and developing an efficacious LD vaccine. Identification of protective epitopes that could be used to develop a second-generation (subunit) vaccine is therefore imperative. Despite the antigenicity of several lipoproteins and integral outer membrane proteins (OMPs) on the B. burgdorferi surface, the spirochetes successfully evade antibodies primarily due to the VlsE-mediated antigenic variation. VlsE is thought to sterically block antibody access to protective epitopes of B. burgdorferi. However, it is highly unlikely that VlsE shields the entire surface epitome. Thus, identification of subdominant epitope targets that induce protection when they are made dominant is necessary to generate an efficacious vaccine. Toward the identification, we repeatedly immunized immunocompetent mice with live-attenuated VlsE-deleted B. burgdorferi and then challenged the animals with the VlsE-expressing (host-adapted) wild type. Passive immunization and Western blotting data suggested that the protection of 50% of repeatedly immunized animals against the highly immune-evasive B. burgdorferi was antibody mediated. Comparison of serum antibody repertoires identified in protected and nonprotected animals permitted the identification of several putative epitopes significantly associated with the protection. Most linear putative epitopes were conserved between the main pathogenic Borrelia genospecies and found within known subdominant regions of OMPs. Currently, we are performing immunization studies to test whether the identified protection-associated epitopes are protective for mice.


Subjects
Antibodies, Bacterial/blood , Antigens, Bacterial/metabolism , Bacterial Outer Membrane Proteins/immunology , Bacterial Proteins/metabolism , Bacterial Vaccines/immunology , Borrelia burgdorferi/immunology , Epitopes/immunology , Lipoproteins/metabolism , Lyme Disease/immunology , Animals , Bacterial Vaccines/administration & dosage , Blotting, Western , Disease Models, Animal , Epitope Mapping , Immunization, Passive , Lipoproteins/deficiency , Lyme Disease/prevention & control , Male , Mice , Mice, Inbred C3H , Mice, SCID , Vaccines, Attenuated/administration & dosage , Vaccines, Attenuated/immunology
13.
Bioinformatics ; 33(20): 3302-3304, 2017 Oct 15.
Article in English | MEDLINE | ID: mdl-28605502

ABSTRACT

SUMMARY: This note presents IsoEM2 and IsoDE2, new versions with enhanced features and faster runtime of the IsoEM and IsoDE packages for expression level estimation and differential expression. IsoEM2 estimates fragments per kilobase per million (FPKM) and transcripts per million (TPM) levels for genes and isoforms with confidence intervals through bootstrapping, while IsoDE2 performs differential expression analysis using the bootstrap samples generated by IsoEM2. Both tools are available with a command line interface as well as a graphical user interface (GUI) through wrappers for the Galaxy platform. AVAILABILITY AND IMPLEMENTATION: The source code of this software suite is available at https://github.com/mandricigor/isoem2. The Galaxy wrappers are available at https://toolshed.g2.bx.psu.edu/view/saharlcc/isoem2_isode2/. CONTACT: imandric1@student.gsu.edu or ion@engr.uconn.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
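
The two expression units named above, and the idea of a bootstrap confidence interval, can be sketched as follows. This is an illustration under simplifying assumptions, not IsoEM2's implementation: it assumes every fragment maps unambiguously to one transcript (IsoEM2 itself resolves ambiguous reads with expectation-maximization), that total counts are positive, and it uses a plain percentile bootstrap. The function names `fpkm`, `tpm` and `bootstrap_tpm_ci` are hypothetical.

```python
import random
from collections import Counter


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r * 1e6 / scale for r in rates]


def bootstrap_tpm_ci(counts, lengths_bp, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for each transcript's TPM.

    Each replicate redraws the same number of fragments with replacement,
    with probabilities proportional to the observed counts.
    """
    rng = random.Random(seed)
    n = len(counts)
    total = sum(counts)
    samples = [[] for _ in range(n)]
    for _ in range(n_boot):
        drawn = Counter(rng.choices(range(n), weights=counts, k=total))
        resampled = [drawn.get(i, 0) for i in range(n)]
        for i, value in enumerate(tpm(resampled, lengths_bp)):
            samples[i].append(value)
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    intervals = []
    for s in samples:
        ordered = sorted(s)
        intervals.append((ordered[lo_idx], ordered[hi_idx]))
    return intervals
```

TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM.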


Subjects
Computational Biology/methods , Confidence Intervals , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Software
14.
BMC Bioinformatics ; 18(Suppl 8): 244, 2017 Jun 07.
Article in English | MEDLINE | ID: mdl-28617221

ABSTRACT

BACKGROUND: For fighting cancer, earlier detection is crucial. Circulating auto-antibodies produced by the patient's own immune system after exposure to cancer proteins are promising bio-markers for the early detection of cancer. Since an antibody recognizes not the whole antigen but 4-7 critical amino acids within the antigenic determinant (epitope), the whole proteome can be represented by a random peptide phage display library. This opens the possibility to develop an early cancer detection test based on a set of peptide sequences identified by comparing cancer patients' and healthy donors' global peptide profiles of antibody specificities. RESULTS: Due to the enormously large number of peptide sequences contained in global peptide profiles generated by next-generation sequencing, a large number of cancer and control sera is required to identify cancer-specific peptides with a high degree of statistical significance. To decrease the number of peptides in profiles generated by next-generation sequencing without losing cancer-specific sequences, we generated the profiles using a phage library enriched by panning on a pool of cancer sera. To further decrease the complexity of profiles, we used computational methods for transforming a list of peptides constituting the mimotope profiles into a list of motifs formed by similar peptide sequences. CONCLUSION: We have shown that the amino-acid order is meaningful in mimotope motifs since they contain significantly more peptides than motifs among peptides where amino-acids are randomly permuted. Also, the single sample motifs significantly differ from motifs in peptides drawn from multiple samples. Finally, multiple cancer-specific motifs have been identified.


Subjects
Autoantibodies , Biomarkers, Tumor/blood , Epitopes , Neoplasms , Autoantibodies/chemistry , Autoantibodies/immunology , Computational Biology , Early Detection of Cancer , Epitopes/chemistry , Epitopes/immunology , Humans , Neoplasms/blood , Neoplasms/chemistry , Neoplasms/diagnosis , Neoplasms/immunology , Peptide Library
15.
Infect Immun ; 85(1)2017 Jan.
Article in English | MEDLINE | ID: mdl-27799330

ABSTRACT

Lyme disease (LD), the most prevalent tick-borne illness in North America, is caused by Borrelia burgdorferi. The long-term survival of B. burgdorferi spirochetes in the mammalian host is achieved through VlsE-mediated antigenic variation. It is mathematically predicted that a highly variable surface antigen prolongs bacterial infection sufficiently to exhaust the immune response directed toward invariant surface antigens. If the prediction is correct, it is expected that the antibody response to B. burgdorferi invariant antigens will become nonprotective as B. burgdorferi infection progresses. To test this assumption, changes in the protective efficacy of the immune response to B. burgdorferi surface antigens were monitored via a superinfection model over the course of 70 days. B. burgdorferi-infected mice were subjected to secondary challenge by heterologous B. burgdorferi at different time points postinfection (p.i.). When the infected mice were superinfected with a VlsE-deficient clone (ΔVlsE) at day 28 p.i., the active anti-B. burgdorferi immune response did not prevent ΔVlsE-induced spirochetemia. In contrast, most mice blocked culture-detectable spirochetemia induced by wild-type B. burgdorferi (WT), indicating that VlsE was likely the primary target of the antibody response. As the B. burgdorferi infection further progressed, however, reversed outcomes were observed. At day 70 p.i. the host immune response to non-VlsE antigens became sufficiently potent to clear spirochetemia induced by ΔVlsE and yet failed to prevent WT-induced spirochetemia. To test if any significant changes in the anti-B. burgdorferi antibody repertoire accounted for the observed outcomes, global profiles of antibody specificities were determined. However, comparison of mimotopes revealed no major difference between day 28 and day 70 antibody repertoires.


Subjects
Antibodies, Bacterial/immunology , Antibody Formation/immunology , Antigens, Bacterial/immunology , Bacterial Proteins/immunology , Immune Evasion/immunology , Lipoproteins/immunology , Lyme Disease/immunology , Lyme Disease/microbiology , Spirochaetales/immunology , Animals , Antigenic Variation/immunology , Antigens, Surface/immunology , Borrelia burgdorferi/immunology , Male , Mice , Mice, Inbred BALB C , Mice, Inbred C3H , North America
16.
BMC Genomics ; 18(Suppl 10): 918, 2017 Dec 06.
Article in English | MEDLINE | ID: mdl-29244009

ABSTRACT

BACKGROUND: RNA viruses such as HCV and HIV mutate at extremely high rates, and as a result, they exist in infected hosts as populations of genetically related variants. Recent advances in sequencing technologies make it possible to identify such populations at great depth. In particular, these technologies provide new opportunities for inference of relatedness between viral samples and identification of transmission clusters and sources of infection, which are crucial tasks for viral outbreak investigations. RESULTS: We present (i) an evolutionary simulation algorithm, Viral Outbreak InferenCE (VOICE), inferring genetic relatedness, (ii) an algorithm, MinDistB, detecting possible transmission using minimal distances between intra-host viral populations and sizes of their relative borders, and (iii) a non-parametric recursive clustering algorithm, Relatedness Depth (ReD), analyzing clusters' structure to infer possible transmissions and their directions. All proposed algorithms were validated using real sequencing data from HCV outbreaks. CONCLUSIONS: All algorithms are applicable to the analysis of outbreaks of highly heterogeneous RNA viruses. Our experimental validation shows that they can successfully identify genetic relatedness between viral populations, as well as infer transmission clusters and outbreak sources.
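
The notion of a minimal distance between two intra-host viral populations can be sketched in a few lines. This covers only the distance part of the idea; the "relative borders" used by MinDistB are not defined in the abstract and are not modeled here. The sketch assumes aligned, equal-length sequences and Hamming distance, and the relatedness threshold is a hypothetical parameter the user must choose, not a value taken from the paper.

```python
def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(pop_a, pop_b):
    """Smallest Hamming distance between any variant of pop_a and any of pop_b."""
    return min(hamming(a, b) for a in pop_a for b in pop_b)


def possibly_linked(pop_a, pop_b, threshold):
    """Flag two samples as possibly related when their closest variants
    differ in at most `threshold` positions (threshold is an assumption)."""
    return min_distance(pop_a, pop_b) <= threshold
```

The all-pairs comparison is quadratic in population size, so for deep-sequencing data it would normally be combined with some form of candidate filtering.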


Subjects
Computational Biology , Hepacivirus/genetics , Phylogeny , Quasispecies/genetics , Sequence Analysis, RNA , Algorithms , Cluster Analysis , Genome, Viral/genetics , RNA, Viral/genetics
17.
Bioinformatics ; 31(16): 2632-8, 2015 Aug 15.
Article in English | MEDLINE | ID: mdl-25890305

ABSTRACT

MOTIVATION: Next-generation high-throughput sequencing has become a state-of-the-art technique in genome assembly. Scaffolding is one of the main stages of the assembly pipeline. During this stage, contigs assembled from the paired-end reads are merged into bigger chains called scaffolds. Because of a high level of statistical noise, chimeric reads, and genome repeats, scaffolding is a challenging task. Current scaffolding software packages vary widely in their quality and are highly dependent on the read data quality and genome complexity. There are no clear winners and multiple opportunities for further improvements of the tools still exist. RESULTS: This article presents an efficient scaffolding algorithm, ScaffMatch, that is able to handle reads with both short (<600 bp) and long (>35 000 bp) insert sizes, producing high-quality scaffolds. We evaluate our scaffolding tool with the F score and other metrics (N50, corrected N50) on eight datasets, comparing it with most available packages. Our experiments show that ScaffMatch is the tool of preference for most datasets. AVAILABILITY AND IMPLEMENTATION: The source code is available at http://alan.cs.gsu.edu/NGS/?q=content/scaffmatch. CONTACT: mandric@cs.gsu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
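
Two of the evaluation metrics mentioned above have simple standard definitions, sketched here for readers unfamiliar with them. N50 is the length L such that scaffolds of length at least L cover at least half of the total assembled length; the F score is the harmonic mean of precision and recall. How the paper counts correct links, and how corrected N50 is derived, is not stated in the abstract, so the `tp`, `fp` and `fn` inputs below are assumed to be supplied by the caller, and corrected N50 is not implemented.

```python
def n50(lengths):
    """N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall from link counts.

    tp: correctly predicted links, fp: wrong predicted links,
    fn: reference links that were missed.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

N50 alone rewards aggressive joining even when joins are wrong, which is one reason to report it next to a correctness-based measure such as the F score.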


Subjects
Algorithms , Contig Mapping/methods , Genome , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Body Weight , Humans , Plasmodium falciparum/genetics , Rhodobacter sphaeroides/genetics , Staphylococcus aureus/genetics
18.
Bioinformatics ; 31(5): 682-90, 2015 Mar 01.
Article in English | MEDLINE | ID: mdl-25359889

ABSTRACT

MOTIVATION: Next-generation sequencing (NGS) allows for analyzing a large number of viral sequences from infected patients, providing an opportunity to implement large-scale molecular surveillance of viral diseases. However, despite improvements in technology, traditional protocols for NGS of large numbers of samples are still highly cost- and labor-intensive. One of the possible cost-effective alternatives is combinatorial pooling. Although a number of pooling strategies for consensus sequencing of DNA samples and detection of SNPs have been proposed, these strategies cannot be applied to sequencing of highly heterogeneous viral populations. RESULTS: We developed a cost-effective and reliable protocol for sequencing of viral samples that combines NGS using barcoding and combinatorial pooling with a computational framework including algorithms for optimal virus-specific pool design and deconvolution of individual samples from sequenced pools. Evaluation of the framework on experimental and simulated data for hepatitis C virus showed that it substantially reduces the sequencing costs and allows deconvolution of viral populations with high accuracy. AVAILABILITY AND IMPLEMENTATION: The source code and experimental data sets are available at http://alan.cs.gsu.edu/NGS/?q=content/pooling.


Subjects
Algorithms , Computational Biology/methods , Viral DNA/genetics , Viral Genome , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Viruses/classification , Viruses/genetics , Genetic Variation , Hepacivirus/classification , Hepacivirus/genetics , Humans
19.
Bioinformatics ; 30(12): i329-37, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24932001

ABSTRACT

MOTIVATION: Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish between rare variants and sequencing errors. RESULTS: In this article, we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population that allows for the detection of previously undiscovered rare variants. The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allow VGA to assemble rare variants. VGA uses an expectation-maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads. AVAILABILITY: Our tool VGA is freely available at http://genetics.cs.ucla.edu/vga/
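
The abstract states that abundances are estimated with an expectation-maximization algorithm but gives no details. As an illustration of the general idea only, the sketch below runs a textbook EM for mixture proportions, assuming each read is known to be compatible with a set of candidate variants. It is not VGA's implementation; the function `em_abundances`, its input format, and the toy read counts are all hypothetical.

```python
def em_abundances(compat, n_variants, iterations=100):
    """Estimate variant frequencies from read-to-variant compatibility.

    compat[r] is the set of variant indices that read r could have
    come from (an assumed input format, not taken from the source).
    """
    if not any(compat):
        raise ValueError("need at least one read compatible with a variant")
    freq = [1.0 / n_variants] * n_variants  # start from uniform abundances
    for _ in range(iterations):
        counts = [0.0] * n_variants
        # E-step: split each read among its compatible variants in
        # proportion to the current frequency estimates.
        for variants in compat:
            total = sum(freq[v] for v in variants)
            if total == 0:
                continue  # read compatible with no variant; ignore it
            for v in variants:
                counts[v] += freq[v] / total
        # M-step: new frequencies are the normalized expected counts.
        assigned = sum(counts)
        freq = [c / assigned for c in counts]
    return freq


# Toy data: 6 reads unique to variant 0, 2 unique to variant 1,
# and 2 ambiguous reads compatible with both.
reads = [{0}] * 6 + [{1}] * 2 + [{0, 1}] * 2
print(em_abundances(reads, n_variants=2))
```

In this toy case the ambiguous reads are shared according to the evidence from the unambiguous ones, so the estimates settle near 0.75 and 0.25 rather than the 0.7 and 0.3 that an even split would give.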


Subjects
Algorithms , Viral Genome , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , HIV/genetics , DNA Sequence Analysis , Software
20.
BMC Bioinformatics ; 15 Suppl 9: S9, 2014.
Article in English | MEDLINE | ID: mdl-25253180

ABSTRACT

BACKGROUND: Interest in de novo genome assembly has been renewed in the past decade due to rapid advances in high-throughput sequencing (HTS) technologies, which generate relatively short reads, resulting in highly fragmented assemblies consisting of contigs. Additional long-range linkage information is typically used to orient, order, and link contigs into larger structures referred to as scaffolds. Due to library preparation artifacts and erroneous mapping of reads originating from repeats, scaffolding remains a challenging problem. In this paper, we provide a scalable scaffolding algorithm (SILP2) employing a maximum likelihood model capturing read mapping uncertainty and/or non-uniformity of contig coverage, which is solved using integer linear programming. A Non-Serial Dynamic Programming (NSDP) paradigm is applied to render our algorithm useful in the processing of larger mammalian genomes. To compare scaffolding tools, we employ novel quantitative metrics in addition to the extant metrics in the field. We have also expanded the set of experiments to include scaffolding of low-complexity metagenomic samples. RESULTS: SILP2 achieves better scalability through a more efficient NSDP algorithm than the previous release of SILP. The results show that SILP2 compares favorably to previous methods OPERA and MIP in both scalability and accuracy for scaffolding single genomes of up to human size, and significantly outperforms them on scaffolding low-complexity metagenomic samples. CONCLUSIONS: Equipped with NSDP, SILP2 is able to scaffold large mammalian genomes, resulting in the longest and most accurate scaffolds. The ILP formulation for the maximum likelihood model is shown to be flexible enough to handle metagenomic samples.


Subjects
Genome , Genomics/methods , Likelihood Functions , DNA Sequence Analysis/methods , Algorithms , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Metagenomics/methods , Probability , Linear Programming