1.
medRxiv ; 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38076942

ABSTRACT

Background: Large-scale genomics projects have identified driver alterations for most childhood cancers that provide reliable biomarkers for clinical diagnosis and disease monitoring using targeted sequencing. However, there is a lack of a comprehensive panel that matches the list of known driver genes. Here we fill this gap by developing SJPedPanel for childhood cancers. Results: SJPedPanel covers 5,275 coding exons of 357 driver genes, 297 introns frequently involved in rearrangements that generate fusion oncoproteins, commonly amplified/deleted regions (e.g., MYCN for neuroblastoma, CDKN2A and PAX5 for B-/T-ALL, and SMARCB1 for AT/RT), and 7,590 polymorphism sites for interrogating tumors with aneuploidy, such as hyperdiploid and hypodiploid B-ALL or 17q gain neuroblastoma. We used driver alterations reported from an established real-time clinical genomics cohort (n=253) to validate this gene panel. Among the 485 pathogenic variants reported, our panel covered 417 variants (86%). For 90 rearrangements responsible for oncogenic fusions, our panel covered 74 events (82%). We re-sequenced 113 previously characterized clinical specimens at an average depth of 2,500X using SJPedPanel and recovered 354 (91%) of the 389 reported pathogenic variants. We then investigated the power of this panel in detecting mutations from specimens with low tumor purity (as low as 0.1%) using cell line-based dilution experiments and discovered that this gene panel enabled us to detect ∼80% of variants with an allele fraction of 0.2%, while the detection rate decreased to ∼50% when the allele fraction was 0.1%. We finally demonstrate its utility in disease monitoring on clinical specimens collected from AML patients in morphologic remission. Conclusions: SJPedPanel enables the detection of clinically relevant genetic alterations including rearrangements responsible for subtype-defining fusions for childhood cancers by targeted sequencing of ∼0.15% of the human genome. It will enhance the analysis of specimens with low tumor burdens for cancer monitoring and early detection.
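
The percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below recomputes them from the stated counts and also estimates how many variant-supporting reads one would expect at a given depth and allele fraction. The function names are illustrative only, and the second calculation assumes that the 2,500X average depth reported for the clinical re-sequencing also applies to the dilution experiments, which the abstract does not state.

```python
def percent(covered: int, total: int) -> int:
    """Return covered/total as a percentage rounded to the nearest integer."""
    return round(100 * covered / total)


def expected_variant_reads(depth: float, allele_fraction: float) -> float:
    """Expected number of reads carrying a variant at a given depth.

    Assumption: reads sample alleles independently, so the expectation is
    simply depth * allele_fraction.
    """
    return depth * allele_fraction


# Counts quoted in the abstract: (covered or recovered, total reported)
reported = {
    "pathogenic variants covered by the panel design": (417, 485),
    "fusion rearrangements covered by the panel design": (74, 90),
    "pathogenic variants recovered on re-sequencing": (354, 389),
}
for label, (covered, total) in reported.items():
    print(f"{label}: {covered}/{total} = {percent(covered, total)}%")

# Hypothetical depth of 2,500X for the dilution series (an assumption)
for allele_fraction in (0.002, 0.001):
    reads = expected_variant_reads(2500, allele_fraction)
    print(f"allele fraction {allele_fraction:.1%}: about {reads:.1f} variant reads expected")
```

Under that depth assumption, only a handful of variant reads are expected at these allele fractions, which is one plausible reason the reported detection rate falls between 0.2% and 0.1%.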

2.
medRxiv ; 2023 Oct 12.
Article in English | MEDLINE | ID: mdl-37873138

ABSTRACT

Sequence-based genetic testing currently identifies causative genetic variants in ∼50% of individuals with developmental and epileptic encephalopathies (DEEs). Aberrant changes in DNA methylation are implicated in various neurodevelopmental disorders but remain unstudied in DEEs. Rare epigenetic variations ("epivariants") can drive disease by modulating gene expression at single loci, whereas genome-wide DNA methylation changes can result in distinct "episignature" biomarkers for monogenic disorders in a growing number of rare diseases. Here, we interrogate the diagnostic utility of genome-wide DNA methylation array analysis on peripheral blood samples from 516 individuals with genetically unsolved DEEs who had previously undergone extensive genetic testing. We identified rare differentially methylated regions (DMRs) and explanatory episignatures to discover causative and candidate genetic etiologies in 10 individuals. We then used long-read sequencing to identify DNA variants underlying rare DMRs, including one balanced translocation, three CG-rich repeat expansions, and two copy number variants. We also identify pathogenic sequence variants associated with episignatures; some had been missed by previous exome sequencing. Although most DEE genes lack known episignatures, the increase in diagnostic yield for DNA methylation analysis in DEEs is comparable to the added yield of genome sequencing. Finally, we refine an episignature for CHD2 using an 850K methylation array which was further refined at higher CpG resolution using bisulfite sequencing to investigate potential insights into CHD2 pathophysiology. Our study demonstrates the diagnostic yield of genome-wide DNA methylation analysis to identify causal and candidate genetic causes as ∼2% (10/516) for unsolved DEE cases.

3.
Genome Biol ; 24(1): 64, 2023 04 04.
Article in English | MEDLINE | ID: mdl-37016431

ABSTRACT

BACKGROUND: The NSD2 p.E1099K (EK) mutation is shown to be enriched in patients with relapsed acute lymphoblastic leukemia (ALL), indicating a role in clonal evolution and drug resistance. RESULTS: To uncover 3D chromatin architecture-related mechanisms underlying drug resistance, we perform Hi-C on three B-ALL cell lines heterozygous for NSD2 EK. The NSD2 mutation leads to widespread remodeling of the 3D genome, most dramatically in terms of compartment changes with a strong bias towards A compartment shifts. Systematic integration of the Hi-C data with previously published ATAC-seq, RNA-seq, and ChIP-seq data shows an expansion in H3K36me2 and a shrinkage in H3K27me3 within A compartments as well as increased gene expression and chromatin accessibility. These results suggest that NSD2 EK plays a prominent role in chromatin decompaction through enrichment of H3K36me2. In contrast, we identify few changes in intra-topologically associating domain activity. While compartment changes vary across cell lines, a common core of decompacting loci is shared, driving the expression of genes/pathways previously implicated in drug resistance. We further perform RNA sequencing on a cohort of matched diagnosis/relapse ALL patients harboring the relapse-specific NSD2 EK mutation. Changes in patient gene expression upon relapse significantly correlate with core compartment changes, further implicating the role of NSD2 EK in genome decompaction. CONCLUSIONS: In spite of cell-context-dependent changes mediated by EK, there appears to be a shared transcriptional program dependent on compartment shifts which could explain phenotypic differences across EK cell lines. This core program is an attractive target for therapeutic intervention.


Subject(s)
Precursor Cell Lymphoblastic Leukemia-Lymphoma , Repressor Proteins , Child , Humans , Chromatin , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Repressor Proteins/genetics , Repressor Proteins/metabolism
4.
Nat Commun ; 14(1): 1739, 2023 04 05.
Article in English | MEDLINE | ID: mdl-37019972

ABSTRACT

Oncogenic fusions formed through chromosomal rearrangements are hallmarks of childhood cancer that define cancer subtype, predict outcome, persist through treatment, and can be ideal therapeutic targets. However, mechanistic understanding of the etiology of oncogenic fusions remains elusive. Here we report a comprehensive detection of 272 oncogenic fusion gene pairs by using tumor transcriptome sequencing data from 5190 childhood cancer patients. We identify diverse factors, including translation frame, protein domain, splicing, and gene length, that shape the formation of oncogenic fusions. Our mathematical modeling reveals a strong link between differential selection pressure and clinical outcome in CBFB-MYH11. We discover 4 oncogenic fusions, including RUNX1-RUNX1T1, TCF3-PBX1, CBFA2T3-GLIS2, and KMT2A-AFDN, with promoter-hijacking-like features that may offer alternative strategies for therapeutic targeting. We uncover extensive alternative splicing in oncogenic fusions including KMT2A-MLLT3, KMT2A-MLLT10, C11orf95-RELA, NUP98-NSD1, KMT2A-AFDN and ETV6-RUNX1. We discover neo splice sites in 18 oncogenic fusion gene pairs and demonstrate that such splice sites confer therapeutic vulnerability for etiology-based genome editing. Our study reveals general principles on the etiology of oncogenic fusions in childhood cancer and suggests profound clinical implications including etiology-based risk stratification and genome-editing-based therapeutics.


Subject(s)
Core Binding Factor Alpha 2 Subunit , Precursor Cell Lymphoblastic Leukemia-Lymphoma , Humans , Child , Core Binding Factor Alpha 2 Subunit/genetics , Oncogene Fusion , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Transcriptome , Causality , Oncogene Proteins, Fusion/genetics
6.
Blood Cancer Discov ; 3(3): 194-207, 2022 05 05.
Article in English | MEDLINE | ID: mdl-35176137

ABSTRACT

The genetics of relapsed pediatric acute myeloid leukemia (AML) has yet to be comprehensively defined. Here, we present the spectrum of genomic alterations in 136 relapsed pediatric AMLs. We identified recurrent exon 13 tandem duplications (TD) in upstream binding transcription factor (UBTF) in 9% of relapsed AML cases. UBTF-TD AMLs commonly have normal karyotype or trisomy 8 with cooccurring WT1 mutations or FLT3-ITD but not other known oncogenic fusions. These UBTF-TD events are stable during disease progression and are present in the founding clone. In addition, we observed that UBTF-TD AMLs account for approximately 4% of all de novo pediatric AMLs, are less common in adults, and are associated with poor outcomes and MRD positivity. Expression of UBTF-TD in primary hematopoietic cells is sufficient to enhance serial clonogenic activity and to drive a similar transcriptional program to UBTF-TD AMLs. Collectively, these clinical, genomic, and functional data establish UBTF-TD as a new recurrent mutation in AML. SIGNIFICANCE: We defined the spectrum of mutations in relapsed pediatric AML and identified UBTF-TDs as a new recurrent genetic alteration. These duplications are more common in children and define a group of AMLs with intermediate-risk cytogenetic abnormalities, FLT3-ITD and WT1 alterations, and are associated with poor outcomes. See related commentary by Hasserjian and Nardi, p. 173. This article is highlighted in the In This Issue feature, p. 171.


Subject(s)
Leukemia, Myeloid, Acute , Adult , Child , Chromosome Aberrations , Exons , Genomics , Humans , Leukemia, Myeloid, Acute/genetics , Mutation , Recurrence
8.
Genome Biol ; 22(1): 37, 2021 01 25.
Article in English | MEDLINE | ID: mdl-33487172

ABSTRACT

BACKGROUND: There is currently no method to precisely measure the errors that occur in the sequencing instrument/sequencer, which is critical for next-generation sequencing applications aimed at discovering the genetic makeup of heterogeneous cellular populations. RESULTS: We propose a novel computational method, SequencErr, to address this challenge by measuring the base correspondence between overlapping regions in forward and reverse reads. An analysis of 3777 public datasets from 75 research institutions in 18 countries revealed the sequencer error rate to be ~ 10 per million (pm) and 1.4% of sequencers and 2.7% of flow cells have error rates > 100 pm. At the flow cell level, error rates are elevated in the bottom surfaces and > 90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tile. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy, and removal of outlier error-prone tiles improves sequencing accuracy. We demonstrate that SequencErr can reveal novel insights relative to the popular quality control method FastQC and achieve a 10-fold lower error rate than popular error correction methods including Lighter and Musket. CONCLUSIONS: Our study reveals novel insights into the nature of DNA sequencing errors incurred on DNA sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets.
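
The core idea described here, comparing bases where a forward and a reverse read cover the same stretch of a DNA fragment, can be illustrated with a small sketch. This is not the SequencErr implementation: the function names are hypothetical, the overlap length is assumed to be known in advance, and alignment, base qualities, and any filtering the published method applies are ignored. A mismatch only shows that at least one of the two reads is wrong at that position.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_mismatches(forward: str, reverse: str, overlap: int) -> tuple[int, int]:
    """Count disagreements in the region covered by both reads of a pair.

    Assumption: the last `overlap` bases of the forward read and the first
    `overlap` bases of the reverse-complemented reverse read cover the same
    fragment positions. Returns (mismatches, bases_compared); positions
    where either read has an N are skipped.
    """
    if overlap <= 0 or overlap > min(len(forward), len(reverse)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    fwd_tail = forward[-overlap:]
    rev_head = reverse_complement(reverse)[:overlap]
    mismatches = compared = 0
    for f_base, r_base in zip(fwd_tail, rev_head):
        if "N" in (f_base, r_base):
            continue
        compared += 1
        if f_base != r_base:
            mismatches += 1
    return mismatches, compared


def per_million(mismatches: int, compared: int) -> float:
    """Express a mismatch count as a rate per million compared bases."""
    return 1e6 * mismatches / compared if compared else 0.0


# Toy read pair from a 20-base fragment, 14-base reads, 8-base overlap
fragment = "ACGTACGTAGCTAGCTAGGA"
fwd_read = fragment[:14]
rev_read = reverse_complement(fragment[6:])
print(overlap_mismatches(fwd_read, rev_read, 8))
```

In practice, counts from many read pairs would be pooled per sequencer, flow cell, or tile before a per-million rate is reported.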


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Algorithms , Calibration , Gene Library , Humans , Models, Genetic , SARS-CoV-2 , Sequence Analysis, DNA/methods
9.
Nat Cancer ; 2(8): 819-834, 2021 08.
Article in English | MEDLINE | ID: mdl-35122027

ABSTRACT

Chemotherapy is a standard treatment for pediatric acute lymphoblastic leukemia (ALL), which sometimes relapses with chemoresistant features. However, whether acquired drug-resistance mutations in relapsed ALL pre-exist or are induced by treatment remains unknown. Here we provide direct evidence of a specific mechanism by which chemotherapy induces drug-resistance-associated mutations leading to relapse. Using genomic and functional analysis of relapsed ALL we show that thiopurine treatment in mismatch repair (MMR)-deficient leukemias induces hotspot TP53 R248Q mutations through a specific mutational signature (thio-dMMR). Clonal evolution analysis reveals sequential MMR inactivation followed by TP53 mutation in some patients with ALL. Acquired TP53 R248Q mutations are associated with on-treatment relapse, poor treatment response and resistance to multiple chemotherapeutic agents, which could be reversed by pharmacological p53 reactivation. Our findings indicate that TP53 R248Q in relapsed ALL originates through synergistic mutagenesis from thiopurine treatment and MMR deficiency and suggest strategies to prevent or treat TP53-mutant relapse.


Subject(s)
Neoplastic Syndromes, Hereditary , Precursor Cell Lymphoblastic Leukemia-Lymphoma , Child , Humans , Mutagenesis , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Primary Immunodeficiency Diseases , Recurrence , Tumor Suppressor Protein p53/genetics
10.
Cancer Med ; 7(11): 5439-5447, 2018 11.
Article in English | MEDLINE | ID: mdl-30264478

ABSTRACT

Liquid biopsy is increasingly gaining traction as an alternative to invasive solid tumor biopsies for prognosis, treatment decisions, and disease monitoring. Matched tumor-plasma samples were collected from 180 patients across different cancers with >90% of the samples below Stage IIIB. Tumors were profiled using next-generation sequencing (NGS) or quantitative PCR (qPCR), and the mutation status was queried in the matched plasma using digital platforms such as droplet digital PCR (ddPCR) or NGS for concordance. Tumor-plasma concordance of 82% and 32% was observed in advanced (Stage IIB and above) and early (Stage I to Stage IIA) stage samples, respectively. Interestingly, the overall survival outcomes correlated with presurgical/at-biopsy ctDNA levels. Baseline ctDNA stratified patients into three categories: (a) high ctDNA correlated with poor survival outcome, (b) undetectable ctDNA with good outcome, and (c) low ctDNA whose outcome was ambiguous. ctDNA could be a powerful tool for therapy decisions and patient management in a large number of cancers across a variety of stages.


Subject(s)
Circulating Tumor DNA , Neoplasms/genetics , Neoplasms/pathology , Adult , Aged , Aged, 80 and over , Female , Humans , Kaplan-Meier Estimate , Liquid Biopsy , Male , Middle Aged , Mutation , Prognosis , Proportional Hazards Models , Young Adult
11.
PLoS Comput Biol ; 14(1): e1005802, 2018 01.
Article in English | MEDLINE | ID: mdl-29346365

ABSTRACT

Education and training are two essential ingredients for a successful career. On one hand, universities provide students a curriculum for specializing in one's field of study, and on the other, internships complement coursework and provide invaluable training experience for a fruitful career. Consequently, undergraduates and graduates are encouraged to undertake an internship during the course of their degree. The opportunity to explore one's research interests in the early stages of their education is important for students because it improves their skill set and gives their career a boost. In the long term, this helps to close the gap between skills and employability among students across the globe and balance the research capacity in the field of computational biology. However, training opportunities are often scarce for computational biology students, particularly for those who reside in less-privileged regions. Aimed at helping students develop research and academic skills in computational biology and alleviating the divide across countries, the Student Council of the International Society for Computational Biology introduced its Internship Program in 2009. The Internship Program is committed to providing access to computational biology training, especially for students from developing regions, and improving competencies in the field. Here, we present how the Internship Program works and the impact of the internship opportunities so far, along with the challenges associated with this program.


Subject(s)
Computational Biology/education , Internship and Residency , Algorithms , Australia , Curriculum , Developing Countries , Europe , Geography , Humans , Program Development , Students , Universities
12.
PLoS One ; 12(3): e0173408, 2017.
Article in English | MEDLINE | ID: mdl-28282404

ABSTRACT

Interactions between different phytoplankton taxa and heterotrophic bacterial communities within aquatic environments can differentially support growth of various heterotrophic bacterial species. In this study, phytoplankton diversity was studied using traditional microscopic techniques and the bacterial communities associated with the phytoplankton bloom were studied using High Throughput Sequencing (HTS) analysis of 16S rRNA gene amplicons from the V1-V3 and V3-V4 hypervariable regions. Samples were collected from Lake Akersvannet, a eutrophic lake in South Norway, during the growth season from June to August 2013. Microscopic examination revealed that the phytoplankton community was mostly represented by Cyanobacteria and the dinoflagellate Ceratium hirundinella. The HTS results revealed that Proteobacteria (Alpha, Beta, and Gamma), Bacteroidetes, Cyanobacteria, Actinobacteria and Verrucomicrobia dominated the bacterial community, with varying relative abundances throughout the sampling season. Species level identification of Cyanobacteria showed a mixed population of Aphanizomenon flos-aquae, Microcystis aeruginosa and Woronichinia naegeliana. A significant proportion of the microbial community was composed of unclassified taxa which might represent locally adapted freshwater bacterial groups. Comparison of cyanobacterial species composition from HTS and microscopy revealed quantitative discrepancies, indicating a need for cross validation of results. To our knowledge, this is the first study that uses HTS methods for studying the bacterial community associated with phytoplankton blooms in a Norwegian lake. The study demonstrates the value of considering results from multiple methods when studying bacterial communities.


Subject(s)
Bacteria/genetics , Lakes/microbiology , Phytoplankton/genetics , RNA, Ribosomal, 16S/metabolism , Bacteria/isolation & purification , Bacteria/metabolism , Cyanobacteria/genetics , DNA, Bacterial/chemistry , DNA, Bacterial/isolation & purification , DNA, Bacterial/metabolism , Enzyme-Linked Immunosorbent Assay , High-Throughput Nucleotide Sequencing , Microcystins/analysis , Microcystis/genetics , Microcystis/metabolism , Norway , Phytoplankton/growth & development , Proteobacteria/genetics , RNA, Ribosomal, 16S/chemistry , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA
13.
PeerJ ; 4: e2326, 2016.
Article in English | MEDLINE | ID: mdl-27635316

ABSTRACT

BACKGROUND: Dengue is one of the most common arboviral diseases prevalent worldwide and is caused by Dengue viruses (genus Flavivirus, family Flaviviridae). There are four serotypes of Dengue Virus (DENV-1 to DENV-4), each of which is further subdivided into distinct genotypes. DENV-2 is frequently associated with severe dengue infections and epidemics. DENV-2 consists of six genotypes: Asian/American, Asian I, Asian II, Cosmopolitan, American and sylvatic. A comparative genomic study was carried out to infer the population structure of DENV-2 and to analyze the role of evolutionary and spatiotemporal factors in the emergence of diversifying lineages. METHODS: Complete genome sequences of 990 strains of DENV-2 were analyzed using Bayesian-based population genetics and phylogenetic approaches to infer genetically distinct lineages. The role of spatiotemporal factors, genetic recombination and selection pressure in the evolution of DENV-2 was examined using sequence-based bioinformatics approaches. RESULTS: DENV-2 genetic structure is complex and consists of fifteen subpopulations/lineages. The Asian/American genotype is observed to be diversified into seven lineages. The Asian I, Cosmopolitan and sylvatic genotypes were found to be subdivided into two lineages each. The populations of American and Asian II genotypes were observed to be homogeneous. Significant evidence of episodic positive selection was observed in all the genes, except NS4A. Positive selection operating on a few codons in the envelope gene confers antigenic and lineage diversity in the American strains of the Asian/American genotype. Selection on codons of non-structural genes was observed to impact diversification of lineages in Asian I, Cosmopolitan and sylvatic genotypes. Evidence of intra/inter-genotype recombination was obtained and the uncertainty in classification of recombinant strains was resolved using the population genetics approach. DISCUSSION: Complete genome-based analysis revealed that the worldwide population of DENV-2 strains is subdivided into fifteen lineages. The population structure of DENV-2 is spatiotemporal and is shaped by episodic positive selection and recombination. Intra-genotype diversity was observed in four genotypes (Asian/American, Asian I, Cosmopolitan and sylvatic). Episodic positive selection on envelope and non-structural genes translates into antigenic diversity and appears to be responsible for emergence of strains/lineages in DENV-2 genotypes. Understanding of the genotype diversity and emerging lineages will be useful to design strategies for epidemiological surveillance and vaccine design.

14.
Sci Rep ; 6: 27436, 2016 06 06.
Article in English | MEDLINE | ID: mdl-27264539

ABSTRACT

Cellular mRNAs are predominantly translated in a cap-dependent manner. However, some viral and a subset of cellular mRNAs initiate their translation in a cap-independent manner. This requires the presence of a structured RNA element, known as an Internal Ribosome Entry Site (IRES), in their 5' untranslated regions (UTRs). Experimental demonstration of an IRES in a UTR remains a challenging task. Computational prediction of IRES based merely on sequence and structure conservation is also difficult, particularly for cellular IRES. A web server, IRESPred, is developed for prediction of both viral and cellular IRES using a Support Vector Machine (SVM). The predictive model was built using 35 features that are based on sequence and structural properties of UTRs and the probabilities of interactions between UTR and small subunit ribosomal proteins (SSRPs). The model was found to have 75.51% accuracy, 75.75% sensitivity, 75.25% specificity, 75.75% precision and a Matthews Correlation Coefficient (MCC) of 0.51 in blind testing. IRESPred was found to perform better than the only available viral IRES prediction server, VIPS. The IRESPred server is freely available at http://bioinfo.net.in/IRESPred/.
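
The five performance figures reported for the classifier all derive from one confusion matrix. As a worked illustration, the sketch below computes them from true/false positive and negative counts. The counts used in the example are hypothetical and are not the study's test set, which the abstract does not give.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # The MCC denominator is zero when any row or column of the matrix is empty
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall on true positives
        "specificity": tn / (tn + fp),   # recall on true negatives
        "precision": tp / (tp + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical balanced test set: 100 positives and 100 negatives
metrics = binary_metrics(tp=75, fp=25, tn=75, fn=25)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, every rate is 0.75 and the MCC is 0.50, which shows why an MCC near 0.5 is consistent with accuracy, sensitivity, and specificity all close to 75% on a roughly balanced test set.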


Subject(s)
Internal Ribosome Entry Sites , Internet , 5' Untranslated Regions , Humans , Membrane Fusion , RNA, Viral/genetics
15.
PLoS One ; 11(2): e0149350, 2016.
Article in English | MEDLINE | ID: mdl-26870949

ABSTRACT

Rhinoviruses (RV) are increasingly being reported to cause mild to severe infections of the respiratory tract in humans. RV are antigenically the most diverse species of the genus Enterovirus and family Picornaviridae. There are three species of RV (RV-A, -B and -C), with 80, 32 and 55 serotypes/types, respectively. Antigenic variation is the main limiting factor for development of a cross-protective vaccine against RV. Serotyping of Rhinoviruses is carried out using cross-neutralization assays in cell culture. However, these assays become laborious and time-consuming for a large number of strains. Alternatively, serotyping of RV is carried out by alignment-based phylogeny of both protein and nucleotide sequences of VP1. However, serotyping of RV based on alignment-based phylogeny is a multi-step process, which needs to be repeated every time a new isolate is sequenced. In view of the growing need for serotyping of RV, an alignment-free method based on "return time distribution" (RTD) of amino acid residues in VP1 protein has been developed and implemented in the form of a web server titled RV-Typer. RV-Typer accepts nucleotide or protein sequences as an input and computes return times of di-peptides (k = 2) to assign serotypes. The RV-Typer performs with 100% sensitivity and specificity. It is significantly faster than alignment-based methods. The web server is available at http://bioinfo.net.in/RV-Typer/home.html.


Subject(s)
Phylogeny , Picornaviridae Infections/virology , Rhinovirus/classification , Rhinovirus/genetics , Serotyping/methods , Capsid Proteins/genetics , Genes, Viral , Humans , Internet , Software
16.
PLoS One ; 9(2): e88981, 2014.
Article in English | MEDLINE | ID: mdl-24586469

ABSTRACT

Rhinoviruses, formerly known as Human rhinoviruses, are the most common cause of air-borne upper respiratory tract infections in humans. Rhinoviruses belong to the family Picornaviridae and are divided into three species namely, Rhinovirus A, -B and -C, which are antigenically diverse. Genetic recombination is found to be one of the important causes for diversification of Rhinovirus species. Although emerging lineages within Rhinoviruses have been reported, their population structure has not been studied yet. The availability of complete genome sequences facilitates study of population structure, genetic diversity and underlying evolutionary forces, such as mutation, recombination and selection pressure. Analysis of complete genomes of Rhinoviruses using a model-based population genetics approach provided strong evidence for the existence of seven genetically distinct subpopulations. As a result of diversification, Rhinovirus A and -C populations are divided into four and two subpopulations, respectively. Genetically, the Rhinovirus B population was found to be homogeneous. Intra-species recombination was observed to be prominent in Rhinovirus A and -C species. Significant evidence of episodic positive selection was obtained for several sites within coding sequences of structural and non-structural proteins. This corroborates well with known phenotypic properties such as antigenicity of structural proteins. Episodic positive selection appears to be responsible for emergence of new lineages especially in Rhinovirus A. In summary, the Rhinovirus population is an ensemble of seven distinct lineages. In the case of Rhinovirus A, intra-species recombination and episodic positive selection contribute to its further diversification. In the case of Rhinovirus C, intra- and inter-species recombinations are responsible for observed diversity. The population genetics approach was also useful to analyze phylogenetic tree topologies pertaining to recombinant strains, especially when trees are derived using complete genomes. Understanding of population structure serves as a foundation for designing new vaccines and drugs as well as to explain emergence of drug resistance amongst subpopulations.


Subject(s)
Evolution, Molecular , Genetic Variation , Rhinovirus/genetics , Genetic Linkage , Genome, Viral/genetics , Humans , Phylogeny , RNA, Viral/genetics , Recombination, Genetic , Respiratory Tract Infections/virology , Rhinovirus/classification , Sequence Analysis, RNA
17.
J Virol Methods ; 198: 41-55, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24388930

ABSTRACT

West Nile virus (WNV), genus Flavivirus, family Flaviviridae, is a major cause of viral encephalitis with broad host range and global spread. The virus has undergone a series of evolutionary changes with emergence of various genotypic lineages that are known to differ in type and severity of the diseases caused. Currently, genotyping is carried out using molecular phylogeny of complete coding sequences and genotype is assigned based on proximity to reference genotypes in tree topology. Efficient epidemiological surveillance of WNVs demands development of objective criteria for typing. An alignment-free approach based on return time distribution (RTD) of k-mers has been validated for genotyping of WNVs. The RTDs of complete genome sequences at k=7 were found to be optimum for classification of the known lineages of WNVs as well as for genotyping. It provides a time- and computationally efficient alternative for genome-based annotation of WNV lineages. The development of a WNV Typer server based on RTD is described (http://bioinfo.net.in/wnv/homepage.html). Both the method and the server have 100% sensitivity and specificity.


Subject(s)
Genome, Viral/genetics , Sequence Analysis/methods , West Nile virus/genetics , Genotype , Phylogeny , West Nile Fever/virology
18.
Mol Phylogenet Evol ; 65(2): 510-22, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22820020

ABSTRACT

The data deluge in the post-genomic era demands development of novel data mining tools. Existing molecular phylogeny analyses (MPAs) developed for individual gene/protein sequences are alignment-based. However, the size of genomic data and uncertainties associated with alignments necessitate development of alignment-free methods for MPA. Derivation of distances between sequences is an important step in both alignment-dependent and alignment-free methods. Various alignment-free distance measures based on oligo-nucleotide frequencies, information content, compression techniques, etc. have been proposed. However, these distance measures do not account for relative order of components viz. nucleotides or amino acids. A new distance measure, based on the concept of 'return time distribution' (RTD) of k-mers is proposed, which accounts for the sequence composition and their relative orders. Statistical parameters of RTDs are used to derive a distance function. The resultant distance matrix is used for clustering and phylogeny using Neighbor-joining. Its performance for MPA and subtyping was evaluated using simulated data generated by block-bootstrap, receiver operating characteristics and leave-one-out cross validation methods. The proposed method was successfully applied for MPA of family Flaviviridae and subtyping of Dengue viruses. It is observed that the method retains resolution for classification and subtyping of viruses at varying levels of sequence similarity and taxonomic hierarchy.
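
The return-time idea in this abstract can be sketched in a few lines. The version below makes several assumptions that the abstract does not specify: a return time is taken as the distance in positions between successive occurrences of the same k-mer, the "statistical parameters" are taken to be the mean and population standard deviation of each k-mer's return times, and the distance between two sequences is the Euclidean distance between the resulting vectors. All function names are illustrative.

```python
from collections import defaultdict
from itertools import product
from math import sqrt
from statistics import mean, pstdev


def return_times(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer to the gaps between its successive occurrences."""
    last_seen: dict[str, int] = {}
    times: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            times[kmer].append(i - last_seen[kmer])
        last_seen[kmer] = i
    return times


def rtd_vector(seq: str, k: int, alphabet: str = "ACGT") -> list[float]:
    """Summarize a sequence as (mean, std) of return times for every possible k-mer.

    k-mers that never recur contribute (0.0, 0.0); this padding is an
    assumption of this sketch.
    """
    times = return_times(seq, k)
    vector: list[float] = []
    for kmer in map("".join, product(alphabet, repeat=k)):
        observed = times.get(kmer, [])
        vector.extend([mean(observed), pstdev(observed)] if observed else [0.0, 0.0])
    return vector


def rtd_distance(seq_a: str, seq_b: str, k: int, alphabet: str = "ACGT") -> float:
    """Euclidean distance between the RTD summary vectors of two sequences."""
    vec_a = rtd_vector(seq_a, k, alphabet)
    vec_b = rtd_vector(seq_b, k, alphabet)
    return sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))


print(dict(return_times("ACACAC", 2)))
print(rtd_distance("ACGTACGTACGT", "ACGTTGCAACGT", 2))
```

Pairwise distances computed this way can be collected into a matrix and passed to a clustering or Neighbor-joining routine, as the abstract describes. Because the summary depends on where k-mers recur rather than only how often, it is sensitive to the relative order of residues.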


Subject(s)
Phylogeny , Sequence Analysis/methods , Cluster Analysis , Computational Biology , Data Mining , Dengue Virus/classification , Flaviviridae/classification , Genome, Viral , Pattern Recognition, Automated , Sequence Alignment