1.
Genome Res ; 21(5): 665-75, 2011 May.
Article in English | MEDLINE | ID: mdl-21467267

ABSTRACT

Somatic genome rearrangements are thought to play important roles in cancer development. We optimized a long-span paired-end-tag (PET) sequencing approach using 10-Kb genomic DNA inserts to study human genome structural variations (SVs). The use of a 10-Kb insert size allows the identification of breakpoints within repetitive or homology-containing regions of a few kilobases in size and results in a higher physical coverage compared with small insert libraries with the same sequencing effort. We have applied this approach to comprehensively characterize the SVs of 15 cancer and two noncancer genomes and used a filtering approach to strongly enrich for somatic SVs in the cancer genomes. Our analyses revealed that most inversions, deletions, and insertions are germ-line SVs, whereas tandem duplications, unpaired inversions, interchromosomal translocations, and complex rearrangements are over-represented among somatic rearrangements in cancer genomes. We demonstrate that the quantitative and connective nature of DNA-PET data is precise in delineating the genealogy of complex rearrangement events, we observe signatures that are compatible with breakage-fusion-bridge cycles, and we discover that large duplications are among the initial rearrangements that trigger genome instability for extensive amplification in epithelial cancers.


Subject(s)
Base Pairing/genetics , Breast Neoplasms/genetics , Chromosome Mapping/methods , Genome, Human/genetics , Genomic Structural Variation/genetics , Stomach Neoplasms/genetics , Cell Line, Tumor , Computational Biology , DNA/genetics , Female , Gene Rearrangement , Humans , Sequence Analysis, DNA
2.
PLoS Biol ; 4(1): e3, 2006 Jan.
Article in English | MEDLINE | ID: mdl-16336043

ABSTRACT

The human gut is known to be a reservoir of a wide variety of microbes, including viruses. Many RNA viruses are known to be associated with gastroenteritis; however, the enteric RNA viral community present in healthy humans has not been described. Here, we present a comparative metagenomic analysis of the RNA viruses found in three fecal samples from two healthy human individuals. For this study, uncultured viruses were concentrated by tangential flow filtration, and viral RNA was extracted and cloned into shotgun viral cDNA libraries for sequencing analysis. The vast majority of the 36,769 viral sequences obtained were similar to plant pathogenic RNA viruses. The most abundant fecal virus in this study was pepper mild mottle virus (PMMV), which was found in high concentrations, up to 10^9 virions per gram of dry weight fecal matter. PMMV was also detected in 12 (66.7%) of 18 fecal samples collected from healthy individuals on two continents, indicating that this plant virus is prevalent in the human population. A number of pepper-based foods tested positive for PMMV, suggesting dietary origins for this virus. Intriguingly, the fecal PMMV was infectious to host plants, suggesting that humans might act as a vehicle for the dissemination of certain plant viruses.


Subject(s)
Feces/virology , Plant Viruses/isolation & purification , RNA Viruses/isolation & purification , Adult , Capsicum/virology , Food/virology , Humans , Infant , Molecular Sequence Data , Reverse Transcriptase Polymerase Chain Reaction
3.
BMC Bioinformatics ; 9: 368, 2008 Sep 10.
Article in English | MEDLINE | ID: mdl-18783594

ABSTRACT

BACKGROUND: Pathogen detection using DNA microarrays has the potential to become a fast and comprehensive diagnostics tool. However, since pathogen detection chips currently utilize random primers rather than specific primers for the RT-PCR step, bias inherent in random PCR amplification becomes a serious problem that causes large inaccuracies in hybridization signals. RESULTS: In this paper, we study how the efficiency of random PCR amplification affects hybridization signals. We describe a model that predicts the amplification efficiency of a given random primer on a target viral genome. The prediction allows us to filter false-negative probes of the genome that lie in regions of poor random PCR amplification and improves the accuracy of pathogen detection. Subsequently, we propose LOMA, an algorithm to generate random primers that have good amplification efficiency. Wet-lab validation showed that the generated random primers improve the amplification efficiency significantly. CONCLUSION: The blind use of a random primer with attached universal tag (random-tagged primer) in a PCR reaction on a pathogen sample may not lead to a successful amplification. Thus, the design of random-tagged primers is an important consideration when performing PCR.


Subject(s)
Algorithms , DNA Primers/genetics , DNA, Viral/genetics , Polymerase Chain Reaction/methods , Sequence Analysis, DNA/methods , Software , Artifacts , Base Sequence , DNA, Viral/isolation & purification , Data Interpretation, Statistical , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity
4.
Oncotarget ; 6(10): 7727-40, 2015 Apr 10.
Article in English | MEDLINE | ID: mdl-25762628

ABSTRACT

Somatic mutations of TP53 are among the most common in cancer and germline mutations of TP53 (usually missense) can cause Li-Fraumeni syndrome (LFS). Recently, recurrent genomic rearrangements in intron 1 of TP53 have been described in osteosarcoma (OS), a highly malignant neoplasm of bone belonging to the spectrum of LFS tumors. Using whole-genome sequencing of OS, we found features of TP53 intron 1 rearrangements suggesting a unique mechanism correlated with transcription. Screening of 288 OS and 1,090 tumors of other types revealed evidence for TP53 rearrangements in 46 (16%) OS, while none were detected in other tumor types, indicating this rearrangement to be highly specific to OS. We revisited a four-generation LFS family where no TP53 mutation had been identified and found a 445 kb inversion spanning from the TP53 intron 1 towards the centromere. The inversion segregated with tumors in the LFS family. Cancers in this family had loss of heterozygosity, retaining the rearranged allele and resulting in TP53 expression loss. In conclusion, intron 1 rearrangements cause p53-driven malignancies by both germline and somatic mechanisms and provide an important mechanism of TP53 inactivation in LFS, which might in part explain the diagnostic gap of formerly classified "TP53 wild-type" LFS.


Subject(s)
Bone Neoplasms/genetics , Genes, p53 , Introns , Li-Fraumeni Syndrome/genetics , Osteosarcoma/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Base Sequence , Child , Child, Preschool , Gene Rearrangement , Genetic Predisposition to Disease , Germ-Line Mutation , Humans , Male , Middle Aged , Molecular Sequence Data , Pedigree , Young Adult
5.
Cell Rep ; 12(2): 272-85, 2015 Jul 14.
Article in English | MEDLINE | ID: mdl-26146084

ABSTRACT

Genome rearrangements, a hallmark of cancer, can result in gene fusions with oncogenic properties. Using DNA paired-end-tag (DNA-PET) whole-genome sequencing, we analyzed 15 gastric cancers (GCs) from Southeast Asians. Rearrangements were enriched in open chromatin and shaped by chromatin structure. We identified seven rearrangement hot spots and 136 gene fusions. In three out of 100 GC cases, we found recurrent fusions between CLDN18, a tight junction gene, and ARHGAP26, a gene encoding a RHOA inhibitor. Epithelial cell lines expressing CLDN18-ARHGAP26 displayed a dramatic loss of epithelial phenotype and long protrusions indicative of epithelial-mesenchymal transition (EMT). Fusion-positive cell lines showed impaired barrier properties, reduced cell-cell and cell-extracellular matrix adhesion, retarded wound healing, and inhibition of RHOA. Gain of invasion was seen in cancer cell lines expressing the fusion. Thus, CLDN18-ARHGAP26 mediates epithelial disintegration, possibly leading to stomach H+ leakage, and the fusion might contribute to invasiveness once a cell is transformed.


Subject(s)
Claudins/genetics , GTPase-Activating Proteins/genetics , Oncogene Proteins, Fusion/metabolism , Stomach Neoplasms/pathology , Amino Acid Sequence , Animals , Cell Adhesion , Cell Line, Tumor , Cell Movement , Cell Proliferation , Clathrin/pharmacology , Claudins/metabolism , Dogs , Endocytosis/drug effects , Epithelial Cells/cytology , Epithelial Cells/metabolism , Epithelial-Mesenchymal Transition , GTPase-Activating Proteins/metabolism , HeLa Cells , Humans , MCF-7 Cells , Madin Darby Canine Kidney Cells , Molecular Sequence Data , Oncogene Proteins, Fusion/genetics , Phenotype , Stomach Neoplasms/metabolism , rhoA GTP-Binding Protein/antagonists & inhibitors , rhoA GTP-Binding Protein/metabolism
6.
BMC Infect Dis ; 4: 32, 2004 Sep 06.
Article in English | MEDLINE | ID: mdl-15347429

ABSTRACT

BACKGROUND: The SARS coronavirus is the etiologic agent for the epidemic of the Severe Acute Respiratory Syndrome. The recent emergence of this new pathogen, the careful tracing of its transmission patterns, and the ability to propagate in culture allow the exploration of the mutational dynamics of the SARS-CoV in human populations. METHODS: We sequenced complete SARS-CoV genomes taken from primary human tissues (SIN3408, SIN3725V, SIN3765V), cultured isolates (SIN848, SIN846, SIN842, SIN845, SIN847, SIN849, SIN850, SIN852, SIN3408L), and five consecutive Vero cell passages (SIN2774_P1, SIN2774_P2, SIN2774_P3, SIN2774_P4, SIN2774_P5) arising from SIN2774 isolate. These represented individual patient samples, serial in vitro passages in cell culture, and paired human and cell culture isolates. Employing a refined mutation filtering scheme and constant mutation rate model, the mutation rates were estimated and the possible date of emergence was calculated. Phylogenetic analysis was used to uncover molecular relationships between the isolates. RESULTS: Close examination of whole genome sequence of 54 SARS-CoV isolates identified before 14th October 2003, including 22 from patients in Singapore, revealed the mutations engendered during human-to-Vero and Vero-to-human transmission as well as in multiple Vero cell passages in order to refine our analysis of human-to-human transmission. Though co-infection by different quasispecies in individual tissue samples is observed, the in vitro mutation rate of the SARS-CoV in Vero cell passage is negligible. The in vivo mutation rate, however, is consistent with estimates of other RNA viruses at approximately 5.7 × 10^-6 nucleotide substitutions per site per day (0.17 mutations per genome per day), or two mutations per human passage (adjusted R-square = 0.4014).
Using the immediate Hotel M contact isolates as roots, we observed that the SARS epidemic has generated four major genetic groups that are geographically associated: two Singapore isolates, one Taiwan isolate, and one North China isolate which appears most closely related to the putative SARS-CoV isolated from a palm civet. Non-synonymous mutations are centered in non-essential ORFs especially in structural and antigenic genes such as the S and M proteins, but these mutations did not distinguish the geographical groupings. However, no non-synonymous mutations were found in the 3CLpro and the polymerase genes. CONCLUSIONS: Our results show that the SARS-CoV is well adapted to growth in culture and did not appear to undergo specific selection in human populations. We further assessed that the putative origin of the SARS epidemic was in late October 2002 which is consistent with a recent estimate using cases from China. The greater sequence divergence in the structural and antigenic proteins and consistent deletions in the 3'-most portion of the viral genome suggest that certain selection pressures are interacting with the functional nature of these validated and putative ORFs.


Subject(s)
Mutation , Severe Acute Respiratory Syndrome/virology , Severe acute respiratory syndrome-related coronavirus/genetics , Animals , Chlorocebus aethiops , Cluster Analysis , DNA, Complementary/chemistry , Genome, Viral , Humans , Mass Spectrometry , Phylogeny , Polymorphism, Single Nucleotide , Probability , RNA, Viral/genetics , RNA, Viral/isolation & purification , Severe acute respiratory syndrome-related coronavirus/classification , Severe acute respiratory syndrome-related coronavirus/isolation & purification , Sequence Alignment , Serial Passage , Singapore , Vero Cells
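The per-genome figure quoted in the abstract above follows from the per-site rate by simple multiplication. The sketch below reproduces that arithmetic; the genome length of about 29,700 nucleotides is an assumption (it is not stated in the abstract), and the implied passage duration is only a back-of-envelope consequence of the two reported numbers, not a result from the paper.

```python
# Per-site substitution rate reported in the abstract (per site per day).
RATE_PER_SITE_PER_DAY = 5.7e-6

# ASSUMPTION: approximate SARS-CoV genome length in nucleotides;
# this value is not given in the abstract.
GENOME_LENGTH_NT = 29_700

# Reported in the abstract: roughly two mutations per human passage.
MUTATIONS_PER_PASSAGE = 2


def mutations_per_genome_per_day(rate_per_site: float, genome_length: int) -> float:
    """Scale a per-site daily rate to a whole-genome daily rate."""
    return rate_per_site * genome_length


per_day = mutations_per_genome_per_day(RATE_PER_SITE_PER_DAY, GENOME_LENGTH_NT)

# Days needed to accumulate the reported per-passage mutation count
# at the per-genome rate computed above (illustrative only).
implied_days_per_passage = MUTATIONS_PER_PASSAGE / per_day

print(f"{per_day:.2f} mutations per genome per day")
print(f"about {implied_days_per_passage:.0f} days per human passage (implied)")
```

Under the assumed genome length, the product rounds to the 0.17 mutations per genome per day given in the abstract.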
7.
PLoS One ; 7(9): e46152, 2012.
Article in English | MEDLINE | ID: mdl-23029419

ABSTRACT

Structural variations (SVs) contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. While genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs, the current application of PET sequencing with short insert size DNA can be insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We employed a recently developed procedure to generate PET sequencing data using large DNA inserts of 10-20 kb and compared their characteristics with short insert (1 kb) libraries for their ability to identify SVs. Our results suggest that although short insert libraries bear an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. In contrast, large inserts are superior to short inserts in providing higher physical genome coverage for the same sequencing cost and achieve greater sensitivity, in practice, for the identification of several classes of SVs, such as copy number neutral and complex events. Furthermore, our results confirm that large insert libraries allow for the identification of SVs within repetitive sequences, which cannot be spanned by short inserts. This provides a key advantage in studying rearrangements in cancer, and we show how it can be used in a fusion-point-guided-concatenation algorithm to study focally amplified regions in cancer.


Subject(s)
Genome, Human , Genomic Structural Variation , Mutation , Neoplasms/genetics , Open Reading Frames , Sequence Analysis, DNA/methods , Algorithms , Cell Line, Tumor , Chromosome Mapping , DNA Copy Number Variations , Genomic Library , Humans , Mutagenesis, Insertional
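The claim that large inserts give higher physical coverage for the same sequencing cost can be illustrated with a short calculation. This is a minimal sketch, assuming physical coverage is defined as the number of paired-end tags times the insert size divided by the genome size; the genome size and the number of tag pairs below are hypothetical round numbers, not values from the paper.

```python
# ASSUMPTIONS (hypothetical, for illustration only):
GENOME_SIZE_BP = 3_000_000_000   # round-number human genome size
TAG_PAIRS = 30_000_000           # same sequencing effort for both libraries


def physical_coverage(tag_pairs: int, insert_size_bp: int, genome_size_bp: int) -> float:
    """Average number of inserts spanning any given genomic position."""
    return tag_pairs * insert_size_bp / genome_size_bp


short_insert = physical_coverage(TAG_PAIRS, 1_000, GENOME_SIZE_BP)    # 1 kb library
large_insert = physical_coverage(TAG_PAIRS, 10_000, GENOME_SIZE_BP)   # 10 kb library

print(f"1 kb inserts:  {short_insert:.0f}x physical coverage")
print(f"10 kb inserts: {large_insert:.0f}x physical coverage")
print(f"gain from larger inserts: {large_insert / short_insert:.0f}-fold")
```

Because the number of sequenced tags is held fixed, the gain in physical coverage scales directly with insert size, which is the trade-off the abstract describes.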
8.
Genome Biol ; 13(12): R115, 2012 Dec 13.
Article in English | MEDLINE | ID: mdl-23237666

ABSTRACT

BACKGROUND: Gastric cancer is the second highest cause of global cancer mortality. To explore the complete repertoire of somatic alterations in gastric cancer, we combined massively parallel short read and DNA paired-end tag sequencing to present the first whole-genome analysis of two gastric adenocarcinomas, one with chromosomal instability and the other with microsatellite instability. RESULTS: Integrative analysis and de novo assemblies revealed the architecture of a wild-type KRAS amplification, a common driver event in gastric cancer. We discovered three distinct mutational signatures in gastric cancer: against a genome-wide backdrop of oxidative and microsatellite instability-related mutational signatures, we identified the first exome-specific mutational signature. Further characterization of the impact of these signatures by combining sequencing data from 40 complete gastric cancer exomes and targeted screening of an additional 94 independent gastric tumors uncovered ACVR2A, RPL22 and LMAN1 as recurrently mutated genes in microsatellite instability-positive gastric cancer and PAPPA as a recurrently mutated gene in TP53 wild-type gastric cancer. CONCLUSIONS: These results highlight how whole-genome cancer sequencing can uncover information relevant to tissue-specific carcinogenesis that would otherwise be missed from exome-sequencing data.


Subject(s)
DNA Mutational Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Stomach Neoplasms/genetics , Adenocarcinoma/genetics , Chromosomal Instability , Deamination , Exome , Genomics , Microsatellite Instability , Mutation , Reactive Oxygen Species/metabolism
9.
Nat Med ; 18(4): 521-8, 2012 Mar 18.
Article in English | MEDLINE | ID: mdl-22426421

ABSTRACT

Tyrosine kinase inhibitors (TKIs) elicit high response rates among individuals with kinase-driven malignancies, including chronic myeloid leukemia (CML) and epidermal growth factor receptor-mutated non-small-cell lung cancer (EGFR NSCLC). However, the extent and duration of these responses are heterogeneous, suggesting the existence of genetic modifiers affecting an individual's response to TKIs. Using paired-end DNA sequencing, we discovered a common intronic deletion polymorphism in the gene encoding BCL2-like 11 (BIM). BIM is a pro-apoptotic member of the B-cell CLL/lymphoma 2 (BCL2) family of proteins, and its upregulation is required for TKIs to induce apoptosis in kinase-driven cancers. The polymorphism switched BIM splicing from exon 4 to exon 3, which resulted in expression of BIM isoforms lacking the pro-apoptotic BCL2-homology domain 3 (BH3). The polymorphism was sufficient to confer intrinsic TKI resistance in CML and EGFR NSCLC cell lines, but this resistance could be overcome with BH3-mimetic drugs. Notably, individuals with CML and EGFR NSCLC harboring the polymorphism experienced significantly inferior responses to TKIs than did individuals without the polymorphism (P = 0.02 for CML and P = 0.027 for EGFR NSCLC). Our results offer an explanation for the heterogeneity of TKI responses across individuals and suggest the possibility of personalizing therapy with BH3 mimetics to overcome BIM-polymorphism-associated TKI resistance.


Subject(s)
Apoptosis Regulatory Proteins/genetics , Apoptosis/drug effects , Carcinoma, Non-Small-Cell Lung/genetics , Drug Resistance, Neoplasm/drug effects , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics , Lung Neoplasms/genetics , Membrane Proteins/genetics , Polymorphism, Genetic/genetics , Protein Kinase Inhibitors/pharmacology , Proto-Oncogene Proteins/genetics , Sequence Deletion/genetics , Adult , Aged , Aged, 80 and over , Annexins/metabolism , BH3 Interacting Domain Death Agonist Protein/genetics , Bcl-2-Like Protein 11 , Carcinoma, Non-Small-Cell Lung/drug therapy , Cell Line, Tumor , Cohort Studies , Dose-Response Relationship, Drug , Drug Resistance, Neoplasm/genetics , Enzyme-Linked Immunosorbent Assay/methods , ErbB Receptors/genetics , Exons/genetics , Female , Follow-Up Studies , Gene Expression Regulation, Neoplastic/drug effects , Gene Frequency , Genotype , Humans , International Cooperation , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/drug therapy , Lung Neoplasms/drug therapy , Male , Middle Aged , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA, Small Interfering/metabolism , Statistics, Nonparametric , Transfection
10.
J Comput Biol ; 15(7): 881-98, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18707535

ABSTRACT

Recombination detection is important before inferring phylogenetic relationships. This will eventually lead to a better understanding of pathogen evolution, more accurate genotyping, and advancements in vaccine development. In this paper, we introduce RB-Finder, a fast and accurate distance-based window method to detect recombination in a multiple sequence alignment. Our method introduces a more informative distance measure and a novel weighting strategy to reduce the window size sensitivity problem and hence improve the accuracy of breakpoint detection. Furthermore, our method is faster than existing phylogeny-based methods since we do not need to construct and compare complex phylogenetic trees. When compared with the current best method Pruned-PDM, our method is a few hundred times more efficient. Experimental evaluation of RB-Finder using synthetic and biological datasets showed that our method is more accurate than existing phylogeny-based methods. We also show how our method has potential use in other related applications such as genotyping.


Subject(s)
Algorithms , Base Sequence , Recombination, Genetic , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Computational Biology/methods , Databases, Genetic , Genotype , HIV-1/genetics , Humans , Models, Genetic , Molecular Sequence Data , Phylogeny
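The general idea of a distance-based window method can be sketched as follows. This is not RB-Finder: it does not implement the paper's distance measure or weighting strategy, which the abstract does not specify. It is a generic illustration that uses plain Hamming distance on fixed windows, and all function names and the toy sequences are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def nearest_reference_per_window(query, references, window, step):
    """For each window start, name the reference closest to the query."""
    calls = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        distances = {
            name: hamming(query[start:end], seq[start:end])
            for name, seq in references.items()
        }
        calls.append((start, min(distances, key=distances.get)))
    return calls


def candidate_breakpoints(calls):
    """Window starts where the nearest reference changes from the previous window."""
    return [
        cur_start
        for (_, prev_ref), (cur_start, cur_ref) in zip(calls, calls[1:])
        if prev_ref != cur_ref
    ]


# Toy aligned sequences (hypothetical): the query matches refA in its
# first half and refB in its second half.
references = {"refA": "A" * 40, "refB": "C" * 40}
query = "A" * 20 + "C" * 20

calls = nearest_reference_per_window(query, references, window=10, step=10)
breakpoints = candidate_breakpoints(calls)
print(calls)
print(breakpoints)
```

A change in the nearest reference between adjacent windows marks a candidate recombination breakpoint. Results from a naive version like this depend strongly on the window size, which is the sensitivity problem the abstract says RB-Finder is designed to reduce.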
11.
PLoS One ; 3(4): e1862, 2008 Apr 02.
Article in English | MEDLINE | ID: mdl-18382653

ABSTRACT

The indoor atmosphere is an ecological unit that impacts on public health. To investigate the composition of organisms in this space, we applied culture-independent approaches to microbes harvested from the air of two densely populated urban buildings, from which we analyzed 80 megabases of genomic DNA sequence and 6000 16S rDNA clones. The air microbiota is primarily bacteria, including potential opportunistic pathogens commonly isolated from human-inhabited environments such as hospitals, but none of the data contain matches to virulent pathogens or bioterror agents. Comparison of air samples with each other and nearby environments suggested that the indoor air microbes are not random transients from surrounding outdoor environments, but rather originate from indoor niches. Sequence annotation by gene function revealed specific adaptive capabilities enriched in the air environment, including genes potentially involved in resistance to desiccation and oxidative damage. This baseline index of air microbiota will be valuable for improving designs of surveillance for natural or man-made release of virulent pathogens.


Subject(s)
Air , Environmental Monitoring/methods , Air Microbiology , Air Pollutants , Air Pollution, Indoor , Algorithms , Contig Mapping , Genome , Genome, Bacterial , Humans , Open Reading Frames , Particulate Matter , Phylogeny , Sequence Analysis, DNA , Ventilation
12.
Genome Res ; 17(6): 898-909, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17568005

ABSTRACT

Recent progress in mapping transcription factor (TF) binding regions can largely be credited to chromatin immunoprecipitation (ChIP) technologies. We compared strategies for mapping TF binding regions in mammalian cells using two different ChIP schemes: ChIP with DNA microarray analysis (ChIP-chip) and ChIP with DNA sequencing (ChIP-PET). We first investigated parameters central to obtaining robust ChIP-chip data sets by analyzing STAT1 targets in the ENCODE regions of the human genome, and then compared ChIP-chip to ChIP-PET. We devised methods for scoring and comparing results among various tiling arrays and examined parameters such as DNA microarray format, oligonucleotide length, hybridization conditions, and the use of competitor Cot-1 DNA. The best performance was achieved with high-density oligonucleotide arrays, oligonucleotides ≥50 bases (b), the presence of competitor Cot-1 DNA and hybridizations conducted in microfluidics stations. When target identification was evaluated as a function of array number, 80%-86% of targets were identified with three or more arrays. Comparison of ChIP-chip with ChIP-PET revealed strong agreement for the highest ranked targets with less overlap for the low ranked targets. With advantages and disadvantages unique to each approach, we found that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect STAT1 targets for the lower ranked targets; each method detected validated targets that were missed by the other method. The most comprehensive list of STAT1 binding regions is obtained by merging results from ChIP-chip and ChIP-sequencing. Overall, this study provides information for robust identification, scoring, and validation of TF targets using ChIP-based technologies.


Subject(s)
Chromatin Immunoprecipitation , Genome, Human , Microfluidic Analytical Techniques , Oligonucleotide Array Sequence Analysis , Sequence Analysis, DNA , Animals , Binding Sites/genetics , HeLa Cells , Humans , STAT1 Transcription Factor/genetics
13.
Article in English | MEDLINE | ID: mdl-16452780

ABSTRACT

Oligo microarray (DNA chip) technology has had a significant impact on genomic studies in recent years. Many fields, such as gene discovery, drug discovery, toxicological research and disease diagnosis, will certainly benefit from its use. A microarray is an orderly arrangement of thousands of DNA fragments where each DNA fragment is a probe (or a fingerprint) of a gene/cDNA. Each probe must associate uniquely with a particular gene/cDNA; otherwise, the performance of the microarray will be affected. Existing algorithms usually select probes using the criteria of homogeneity, sensitivity, and specificity. Moreover, they improve efficiency by employing heuristics, which reduces accuracy. Instead, we make use of some smart filtering techniques to avoid redundant computation while maintaining the accuracy. Based on the new algorithm, optimal short (20 bases) or long (50 or 70 bases) probes can be computed efficiently for large genomes.


Subject(s)
Algorithms , Chromosome Mapping/methods , DNA Probes/genetics , Genome/genetics , Oligonucleotide Array Sequence Analysis/instrumentation , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Base Sequence , Equipment Design , Equipment Failure Analysis , Models, Genetic , Models, Statistical , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods
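The uniqueness requirement described in the abstract above (each probe must associate with exactly one gene) can be illustrated with a simple exact-match check. This is a sketch only and is not the paper's algorithm: it ignores the homogeneity and sensitivity criteria, near-matches, and the filtering techniques the abstract mentions. The function names and toy sequences are hypothetical, and a short probe length is used so the example stays readable.

```python
from collections import defaultdict


def kmer_index(genes: dict, k: int) -> dict:
    """Map every length-k substring to the set of genes that contain it."""
    index = defaultdict(set)
    for name, seq in genes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def unique_probes(genes: dict, k: int) -> dict:
    """Return, per gene, the length-k candidates found in that gene only."""
    index = kmer_index(genes, k)
    result = {name: [] for name in genes}
    for kmer, owners in index.items():
        if len(owners) == 1:
            (owner,) = owners
            result[owner].append(kmer)
    return result


# Toy sequences (hypothetical); real probes in the abstract are 20-70 bases.
genes = {"geneA": "ACGTACGTTT", "geneB": "ACGTACGGGA"}
probes = unique_probes(genes, k=4)
print(probes)
```

Candidates shared by both toy genes, such as ACGT, are excluded because they would hybridize to more than one target; only substrings specific to a single gene remain as probe candidates.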