1.
Leukemia ; 37(4): 776-787, 2023 04.
Article in English | MEDLINE | ID: mdl-36788336

ABSTRACT

We recently described a 16-gene expression signature for improved risk stratification of acute myeloid leukemia (AML) patients called the AML Prognostic Score (APS). A subset of APS-high-risk AML patients showed increased levels of focal adhesion kinase (FAK), encoded by the Protein Tyrosine Kinase 2 (PTK2) gene, which was correlated with RUNX1 mutations. RUNX1 mutant cells are more sensitive to PTK2 inhibitors. As we were not able to detect RUNX1-binding sites in the PTK2 promoter, we hypothesized that RUNX1 might regulate micro(mi)RNAs that repress PTK2, such that loss-of-function RUNX1 mutations would result in reduced miRNA expression and derepression of PTK2. Examination of paired RNA-seq and miRNA-seq data from 301 AML cases revealed two miRNAs that positively correlated with RUNX1 expression, contained RUNX1-binding sites in their promoters and were predicted to target PTK2. We show that the hsa-let7a-2-3p and hsa-miR-135a-5p promoters are regulated by RUNX1, and that PTK2 is a direct target of both miRNAs. Even in the absence of RUNX1 mutations, hsa-let7a-2-3p and hsa-miR-135a-5p regulate PTK2 expression, and reduced expression of these two miRNAs sensitizes AML cells to PTK2 inhibition. These data explain how RUNX1 regulates PTK2, and identify potential miRNA biomarkers for targeting AML with PTK2 inhibitors.


Subject(s)
Leukemia, Myeloid, Acute , MicroRNAs , Humans , Core Binding Factor Alpha 2 Subunit/genetics , Focal Adhesion Kinase 1 , Focal Adhesion Protein-Tyrosine Kinases , Leukemia, Myeloid, Acute/genetics , MicroRNAs/genetics , MicroRNAs/metabolism
2.
Nat Commun ; 12(1): 2474, 2021 04 30.
Article in English | MEDLINE | ID: mdl-33931648

ABSTRACT

As more clinically-relevant genomic features of myeloid malignancies are revealed, it has become clear that targeted clinical genetic testing is inadequate for risk stratification. Here, we develop and validate a clinical transcriptome-based assay for stratification of acute myeloid leukemia (AML). Comparison of ribonucleic acid sequencing (RNA-Seq) to whole genome and exome sequencing reveals that a standalone RNA-Seq assay offers the greatest diagnostic return, enabling identification of expressed gene fusions, single nucleotide and short insertion/deletion variants, and whole-transcriptome expression information. Expression data from 154 AML patients are used to develop a novel AML prognostic score, which is strongly associated with patient outcomes across 620 patients from three independent cohorts, and 42 patients from a prospective cohort. When combined with molecular risk guidelines, the risk score allows for the re-stratification of 22.1 to 25.3% of AML patients from three independent cohorts into correct risk groups. Within the adverse-risk subgroup, we identify a subset of patients characterized by dysregulated integrin signaling and RUNX1 or TP53 mutation. We show that these patients may benefit from therapy with inhibitors of focal adhesion kinase, encoded by PTK2, demonstrating additional utility of transcriptome-based testing for therapy selection in myeloid malignancy.


Subject(s)
Biomarkers, Tumor/metabolism , Gene Expression Regulation, Neoplastic/genetics , Leukemia, Myeloid, Acute/diagnosis , Leukemia, Myeloid, Acute/metabolism , Biomarkers, Tumor/genetics , Cell Line, Tumor , Cohort Studies , Core Binding Factor Alpha 2 Subunit/genetics , Core Binding Factor Alpha 2 Subunit/metabolism , Female , Gene Fusion , Humans , INDEL Mutation , Integrins/genetics , Integrins/metabolism , Leukemia, Myeloid, Acute/genetics , Male , Polymorphism, Single Nucleotide , Prognosis , Prospective Studies , RNA-Seq , Risk Factors , Signal Transduction/genetics , Survival Analysis , Transcriptome , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism , Exome Sequencing , Whole Genome Sequencing
3.
J Mol Diagn ; 23(4): 455-466, 2021 04.
Article in English | MEDLINE | ID: mdl-33486075

ABSTRACT

Clinical reporting of solid tumor sequencing requires reliable assessment of the accuracy and reproducibility of each assay. Somatic mutation variant allele fractions may be below 10% in many samples due to sample heterogeneity, tumor clonality, and/or sample degradation in fixatives such as formalin. The toolkits available to the clinical sequencing community for correlating assay design parameters with assay sensitivity remain limited, and large-scale empirical assessments are often relied upon due to the lack of clear theoretical grounding. To address this uncertainty, a theoretical model was developed for predicting the expected variant calling sensitivity for a given library complexity and sequencing depth. Binomial models were found to be appropriate when assay sensitivity was only limited by library complexity or sequencing depth, but functional scaling for library complexity was necessary when both library complexity and sequencing depth were co-limiting. This model was empirically validated with sequencing experiments by using a series of DNA input amounts and sequencing depths. Based on these findings, a workflow is proposed for determining the limiting factors to sensitivity in different assay designs, and the formulas for these scenarios are presented. The approach described here provides designers of clinical assays with the methods to theoretically predict assay design outcomes a priori, potentially reducing burden in clinical tumor assay design and validation efforts.
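The depth-limited case mentioned in the abstract can be illustrated with a plain binomial calculation. This is a minimal sketch, not the authors' published formulas: it assumes each read independently carries the variant with probability equal to the variant allele fraction, and that a call requires a fixed minimum number of supporting reads. The function name and the `min_alt_reads` parameter are hypothetical. The functional scaling for co-limiting library complexity that the abstract describes is not modelled here.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Return P(X >= min_alt_reads) for X ~ Binomial(depth, vaf).

    depth         -- number of independent reads covering the site
    vaf           -- variant allele fraction, between 0 and 1
    min_alt_reads -- assumed calling threshold (hypothetical parameter)
    """
    if depth < 0 or min_alt_reads < 0:
        raise ValueError("depth and min_alt_reads must be non-negative")
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must lie in [0, 1]")
    # Sum the probability of seeing fewer than min_alt_reads variant reads.
    upper = min(min_alt_reads, depth + 1)
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(upper)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Compare expected sensitivity for a 5% variant at several depths.
    for depth in (100, 250, 500, 1000):
        print(depth, round(detection_probability(depth, 0.05, 5), 4))
```

Under these assumptions, sensitivity rises with depth for a fixed allele fraction, which is the behaviour one would expect when sequencing depth alone is limiting.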


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Models, Statistical , Neoplasms/genetics , Polymerase Chain Reaction/methods , Alleles , DNA/genetics , DNA/isolation & purification , Humans , Limit of Detection , Mutation , Polymorphism, Single Nucleotide , Reproducibility of Results , Sensitivity and Specificity
4.
Blood ; 135(25): 2235-2251, 2020 06 18.
Article in English | MEDLINE | ID: mdl-32384151

ABSTRACT

Aging is associated with significant changes in the hematopoietic system, including increased inflammation, impaired hematopoietic stem cell (HSC) function, and increased incidence of myeloid malignancy. Inflammation of aging ("inflammaging") has been proposed as a driver of age-related changes in HSC function and myeloid malignancy, but mechanisms linking these phenomena remain poorly defined. We identified loss of miR-146a as driving aging-associated inflammation in AML patients. miR-146a expression declined in old wild-type mice, and loss of miR-146a promoted premature HSC aging and inflammation in young miR-146a-null mice, preceding development of aging-associated myeloid malignancy. Using single-cell assays of HSC quiescence, stemness, differentiation potential, and epigenetic state to probe HSC function and population structure, we found that loss of miR-146a depleted a subpopulation of primitive, quiescent HSCs. DNA methylation and transcriptome profiling implicated NF-κB, IL6, and TNF as potential drivers of HSC dysfunction, activating an inflammatory signaling relay promoting IL6 and TNF secretion from mature miR-146a-/- myeloid and lymphoid cells. Reducing inflammation by targeting Il6 or Tnf was sufficient to restore single-cell measures of miR-146a-/- HSC function and subpopulation structure and reduced the incidence of hematological malignancy in miR-146a-/- mice. miR-146a-/- HSCs exhibited enhanced sensitivity to IL6 stimulation, indicating that loss of miR-146a affects HSC function via both cell-extrinsic inflammatory signals and increased cell-intrinsic sensitivity to inflammation. Thus, loss of miR-146a regulates cell-extrinsic and -intrinsic mechanisms linking HSC inflammaging to the development of myeloid malignancy.


Subject(s)
Aging/genetics , Inflammation/genetics , Interleukin-6/physiology , Leukemia, Myeloid, Acute/etiology , MicroRNAs/genetics , Tumor Necrosis Factor-alpha/physiology , Adolescent , Adult , Aged , Aging/immunology , Animals , Cell Differentiation , Cell Self Renewal , Cellular Senescence , Cytokines/biosynthesis , DNA Methylation , Female , Hematopoietic Stem Cells/metabolism , Hematopoietic Stem Cells/pathology , Humans , Inflammation/physiopathology , Interleukin-6/antagonists & inhibitors , Male , Mice , Mice, Knockout , MicroRNAs/biosynthesis , Middle Aged , NF-kappa B/physiology , Single-Cell Analysis , Transcriptome , Tumor Necrosis Factor-alpha/antagonists & inhibitors , Young Adult
5.
Nat Cell Biol ; 22(5): 526-533, 2020 05.
Article in English | MEDLINE | ID: mdl-32251398

ABSTRACT

Interstitial deletion of the long arm of chromosome 5 (del(5q)) is the most common structural genomic variant in myelodysplastic syndromes (MDS). Lenalidomide (LEN) is the treatment of choice for patients with del(5q) MDS, but half of the responding patients become resistant within 2 years. TP53 mutations are detected in ~20% of LEN-resistant patients. Here we show that patients who become resistant to LEN harbour recurrent variants of TP53 or RUNX1. LEN upregulated RUNX1 protein and function in a CRBN- and TP53-dependent manner in del(5q) cells, and mutation or downregulation of RUNX1 rendered cells resistant to LEN. LEN induced megakaryocytic differentiation of del(5q) cells followed by cell death that was dependent on calpain activation and CSNK1A1 degradation. We also identified GATA2 as a LEN-responsive gene that is required for LEN-induced megakaryocyte differentiation. Megakaryocytic gene-promoter analyses suggested that LEN-induced degradation of IKZF1 enables a RUNX1-GATA2 complex to drive megakaryocytic differentiation. Overexpression of GATA2 restored LEN sensitivity in the context of RUNX1 or TP53 mutations by enhancing LEN-induced megakaryocytic differentiation. Screening for mutations that block LEN-induced megakaryocytic differentiation should identify patients who are resistant to LEN.


Subject(s)
Cell Differentiation/drug effects , Cell Differentiation/genetics , Chromosomes, Human, Pair 5/genetics , Lenalidomide/pharmacology , Megakaryocytes/drug effects , Myelodysplastic Syndromes/genetics , Cell Line , Chromosomes, Human, Pair 5/drug effects , Core Binding Factor Alpha 2 Subunit/genetics , Down-Regulation/drug effects , Down-Regulation/genetics , GATA2 Transcription Factor/genetics , HEK293 Cells , Humans , Mutation/drug effects , Mutation/genetics , Tumor Suppressor Protein p53/genetics
6.
J Mol Diagn ; 22(2): 141-146, 2020 02.
Article in English | MEDLINE | ID: mdl-31837431

ABSTRACT

Sample tracking and identity are essential when processing multiple samples in parallel. Sequencing applications often involve high sample numbers, and the data are frequently used in a clinical setting. As such, a simple and accurate intrinsic sample tracking process through a sequencing pipeline is essential. Various solutions have been implemented to verify sample identity, including variant detection at the start and end of the pipeline using arrays or genotyping, bioinformatic comparisons, and optical barcoding of samples. None of these approaches are optimal. To establish a more effective approach using genetic barcoding, we developed a panel of unique DNA sequences cloned into a common vector. A unique DNA sequence is added to the sample when it is first received and can be detected by PCR and/or sequencing at any stage of the process. The control sequences are approximately 200 bases long with low identity to any sequence in the National Center for Biotechnology Information nonredundant database (<30 bases) and contain no long homopolymer (>7) stretches. When a spiked next-generation sequencing library is sequenced, sequence reads derived from this control sequence are generated along with the standard sequencing run and are used to confirm sample identity and determine cross-contamination levels. This approach is used in our targeted clinical diagnostic whole-genome and RNA-sequencing pipelines and is an inexpensive, flexible, and platform-agnostic solution.
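One of the design constraints stated above, that control sequences contain no homopolymer stretch longer than 7 bases, is easy to check programmatically. The following is an illustrative sketch only; the function names are hypothetical and it does not reproduce the authors' design pipeline or the database-identity screen.

```python
from itertools import groupby


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base (case-insensitive)."""
    return max(
        (sum(1 for _ in run) for _, run in groupby(seq.upper())),
        default=0,
    )


def passes_homopolymer_rule(seq: str, max_run: int = 7) -> bool:
    """True when no homopolymer run exceeds max_run bases."""
    return longest_homopolymer(seq) <= max_run


if __name__ == "__main__":
    candidate = "ACGTTTTACGGA"  # made-up sequence for illustration
    print(longest_homopolymer(candidate), passes_homopolymer_rule(candidate))
```

The default of 7 mirrors the ">7" limit quoted in the abstract; a run of exactly 7 bases is treated as acceptable under that reading.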


Subject(s)
High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Computational Biology , DNA Contamination , Databases, Nucleic Acid , Gene Library , Humans , Reference Standards , Reproducibility of Results , Sequence Analysis, DNA
7.
Int J Lab Hematol ; 41 Suppl 1: 117-125, 2019 May.
Article in English | MEDLINE | ID: mdl-31069982

ABSTRACT

Clinical genetic testing in the myeloid malignancies is undergoing a rapid transition from the era of cytogenetics and single-gene testing to an era dominated by next-generation sequencing (NGS). This transition promises to better reveal the genetic alterations underlying disease, but there are distinct risks and benefits associated with different NGS testing platforms. NGS offers the potential benefit of being able to survey alterations across a wider set of genes, but analytic and clinical challenges associated with incidental findings, germ line variation, turnaround time, and limits of detection must be addressed. Additionally, transcriptome-based testing may offer several distinct benefits beyond traditional DNA-based methods. In addition to testing at disease diagnosis, research indicates potential benefits of genetic testing both prior to disease onset and at remission. In this review, we discuss the transition from the era of cytogenetics and single-gene tests to the era of NGS panels and genome-wide sequencing, highlighting both the potential and drawbacks of these novel technologies.


Subject(s)
Biomarkers, Tumor/genetics , Genetic Predisposition to Disease , Genetic Testing/methods , Genomics/methods , Hematologic Neoplasms/genetics , Myeloproliferative Disorders/genetics , Sequence Analysis, DNA/methods , Humans
8.
J Mol Diagn ; 21(4): 705-717, 2019 07.
Article in English | MEDLINE | ID: mdl-31055024

ABSTRACT

Formalin fixation is the standard method for the preservation of tissue for diagnostic purposes, including pathologic review and molecular assays. However, this method is known to cause artifacts that can affect the accuracy of molecular genetic test results. We assessed the applicability of alternative fixatives to determine whether these perform significantly better on next-generation sequencing assays, and whether adequate morphology is retained for primary diagnosis, in a prospective study using a clinical-grade, laboratory-developed targeted resequencing assay. Several parameters relating to sequencing quality and variant calling were examined and quantified in tumor and normal colon epithelial tissues. We identified an alternative fixative that suppresses many formalin-related artifacts while retaining adequate morphology for pathologic review.


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Tissue Fixation , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Immunohistochemistry , Paraffin Embedding , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards
9.
Sci Rep ; 8(1): 6951, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29725024

ABSTRACT

Network analysis is the preferred approach for the detection of subtle but coordinated changes in expression of an interacting and related set of genes. We introduce a novel method based on the analyses of coexpression networks and Bayesian networks, and we use this new method to classify two types of hematological malignancies, namely acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS). Our classifier has an accuracy of 93%, a precision of 98%, and a recall of 90% on the training dataset (n = 366), which outperforms the results reported by other scholars on the same dataset. Although our training dataset consists of microarray data, our model performs well on the RNA-Seq test dataset (n = 74, accuracy = 89%, precision = 88%, recall = 98%), which confirms that eigengenes are robust with respect to expression profiling technology. These signatures are useful for classification and for correctly predicting the diagnosis. They might also provide valuable information about the underlying biology of diseases. Our network analysis approach is generalizable and can be useful for classifying other diseases based on gene expression profiles. Our previously published Pigengene package is publicly available through Bioconductor, which can be used to conveniently fit a Bayesian network to gene expression data.
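For readers comparing the accuracy, precision, and recall figures quoted in this abstract, the sketch below shows the standard confusion-matrix definitions of the three metrics. It is a generic illustration, not code from the Pigengene package; the class and function names are hypothetical, and the counts in the usage lines are invented rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts with one class treated as positive."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def _ratio(numerator: int, denominator: int) -> float:
    if denominator == 0:
        raise ValueError("metric is undefined for an empty denominator")
    return numerator / denominator


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all samples classified correctly."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted positives that are truly positive."""
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Fraction of true positives that were recovered."""
    return _ratio(c.tp, c.tp + c.fn)


if __name__ == "__main__":
    counts = ConfusionCounts(tp=45, fp=5, fn=5, tn=45)  # invented counts
    print(accuracy(counts), precision(counts), recall(counts))
```

Which disease is treated as the positive class changes precision and recall but not accuracy, so reported values are only comparable when that choice is the same.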


Subject(s)
Hematologic Neoplasms/genetics , Leukemia, Myeloid, Acute/genetics , Myelodysplastic Syndromes/genetics , Transcriptome , Bayes Theorem , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Hematologic Neoplasms/diagnosis , Humans , Leukemia, Myeloid, Acute/diagnosis , Myelodysplastic Syndromes/diagnosis , Sequence Analysis, RNA
10.
Pac Symp Biocomput ; : 347-58, 2015.
Article in English | MEDLINE | ID: mdl-25592595

ABSTRACT

In eukaryotic cells, alternative cleavage of 3' untranslated regions (UTRs) can affect transcript stability, transport and translation. For polyadenylated (poly(A)) transcripts, cleavage sites can be characterized with short-read sequencing using specialized library construction methods. However, for large-scale cohort studies as well as for clinical sequencing applications, it is desirable to characterize such events using RNA-seq data, as the latter are already widely applied to identify other relevant information, such as mutations, alternative splicing and chimeric transcripts. Here we describe KLEAT, an analysis tool that uses de novo assembly of RNA-seq data to characterize cleavage sites on 3' UTRs. We demonstrate the performance of KLEAT on three cell line RNA-seq libraries constructed and sequenced by the ENCODE project, and assembled using Trans-ABySS. Validating the KLEAT predictions with matched ENCODE RNA-seq and RNA-PET libraries, we show that the tool has over 90% positive predictive value when there are at least three RNA-seq reads supporting a poly(A) tail and requiring at least three RNA-PET reads mapping within 100 nucleotides as validation. We also compare the performance of KLEAT with other popular RNA-seq analysis pipelines that reconstruct 3' UTR ends, and show that it performs favourably, based on an ROC-like curve.


Subject(s)
Transcriptome , 3' Untranslated Regions , Binding Sites , Cell Line , Computational Biology , Gene Library , Humans , ROC Curve , Sequence Alignment/statistics & numerical data , Sequence Analysis, RNA/statistics & numerical data
11.
J Mol Diagn ; 15(6): 796-809, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24094589

ABSTRACT

Individuals who inherit mutations in BRCA1 or BRCA2 are predisposed to breast and ovarian cancers. However, identifying mutations in these large genes by conventional dideoxy sequencing in a clinical testing laboratory is both time consuming and costly, and similar challenges exist for other large genes, or sets of genes, with relevance in the clinical setting. Second-generation sequencing technologies have the potential to improve the efficiency and throughput of clinical diagnostic sequencing, once clinically validated methods become available. We have developed a method for detection of variants based on automated small-amplicon PCR followed by sample pooling and sequencing with a second-generation instrument. To demonstrate the suitability of this method for clinical diagnostic sequencing, we analyzed the coding exons and the intron-exon boundaries of BRCA1 and BRCA2 in 91 hereditary breast cancer patient samples. Our method generated high-quality sequence coverage across all targeted regions, with median coverage greater than 4000-fold for each sample in pools of 24. Sensitive and specific automated variant detection, without false-positive or false-negative results, was accomplished with a standard software pipeline using bwa for sequence alignment and samtools for variant detection. We experimentally derived a minimum threshold of 100-fold sequence depth for confident variant detection. The results demonstrate that this method is suitable for sensitive, automatable, high-throughput sequence variant detection in the clinical laboratory.
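A minimum-depth threshold such as the 100-fold value reported above is typically applied by flagging targeted positions whose coverage falls below it. The sketch below assumes per-position depths are already available as a list of integers (for example, parsed from a depth report); the function name and the interval convention are assumptions for illustration and are not taken from the authors' pipeline.

```python
from typing import List, Sequence, Tuple


def low_depth_regions(
    depths: Sequence[int], min_depth: int = 100
) -> List[Tuple[int, int]]:
    """Return 0-based half-open intervals where depth < min_depth."""
    regions: List[Tuple[int, int]] = []
    start = None  # start index of the current low-depth run, if any
    for pos, depth in enumerate(depths):
        if depth < min_depth:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        # The final run extends to the end of the target.
        regions.append((start, len(depths)))
    return regions


if __name__ == "__main__":
    example = [150, 90, 80, 120, 99]  # invented depths
    print(low_depth_regions(example))
```

Positions inside a reported interval would need re-sequencing or an alternative assay before a negative result could be trusted under such a threshold.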


Subject(s)
DNA Mutational Analysis/methods , Genes, BRCA1 , Genes, BRCA2 , Hereditary Breast and Ovarian Cancer Syndrome/genetics , Base Sequence , Gene Frequency , Gene Library , High-Throughput Nucleotide Sequencing , Humans , Prospective Studies , Sensitivity and Specificity
12.
BMC Genomics ; 14: 550, 2013 Aug 14.
Article in English | MEDLINE | ID: mdl-23941359

ABSTRACT

BACKGROUND: Chimeric transcripts, including partial and internal tandem duplications (PTDs, ITDs) and gene fusions, are important in the detection, prognosis, and treatment of human cancers. RESULTS: We describe Barnacle, a production-grade analysis tool that detects such chimeras in de novo assemblies of RNA-seq data, and supports prioritizing them for review and validation by reporting the relative coverage of co-occurring chimeric and wild-type transcripts. We demonstrate applications in large-scale disease studies, by identifying PTDs in MLL, ITDs in FLT3, and reciprocal fusions between PML and RARA, in two deeply sequenced acute myeloid leukemia (AML) RNA-seq datasets. CONCLUSIONS: Our analyses of real and simulated data sets show that, with appropriate filter settings, Barnacle makes highly specific predictions for three types of chimeric transcripts that are important in a range of cancers: PTDs, ITDs, and fusions. High specificity makes manual review and validation efficient, which is necessary in large-scale disease studies. Characterizing an extended range of chimera types will help generate insights into progression, treatment, and outcomes for complex diseases.


Subject(s)
Gene Duplication/genetics , Gene Expression Profiling/methods , Gene Fusion/genetics , Genomics , Breast Neoplasms/genetics , Exons/genetics , Humans , Leukemia, Myeloid, Acute/genetics , Molecular Sequence Annotation , RNA, Messenger/genetics , Statistics as Topic
13.
Gigascience ; 2(1): 10, 2013 Jul 22.
Article in English | MEDLINE | ID: mdl-23870653

ABSTRACT

BACKGROUND: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. RESULTS: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. CONCLUSIONS: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

14.
Genome Biol ; 14(3): R27, 2013 Mar 27.
Article in English | MEDLINE | ID: mdl-23537049

ABSTRACT

BACKGROUND: The mountain pine beetle, Dendroctonus ponderosae Hopkins, is the most serious insect pest of western North American pine forests. A recent outbreak destroyed more than 15 million hectares of pine forests, with major environmental effects on forest health, and economic effects on the forest industry. The outbreak has in part been driven by climate change, and will contribute to increased carbon emissions through decaying forests. RESULTS: We developed a genome sequence resource for the mountain pine beetle to better understand the unique aspects of this insect's biology. A draft de novo genome sequence was assembled from paired-end, short-read sequences from an individual field-collected male pupa, and scaffolded using mate-paired, short-read genomic sequences from pooled field-collected pupae, paired-end short-insert whole-transcriptome shotgun sequencing reads of mRNA from adult beetle tissues, and paired-end Sanger EST sequences from various life stages. We describe the cytochrome P450, glutathione S-transferase, and plant cell wall-degrading enzyme gene families important to the survival of the mountain pine beetle in its harsh and nutrient-poor host environment, and examine genome-wide single-nucleotide polymorphism variation. A horizontally transferred bacterial sucrose-6-phosphate hydrolase was evident in the genome, and its tissue-specific transcription suggests a functional role for this beetle. CONCLUSIONS: Despite Coleoptera being the largest insect order with over 400,000 described species, including many agricultural and forest pest species, this is only the second genome sequence reported in Coleoptera, and will provide an important resource for the Curculionoidea and other insects.


Subject(s)
Coleoptera/genetics , Ecosystem , Forests , Genome, Insect/genetics , Animals , Cell Wall/metabolism , Coleoptera/enzymology , Female , Gene Transfer, Horizontal/genetics , Genetic Linkage , Heterozygote , Male , Multigene Family , Phylogeny , Plant Cells/metabolism , Polymorphism, Single Nucleotide/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Homology, Nucleic Acid , Sex Chromosomes/genetics , Synteny/genetics
15.
Insect Biochem Mol Biol ; 42(8): 525-36, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22516182

ABSTRACT

Bark beetles (Coleoptera: Curculionidae: Scolytinae) are major insect pests of many woody plants around the world. The mountain pine beetle (MPB), Dendroctonus ponderosae Hopkins, is a significant historical pest of western North American pine forests. It is currently devastating pine forests in western North America, particularly in British Columbia, Canada, and is beginning to expand its host range eastward into the Canadian boreal forest, which extends to the Atlantic coast of North America. Limited genomic resources are available for this and other bark beetle pests, restricting the use of genomics-based information to help monitor, predict, and manage the spread of these insects. To overcome these limitations, we generated comprehensive transcriptome resources from fourteen full-length enriched cDNA libraries through paired-end Sanger sequencing of 100,000 cDNA clones, and single-end Roche 454 pyrosequencing of three of these cDNA libraries. Hybrid de novo assembly of the 3.4 million sequences resulted in 20,571 isotigs in 14,410 isogroups and 246,848 singletons. In addition, over 2300 non-redundant full-length cDNA clones putatively containing complete open reading frames, including 47 cytochrome P450s, were sequenced fully to high quality. This first large-scale genomics resource for bark beetles provides the relevant sequence information for gene discovery; functional and population genomics; comparative analyses; and for future efforts to annotate the MPB genome. These resources permit the study of this beetle at the molecular level and will inform research in other Dendroctonus spp. and more generally in the Curculionidae and other Coleoptera.


Subject(s)
Coleoptera/genetics , Pinus/parasitology , Transcriptome , 3' Untranslated Regions , 5' Untranslated Regions , Animals , Arthropod Antennae/metabolism , Coleoptera/metabolism , Cytochrome P-450 Enzyme System/metabolism , Fat Body/metabolism , Female , Male , Multigene Family , Open Reading Frames , Sequence Analysis, DNA
16.
Genome Res ; 21(12): 2224-41, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21926179

ABSTRACT

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.


Subject(s)
Genome/physiology , Genomics/methods , Sequence Analysis, DNA/methods
17.
BMC Genomics ; 12: 450, 2011 Sep 16.
Article in English | MEDLINE | ID: mdl-21923906

ABSTRACT

BACKGROUND: As scientists continue to pursue various 'omics-based research, there is a need for high quality data for the most fundamental 'omics of all: genomics. The bacterium Paenibacillus larvae is the causative agent of the honey bee disease American foulbrood. If untreated, it can lead to the demise of an entire hive; the highly social nature of bees also leads to easy disease spread, between both individuals and colonies. Biologists have studied this organism since the early 1900s, and a century later, the molecular mechanism of infection remains elusive. Transcriptomics and proteomics, because of their ability to analyze multiple genes and proteins in a high-throughput manner, may be very helpful to its study. However, the power of these methodologies is severely limited without a complete genome; we undertake to address that deficiency here. RESULTS: We used the Illumina GAIIx platform and conventional Sanger sequencing to generate a 182-fold sequence coverage of the P. larvae genome, and assembled the data using ABySS into a total of 388 contigs spanning 4.5 Mbp. Comparative genomics analysis against fully-sequenced soil bacteria P. JDR2 and P. vortex showed that regions of poor conservation may contain putative virulence factors. We used GLIMMER to predict 3568 gene models, and named them based on homology revealed by BLAST searches; proteases, hemolytic factors, toxins, and antibiotic resistance enzymes were identified in this way. Finally, mass spectrometry was used to provide experimental evidence that at least 35% of the genes are expressed at the protein level. CONCLUSIONS: This update on the genome of P. larvae and annotation represents an immense advancement from what we had previously known about this species. We provide here a reliable resource that can be used to elucidate the mechanism of infection, and by extension, more effective methods to control and cure this widespread honey bee disease.
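As a rough worked example of the coverage figure quoted above, mean fold coverage is conventionally the total number of sequenced bases divided by the genome size. The abstract does not state how its 182-fold value was computed, so the sketch below simply assumes that convention and uses the 4.5 Mbp assembly span as a stand-in for genome size; the function names are hypothetical.

```python
def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Mean fold coverage, assuming coverage = sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_for_coverage(fold_coverage: int, genome_size: int) -> int:
    """Sequenced bases implied by a given fold coverage of a genome."""
    return fold_coverage * genome_size


if __name__ == "__main__":
    genome_size = 4_500_000  # 4.5 Mbp contig span, used here as an assumption
    implied = bases_for_coverage(182, genome_size)
    print(implied)  # 819000000
    print(mean_coverage(implied, genome_size))  # 182.0
```

Under these assumptions, 182-fold coverage of a 4.5 Mbp genome corresponds to roughly 819 Mbp of sequence.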


Subject(s)
Bees/microbiology , Genome, Bacterial , Paenibacillus/genetics , Animals , Comparative Genomic Hybridization , Computational Biology , DNA, Bacterial/genetics , Molecular Sequence Annotation , Proteomics , Sequence Analysis, DNA
18.
BMC Genomics ; 11: 536, 2010 Oct 04.
Article in English | MEDLINE | ID: mdl-20920358

ABSTRACT

BACKGROUND: Grosmannia clavigera is a bark beetle-vectored fungal pathogen of pines that causes wood discoloration and may kill trees by disrupting nutrient and water transport. Trees respond to attacks from beetles and associated fungi by releasing terpenoid and phenolic defense compounds. It is unclear which genes are important for G. clavigera's ability to overcome antifungal pine terpenoids and phenolics. RESULTS: We constructed seven cDNA libraries from eight G. clavigera isolates grown under various culture conditions, and Sanger sequenced the 5' and 3' ends of 25,000 cDNA clones, resulting in 44,288 high quality ESTs. The assembled dataset of unique transcripts (unigenes) consists of 6,265 contigs and 2,459 singletons that mapped to 6,467 locations on the G. clavigera reference genome, representing ~70% of the predicted G. clavigera genes. Although only 54% of the unigenes matched characterized proteins at the NCBI database, this dataset extensively covers major metabolic pathways, cellular processes, and genes necessary for response to environmental stimuli and genetic information processing. Furthermore, we identified genes expressed in spores prior to germination, and genes involved in response to treatment with lodgepole pine phloem extract (LPPE). CONCLUSIONS: We provide a comprehensively annotated EST dataset for G. clavigera that represents a rich resource for gene characterization in this and other ophiostomatoid fungi. Genes expressed in response to LPPE treatment are indicative of fungal oxidative stress response. We identified two clusters of potentially functionally related genes responsive to LPPE treatment. Furthermore, we report a simple method for identifying contig misassemblies in de novo assembled EST collections caused by gene overlap on the genome.


Subject(s)
Coleoptera/microbiology , Genes, Fungal/genetics , Insect Vectors/microbiology , Ophiostomatales/genetics , Pinus/microbiology , Plant Bark/microbiology , Trees/microbiology , Animals , Coleoptera/drug effects , Databases, Genetic , Expressed Sequence Tags , Gene Expression Regulation, Fungal/drug effects , Gene Library , Insect Vectors/drug effects , Metabolic Networks and Pathways/drug effects , Metabolic Networks and Pathways/genetics , Mycelium/drug effects , Mycelium/genetics , Ophiostomatales/drug effects , Ophiostomatales/isolation & purification , Phloem/chemistry , Phloem/drug effects , Pinus/drug effects , Plant Bark/drug effects , Plant Extracts/pharmacology , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Spores, Fungal/drug effects , Spores, Fungal/genetics , Trees/drug effects
19.
Proc Natl Acad Sci U S A ; 107(38): 16589-94, 2010 Sep 21.
Article in English | MEDLINE | ID: mdl-20807748

ABSTRACT

The Pleiades Promoter Project integrates genomewide bioinformatics with large-scale knockin mouse production and histological examination of expression patterns to develop MiniPromoters and related tools designed to study and treat the brain by directed gene expression. Genes with brain expression patterns of interest are subjected to bioinformatic analysis to delineate candidate regulatory regions, which are then incorporated into a panel of compact human MiniPromoters to drive expression to brain regions and cell types of interest. Using single-copy, homologous-recombination "knockins" in embryonic stem cells, each MiniPromoter reporter is integrated immediately 5' of the Hprt locus in the mouse genome. MiniPromoter expression profiles are characterized in differentiation assays of the transgenic cells or in mouse brains following transgenic mouse production. Histological examination of adult brains, eyes, and spinal cords for reporter gene activity is coupled to costaining with cell-type-specific markers to define expression. The publicly available Pleiades MiniPromoter Project is a key resource to facilitate research on brain development and therapies.


Subject(s)
Brain/metabolism , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Animals , Cell Differentiation/genetics , Computational Biology , Databases, Genetic , Embryonic Stem Cells/cytology , Embryonic Stem Cells/metabolism , Gene Expression , Gene Expression Profiling/statistics & numerical data , Gene Knock-In Techniques , Genes, Reporter , Genomics , Humans , Mice , Mice, Transgenic , Neurons/cytology , Neurons/metabolism
20.
Genome Biol ; 10(9): R94, 2009.
Article in English | MEDLINE | ID: mdl-19747388

ABSTRACT

Sequencing-by-synthesis technologies can reduce the cost of generating de novo genome assemblies. We report a method for assembling draft genome sequences of eukaryotic organisms that integrates sequence information from different sources, and demonstrate its effectiveness by assembling an approximately 32.5 Mb draft genome sequence for the forest pathogen Grosmannia clavigera, an ascomycete fungus. We also developed a method for assessing draft assemblies using Illumina paired-end read data and demonstrate how we are using it to guide future sequence finishing. Our results demonstrate that eukaryotic genome sequences can be accurately assembled by combining Illumina, 454 and Sanger sequence data.


Subject(s)
Ascomycota/genetics , Genome, Fungal/genetics , Sequence Analysis, DNA/methods , Algorithms , Fungal Proteins/genetics , Genomics/methods , Open Reading Frames/genetics , Reproducibility of Results