Results 1 - 20 of 109
1.
Nat Rev Genet ; 24(8): 573-584, 2023 08.
Article in English | MEDLINE | ID: mdl-37258725

ABSTRACT

The use of genomics is firmly established in clinical practice, resulting in innovations across a wide range of disciplines such as genetic screening, rare disease diagnosis and molecularly guided therapy choice. This new field of genomic medicine has led to improvements in patient outcomes. However, most clinical applications of genomics rely on information generated from bulk approaches, which do not directly capture the genomic variation that underlies cellular heterogeneity. With the advent of single-cell technologies, research is rapidly uncovering how genomic data at cellular resolution can be used to understand disease pathology and mechanisms. Both DNA-based and RNA-based single-cell technologies have the potential to improve existing clinical applications and open new application spaces for genomics in clinical practice, with oncology, immunology and haematology poised for initial adoption. However, challenges in translating cellular genomics from research to a clinical setting must first be overcome.


Subject(s)
Genetic Testing , Genomics , Humans , Genomics/methods , Precision Medicine/methods
2.
Nature ; 604(7906): 437-446, 2022 04.
Article in English | MEDLINE | ID: mdl-35444317

ABSTRACT

The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.


Subject(s)
Genome, Human , Genomics , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
3.
Proc Natl Acad Sci U S A ; 121(31): e2322834121, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39042694

ABSTRACT

We developed a generally applicable method, CRISPR/Cas9-targeted long-read sequencing (CTLR-Seq), to resolve, haplotype-specifically, the large and complex regions in the human genome that had been previously impenetrable to sequencing analysis, such as large segmental duplications (SegDups) and their associated genome rearrangements. CTLR-Seq combines in vitro Cas9-mediated cutting of the genome and pulsed-field gel electrophoresis to isolate intact large (i.e., up to 2,000 kb) genomic regions that encompass previously unresolvable genomic sequences. These targets are then sequenced (amplification-free) at high on-target coverage using long-read sequencing, allowing for their complete sequence assembly. We applied CTLR-Seq to the SegDup-mediated rearrangements that constitute the boundaries of, and give rise to, the 22q11.2 Deletion Syndrome (22q11DS), the most common human microdeletion disorder. We then performed de novo assembly to resolve, at base-pair resolution, the full sequence rearrangements and exact chromosomal breakpoints of 22q11DS (including all common subtypes). Across multiple patients, we found a high degree of variability for both the rearranged SegDup sequences and the exact chromosomal breakpoint locations, which coincide with various transposons within the 22q11.2 SegDups, suggesting that 22q11DS can be driven by transposon-mediated genome recombination. Guided by CTLR-Seq results from two 22q11DS patients, we performed three-dimensional chromosomal folding analysis for the 22q11.2 SegDups from patient-derived neurons and astrocytes and found chromosome interactions anchored within the SegDups to be both cell type-specific and patient-specific. Lastly, we demonstrated that CTLR-Seq enables cell type-specific analysis of DNA methylation patterns within the deletion haplotype of 22q11DS.


Subject(s)
DiGeorge Syndrome , Humans , DiGeorge Syndrome/genetics , CRISPR-Cas Systems , Chromosome Breakpoints , Chromosomes, Human, Pair 22/genetics , Genome, Human , Gene Rearrangement , Sequence Analysis, DNA/methods , Chromosome Deletion
4.
Blood ; 142(26): 2296-2304, 2023 12 28.
Article in English | MEDLINE | ID: mdl-37683139

ABSTRACT

An early event in the genesis of follicular lymphoma (FL) is the acquisition of new glycosylation motifs in the B-cell receptor (BCR) due to gene rearrangement and/or somatic hypermutation. These N-linked glycosylation motifs (N-motifs) contain mannose-terminated glycans and can interact with lectins in the tumor microenvironment, activating the tumor BCR pathway. N-motifs are stable during FL evolution, suggesting that FL tumor cells are dependent on them for their survival. Here, we investigated the dynamics and potential impact of N-motif prevalence in FL at the single-cell level across distinct tumor sites and over time in 17 patients. Although most patients had acquired at least 1 N-motif as an early event, we also found (1) cases without N-motifs in the heavy or light chains at any tumor site or time point and (2) cases with discordant N-motif patterns across different tumor sites. Inferring phylogenetic trees of the patients with discordant patterns, we observed that both N-motif-positive and N-motif-negative tumor subclones could be selected and expanded during tumor evolution. Comparing N-motif-positive with N-motif-negative tumor cells within a patient revealed higher expression of genes involved in the BCR pathway and inflammatory response, whereas tumor cells without N-motifs had higher activity of pathways involved in energy metabolism. In conclusion, although acquired N-motifs likely support FL pathogenesis through antigen-independent BCR signaling in most patients with FL, N-motif-negative tumor cells can also be selected and expanded and may depend more heavily on altered metabolism for competitive survival.


Subject(s)
Lymphoma, Follicular , Humans , Lymphoma, Follicular/pathology , Glycosylation , Phylogeny , Receptors, Antigen, B-Cell/genetics , Receptors, Antigen, B-Cell/metabolism , Lectins , Tumor Microenvironment
5.
Nucleic Acids Res ; 50(W1): W448-W453, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35474383

ABSTRACT

K-mers are short DNA sequences that are used for genome sequence analysis. Applications that use k-mers include genome assembly and alignment. However, the wider bioinformatic use of these short sequences faces challenges related to the massive scale of genomic sequence data: a single human genome assembly has billions of k-mers. As a result, the computational requirements for analyzing k-mer information are enormous, particularly when complete genome assemblies are involved. To address these issues, we developed a new indexing data structure based on a hash table tuned for the lookup of short sequence keys. This web application, referred to as KmerKeys, provides fast queries for cloud computation on genome assemblies and supports fuzzy as well as exact sequence searches of assemblies. To achieve robust and fast performance, the website implements cache-friendly hash tables, memory mapping and massively parallel processing. Our method employs a scalable and efficient data structure that can be used to jointly index and search a large collection of human genome assembly information. Variant databases and their associated metadata, such as the gnomAD population variant catalogue, can also be included. This feature enables the incorporation of future genomic information into sequencing analysis. KmerKeys is freely accessible at https://kmerkeys.dgi-stanford.org.


Subject(s)
Algorithms , Sequence Analysis, DNA , Software , Humans , Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods
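The KmerKeys abstract above describes a hash-table index keyed on short sequences that supports both exact and fuzzy lookups. The Python sketch below illustrates only that general idea with a plain in-memory dict; it is not the KmerKeys implementation, which the abstract says relies on cache-friendly hash tables, memory mapping and parallel processing. The function names and the use of "one substitution (Hamming distance 1)" as the fuzzy criterion are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer in seq to the 0-based positions where it starts (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    index = defaultdict(list)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def exact_lookup(index: Dict[str, List[int]], query: str) -> List[int]:
    """Positions of an exact k-mer match, or an empty list."""
    return index.get(query.upper(), [])


def fuzzy_lookup(index: Dict[str, List[int]], query: str,
                 alphabet: str = "ACGT") -> Dict[str, List[int]]:
    """Indexed k-mers within one substitution of query, with their positions.

    Each single-base variant of the query is generated and probed in the
    hash table, so the cost is O(k * |alphabet|) lookups per query.
    """
    query = query.upper()
    hits: Dict[str, List[int]] = {}
    if query in index:
        hits[query] = index[query]
    for i, original in enumerate(query):
        for base in alphabet:
            if base == original:
                continue
            variant = query[:i] + base + query[i + 1:]
            if variant in index:
                hits[variant] = index[variant]
    return hits
```

For example, indexing ACGTACGTTA with k = 4 maps ACGT to positions [0, 4], and a fuzzy query for CGTC returns its single-substitution neighbours CGTA and CGTT. An index over whole human assemblies would need a far more compact layout than Python lists, which is the problem the abstract addresses.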
6.
Anal Chem ; 95(20): 7872-7879, 2023 05 23.
Article in English | MEDLINE | ID: mdl-37183373

ABSTRACT

We report an amplification-free genotyping method to determine the repeat number of human short tandem repeats (STRs). DNA-based STR profiling is a robust method for genetic identification purposes such as forensics and biobanking and for identifying specific molecular subtypes of cancer. Conventional STR detection requires polymerase amplification, which introduces errors that obscure the correct genotype. We developed a new method that requires no polymerase. First, we synthesized perylene-nucleoside reagents and incorporated them into oligonucleotide probes that recognize five common human STRs. Using these probes and a bead-based hybridization approach, accurate STR detection was achieved in only 1.5 h, including DNA preparation steps, with up to a 1000-fold target DNA enrichment. The performance of this method was comparable to that of PCR-based assays. Using standard fluorometry, the limit of detection was 2.00 ± 0.07 pM for a given target. We used this assay to accurately identify STRs from 50 human subjects, achieving >98% consensus with sequencing data for STR genotyping.


Subject(s)
DNA Fingerprinting , Perylene , Humans , DNA Fingerprinting/methods , Oligonucleotides , Biological Specimen Banks , Microsatellite Repeats , DNA/genetics , Genotype
7.
Blood ; 137(21): 2869-2880, 2021 05 27.
Article in English | MEDLINE | ID: mdl-33728464

ABSTRACT

Tumor heterogeneity complicates biomarker development and fosters drug resistance in solid malignancies. In lymphoma, our knowledge of site-to-site heterogeneity and its clinical implications is still limited. Here, we profiled 2 nodal, synchronously acquired tumor samples from 10 patients with follicular lymphoma (FL) using single-cell RNA, B-cell receptor (BCR) and T-cell receptor sequencing, and flow cytometry. By following the rapidly mutating tumor immunoglobulin genes, we discovered that BCR subclones were shared between the 2 tumor sites in some patients, but in many patients, the disease had evolved separately with limited tumor cell migration between the sites. Patients exhibiting divergent BCR evolution also exhibited divergent tumor gene-expression and cell-surface protein profiles. While the overall composition of the tumor microenvironment did not differ significantly between sites, we did detect a specific correlation between site-to-site tumor heterogeneity and T follicular helper (Tfh) cell abundance. We further observed enrichment of particular ligand-receptor pairs between tumor and Tfh cells, including CD40 and CD40LG, and a significant correlation between tumor CD40 expression and Tfh proliferation. Our study may explain discordant responses to systemic therapies, underscores the difficulty of capturing a patient's disease with a single biopsy, and furthers our understanding of tumor-immune networks in FL.


Subject(s)
Clonal Evolution/genetics , Lymphoma, Follicular/pathology , Single-Cell Analysis , Adult , Aged , Antigens, Neoplasm/biosynthesis , Antigens, Neoplasm/genetics , Biopsy, Fine-Needle , CD40 Antigens/biosynthesis , CD40 Antigens/genetics , CD40 Ligand/biosynthesis , CD40 Ligand/genetics , DNA, Neoplasm/genetics , Disease Progression , Female , Flow Cytometry , Gene Rearrangement, B-Lymphocyte, Light Chain , Gene Rearrangement, T-Lymphocyte , Humans , Lymph Nodes/chemistry , Lymph Nodes/ultrastructure , Lymphocytes, Tumor-Infiltrating/immunology , Lymphoma, Follicular/chemistry , Lymphoma, Follicular/genetics , Male , Middle Aged , Neoplasm Proteins/biosynthesis , Neoplasm Proteins/genetics , Phylogeny , RNA, Neoplasm/genetics , Sequence Alignment , Sequence Homology, Nucleic Acid , T Follicular Helper Cells/immunology , T Follicular Helper Cells/metabolism , Transcriptome , Tumor Microenvironment
8.
Mol Ther ; 30(1): 32-46, 2022 01 05.
Article in English | MEDLINE | ID: mdl-34091053

ABSTRACT

CRISPR-Cas9 is rapidly entering molecular biology and biomedicine as a promising gene-editing tool. A unique feature of CRISPR-Cas9 is a single-guide RNA that directs the Cas9 nuclease toward its genomic target. Herein, we highlight new approaches for improving the cellular uptake and endosomal escape of CRISPR-Cas9. Unlike other recently published works, this review focuses on non-viral carriers as a means to facilitate the cellular uptake of CRISPR-Cas9 through endocytosis. Most non-viral carriers, such as gold nanoparticles, polymer nanoparticles, lipid nanoparticles and nanoscale zeolitic imidazolate frameworks, are developed with a focus on optimizing the endosomal escape of CRISPR-Cas9 by taking advantage of the acidic environment in late endosomes. Electroporation and microinjection are among the most broadly used methods for in vitro and ex vivo ribonucleoprotein transfection, but they are not readily applicable in vivo. Thus, other delivery formats are warranted for in vivo delivery of CRISPR-Cas9. Herein, we specifically review the use of peptide- and nanoparticle-based systems as platforms for CRISPR-Cas9 delivery in vivo. Finally, we highlight future perspectives of the CRISPR-Cas9 gene-editing tool and the prospects of using non-viral vectors to improve its bioavailability and therapeutic potential.


Subject(s)
CRISPR-Cas Systems , Metal Nanoparticles , Endosomes/metabolism , Gene Editing/methods , Gold/metabolism , Liposomes , Nanoparticles
9.
Clin Gastroenterol Hepatol ; 20(4): 950-952.e3, 2022 04.
Article in English | MEDLINE | ID: mdl-33434656

ABSTRACT

Early identification of gastric precancerous lesions, including atrophic gastritis (AG) and intestinal metaplasia (IM), may improve gastric cancer detection and prevention. Because AG and IM are generally asymptomatic, many of the estimated 15 million Americans who carry these lesions remain undiagnosed.[1] AG and IM are associated with either active or prior Helicobacter pylori (Hp) infection. Hp infection perturbs the serum levels of pepsinogen I (PGI), pepsinogen II, the pepsinogen I/II ratio (PGR) and gastrin-17 (G-17), and elicits Hp IgG antibodies.[2,3] In East Asia and other regions with a high burden of Hp infection and gastric cancer, these biomarkers have been used as screening tools for AG and IM.[4] However, limited data exist on the sensitivity and discrimination of these serologic markers in low-Hp-prevalence populations, such as the United States.


Subject(s)
Helicobacter pylori , Precancerous Conditions , Gastrins , Humans , Pepsinogen A , Precancerous Conditions/diagnosis , Precancerous Conditions/pathology , Stomach/pathology , United States/epidemiology
10.
Genome Res ; 29(3): 472-484, 2019 03.
Article in English | MEDLINE | ID: mdl-30737237

ABSTRACT

K562 is widely used in biomedical research. It is one of the three tier-one cell lines of ENCODE and also the cell line most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features had never been comprehensively analyzed. Such information is essential for the correct interpretation of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, and structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how integrating genomic variant information and structural context with functional genomics and epigenomics data yields deeper insights into regulatory complexity. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that use K562 as well as a framework for the analysis of other cancer genomes.


Subject(s)
Genome, Human , Humans , K562 Cells , Karyotype , Polymorphism, Genetic , Whole Genome Sequencing
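The K562 abstract mentions SNV and indel calls "corrected for CN in aneuploid regions". One simple way to reason about such a correction, sketched below under our own assumptions rather than as the authors' pipeline, is that in a segment with copy number CN a variant carried on m copies has an expected allele fraction of m/CN, so the most likely m can be chosen with a binomial likelihood. The function name and the fixed sequencing-error rate are hypothetical.

```python
import math


def most_likely_alt_copies(alt_reads: int, depth: int, copy_number: int,
                           error: float = 0.01) -> int:
    """Pick m in 1..copy_number maximizing the binomial likelihood of alt_reads.

    Model (an assumption for illustration): the expected alternate-allele
    fraction of a variant carried on m of copy_number copies is m / copy_number.
    """
    if depth <= 0 or not 0 <= alt_reads <= depth or copy_number < 1:
        raise ValueError("invalid read counts or copy number")
    best_m, best_ll = 1, -math.inf
    for m in range(1, copy_number + 1):
        # Clamp p so that the all-copies state still tolerates a few error reads.
        p = min(max(m / copy_number, error), 1.0 - error)
        # Binomial log-likelihood; the constant log C(depth, alt_reads) is omitted.
        ll = alt_reads * math.log(p) + (depth - alt_reads) * math.log(1.0 - p)
        if ll > best_ll:
            best_m, best_ll = m, ll
    return best_m
```

In a copy-number-3 segment, 66 alternate reads out of 100 are best explained by m = 2 (expected fraction 2/3), whereas a diploid model would simply call the site heterozygous. The sketch ignores reference bias and overdispersion, which a real caller would need to handle.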
11.
Blood ; 133(10): 1119-1129, 2019 03 07.
Article in English | MEDLINE | ID: mdl-30591526

ABSTRACT

Follicular lymphoma (FL) is a low-grade B-cell malignancy that transforms into a highly aggressive and lethal disease at a rate of 2% per year. Perfectly isolating the malignant B-cell population from a surgical biopsy is a significant challenge, and the resulting admixture masks important FL biology, such as immune checkpoint coexpression patterns. To resolve the underlying transcriptional networks of follicular B-cell lymphomas, we analyzed the transcriptomes of 34 188 cells derived from 6 primary FL tumors. For each tumor, we identified normal immune subpopulations and malignant B cells, based on gene expression. We used multicolor flow cytometry analysis of the same tumors to confirm our assignments of cellular lineages and validate our predictions of expressed proteins. Comparison of gene expression between matched malignant and normal B cells from the same patient revealed tumor-specific features. Malignant B cells exhibited restricted immunoglobulin (Ig) light chain expression (either Igκ or Igλ), as well as the expected upregulation of the BCL2 gene, but also downregulation of the FCER2, CD52, and major histocompatibility complex class II genes. By analyzing thousands of individual cells per patient tumor, we identified the mosaic of malignant B-cell subclones that coexist within an FL and examined the characteristics of tumor-infiltrating T cells. We identified genes coexpressed with immune checkpoint molecules, such as CEBPA and B2M in regulatory T (Treg) cells, providing a better understanding of the gene networks involved in immune regulation. In summary, parallel measurement of single-cell expression in thousands of tumor cells and tumor-infiltrating lymphocytes can be used to obtain a systems-level view of the tumor microenvironment and identify new avenues for therapeutic development.


Subject(s)
Lymphoma, B-Cell/genetics , Lymphoma, Follicular/genetics , T-Lymphocytes, Regulatory/cytology , Biopsy , CCAAT-Enhancer-Binding Proteins/genetics , CD4-Positive T-Lymphocytes/cytology , CD52 Antigen/genetics , Cell Lineage , Flow Cytometry , Gene Expression Profiling , Gene Expression Regulation, Leukemic , Hematopoietic Stem Cells/cytology , Histocompatibility Antigens Class II/metabolism , Humans , Immune System , Immunoglobulin G , Lectins, C-Type/genetics , Leukocytes, Mononuclear/cytology , Lymphoma, B-Cell/blood , Lymphoma, Follicular/blood , Palatine Tonsil/metabolism , Receptors, IgE/genetics , Sequence Analysis, RNA , Transcriptome , Tumor Microenvironment , beta 2-Microglobulin/genetics
12.
Nucleic Acids Res ; 47(19): e115, 2019 11 04.
Article in English | MEDLINE | ID: mdl-31350896

ABSTRACT

A diploid human genome is composed of two haplotypes, which together form a diplotype: the phased polymorphisms and structural variations (SVs) derived from both parents. Diplotypes place genetic variants in the context of cis-related variants from a diploid genome. As a result, they provide valuable information about hereditary transmission, the context of SVs, regulation of gene expression and other features that are informative for understanding human genetics. Successful diplotyping with short-read whole genome sequencing generally requires either a large population or parent-child trio samples. To overcome these limitations, we developed a targeted sequencing method for generating megabase (Mb)-scale haplotypes with short reads. Specific 0.1-0.2 Mb high molecular weight DNA targets are selected with custom-designed Cas9-guide RNA complexes and then sequenced with barcoded linked reads. To test this approach, we designed three assays, targeting the BRCA1 gene, the entire 4-Mb major histocompatibility complex locus and 18 well-characterized SVs, respectively. Using an integrated alignment- and assembly-based approach, we generated comprehensive variant diplotypes spanning the entirety of the targeted loci and characterized SVs with exact breakpoints. Our results were comparable in quality to those of long-read sequencing.


Subject(s)
Genome, Human/genetics , Genomic Structural Variation/genetics , High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Diploidy , Gene Expression Regulation/genetics , Genetic Association Studies/methods , Haplotypes/genetics , Humans , Sequence Analysis, DNA/methods
13.
Nucleic Acids Res ; 47(8): 3846-3861, 2019 05 07.
Article in English | MEDLINE | ID: mdl-30864654

ABSTRACT

HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.


Subject(s)
Chromosome Mapping/methods , Genome, Human , Genomics/methods , Haplotypes , Sequence Analysis, DNA/statistics & numerical data , Alleles , Aneuploidy , DNA Methylation , Genomic Structural Variation , Hep G2 Cells , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Karyotyping , Loss of Heterozygosity , Polymorphism, Single Nucleotide , Retroelements
14.
Analyst ; 145(17): 5836-5844, 2020 Aug 24.
Article in English | MEDLINE | ID: mdl-32648858

ABSTRACT

Two types of clinically important nucleic acid biomarkers, microRNA (miRNA) and circulating tumor DNA (ctDNA), were detected and quantified from human serum using an amplification-free fluorescence hybridization assay. Specifically, the miRNAs hsa-miR-223-3p and hsa-miR-486-5p, which are relevant to rheumatoid arthritis, and cancer-related BRAF and KRAS mutations in ctDNA were directly measured. The oligonucleotide probes required for the assay were rationally designed and synthesized through a novel, time- and cost-effective "clickable" approach. Because nucleic acid components did not need to be isolated from serum, the fluorescence-based assay took only 1 hour. Targets were successfully detected and absolutely quantified despite their notoriously low abundance, with single-nucleotide resolution. The measured miRNA and ctDNA amounts showed overall good correlation with current techniques. With appropriate probes, our novel assay and signal-boosting approach could become a useful point-of-care tool for measuring other low-abundance nucleic acid biomarkers.


Subject(s)
Circulating Tumor DNA , MicroRNAs , Nucleic Acids , Biomarkers , Humans , MicroRNAs/genetics , Nucleic Acid Hybridization
15.
Nucleic Acids Res ; 46(4): e19, 2018 02 28.
Article in English | MEDLINE | ID: mdl-29186506

ABSTRACT

Large genomic rearrangements involve inversions, deletions and other structural changes that span megabase segments of the human genome. This category of genetic aberration is the cause of many hereditary genetic disorders and contributes to the pathogenesis of diseases like cancer. We developed a new algorithm called ZoomX for analysing barcode-linked sequence reads; these sequences can be traced to individual high molecular weight DNA molecules (>50 kb). To generate barcode-linked sequence reads, we employ a library preparation technology (10X Genomics) that uses droplets to partition and barcode DNA molecules. Using linked-read data from whole genome sequencing, we identify large genomic rearrangements, typically greater than 200 kb, even when they are present only at low allelic fractions. Our algorithm uses a Poisson scan statistic to identify genomic rearrangement junctions, determines counts of junction-spanning molecules and calculates a Fisher's exact test to determine the statistical significance of somatic aberrations. Using a well-characterized human genome, we benchmarked this approach and showed that it accurately identifies large rearrangements. Subsequently, we demonstrated that our algorithm identifies somatic rearrangements when they are present at the lower allelic fractions that occur in tumors. We characterized a set of complex cancer rearrangements with multiple classes of structural aberrations and with possible roles in oncogenesis.


Subject(s)
Genomic Structural Variation , Neoplasms/genetics , Whole Genome Sequencing/methods , Algorithms , Chromosome Aberrations , Gastrointestinal Neoplasms/genetics , Genome, Human , Humans
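ZoomX applies a Fisher's exact test to counts of junction-spanning molecules to assess somatic rearrangements. A minimal, dependency-free sketch of a one-sided Fisher's exact test on a 2x2 table follows; the table layout (tumor and matched-normal rows, spanning and non-spanning columns) and the one-sided "tumor enriched" alternative are our assumptions, not details taken from the paper, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: row 1 = tumor, row 2 = matched normal;
    column 1 = junction-spanning molecules, column 2 = other molecules.
    Tests whether the tumor row is enriched for junction-spanning molecules.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    # With all margins fixed, the top-left cell is hypergeometric; sum the
    # probabilities of every table at least as extreme as the observed one.
    numer = sum(comb(col1, x) * comb(n - col1, row1 - x)
                for x in range(a, min(row1, col1) + 1))
    return numer / denom
```

As a check, the classic table [[3, 1], [1, 3]] gives p = 17/70 (about 0.243). Where SciPy is available, the second value returned by `scipy.stats.fisher_exact(table, alternative="greater")` should agree.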
16.
Anal Chem ; 91(3): 1706-1710, 2019 02 05.
Article in English | MEDLINE | ID: mdl-30652472

ABSTRACT

Molecular analysis of limited quantities of DNA can be challenging. Repeatedly sequencing the original DNA molecules from a given sample would overcome many issues related to accurate genetic analysis and mitigate issues with processing small amounts of DNA analyte. Moreover, an iterative, replicated analysis of the same DNA molecule has the potential to improve genetic characterization. Herein, we demonstrate that the use of "click"-based attachment of DNA sequencing libraries onto an agarose bead support enables repetitive primer extension assays for specific genomic DNA targets such as gene exons. We validated the performance of this assay for evaluating specific genetic alterations in both normal and cancer reference standard DNA samples. We demonstrate the stability of conjugated DNA libraries and related sequencing results over the course of independent serial assays spanning several months from the same set of samples. Finally, we applied this method to DNA derived from a tumor sample and demonstrated improved mutation detection accuracy.


Subject(s)
DNA, Neoplasm/analysis , High-Throughput Nucleotide Sequencing/methods , Cell Line, Tumor , Click Chemistry , Cycloaddition Reaction , DNA, Neoplasm/chemistry , DNA, Neoplasm/genetics , Gene Library , Humans , Mutation , Neoplasms/genetics , Proof of Concept Study , Sepharose/chemistry
17.
Nucleic Acids Res ; 45(19): e162, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-28977555

ABSTRACT

Genomic instability is a frequently occurring feature of cancer that involves large-scale structural alterations. These somatic changes in chromosome structure include duplication of entire chromosome arms and aneuploidy where chromosomes are duplicated beyond normal diploid content. However, the accurate determination of aneuploidy events in cancer genomes is a challenge. Recent advances in sequencing technology allow the characterization of haplotypes that extend megabases along the human genome using high molecular weight (HMW) DNA. For this study, we employed a library preparation method in which sequence reads have barcodes linked to single HMW DNA molecules. Barcode-linked reads are used to generate extended haplotypes on the order of megabases. We developed a method that leverages haplotypes to identify chromosomal segmental alterations in cancer and uses this information to join haplotypes together, thus extending the range of phased variants. With this approach, we identified mega-haplotypes that encompass entire chromosome arms. We characterized the chromosomal arm changes and aneuploidy events in a manner that offers similar information as a traditional karyotype but with the benefit of DNA sequence resolution. We applied this approach to characterize aneuploidy and chromosomal alterations from a series of primary colorectal cancers.


Subject(s)
Aneuploidy , Haplotypes , Neoplasms/genetics , Chromosome Aberrations , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/genetics , DNA Mutational Analysis/methods , Genome, Human/genetics , Genomic Instability , High-Throughput Nucleotide Sequencing/methods , Humans , Karyotype , Karyotyping/methods , Neoplasms/diagnosis , Reproducibility of Results , Sensitivity and Specificity
18.
Nucleic Acids Res ; 44(15): e126, 2016 09 06.
Article in English | MEDLINE | ID: mdl-27325742

ABSTRACT

We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data, and an analysis of mid-range size insertions and deletions (<10 kb) for whole genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read-pair, read-depth and one-end-mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split-read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions/deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.


Subject(s)
DNA Mutational Analysis/methods , Genome/genetics , Genomics/methods , INDEL Mutation/genetics , Adenoviridae/genetics , Algorithms , Animals , Benchmarking , Computer Simulation , Datasets as Topic , Pan troglodytes/virology , Poisson Distribution , Reproducibility of Results
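SWAN combines read-pair, read-depth and one-end-mapped evidence through Poisson-based likelihoods. The sketch below illustrates only the read-depth ingredient in its simplest form: a per-window Poisson log-likelihood ratio comparing an altered copy state (for example, a heterozygous deletion at half the expected depth) with the expected depth. It is not the SWAN R package or its API; the function names, the copy ratio of 0.5 and the threshold of 10 are hypothetical choices, and real read depth is usually overdispersed relative to a Poisson model.

```python
import math
from typing import List, Sequence


def depth_llr(observed: int, expected: float, copy_ratio: float) -> float:
    """log P(observed | mean = copy_ratio * expected) - log P(observed | mean = expected).

    Positive values favour the altered copy state under a Poisson model.
    """
    if observed < 0 or expected <= 0 or copy_ratio <= 0:
        raise ValueError("need observed >= 0, expected > 0 and copy_ratio > 0")
    # The log(observed!) terms cancel, leaving x*log(r) - lambda*(r - 1).
    return observed * math.log(copy_ratio) - expected * (copy_ratio - 1.0)


def flag_windows(counts: Sequence[int], expected: float,
                 copy_ratio: float = 0.5, threshold: float = 10.0) -> List[int]:
    """Indices of windows whose log-likelihood ratio exceeds threshold."""
    return [i for i, x in enumerate(counts)
            if depth_llr(x, expected, copy_ratio) > threshold]
```

With an expected depth of 100 reads per window, windows with 52 and 47 reads are flagged as likely heterozygous deletions, while windows near 100 reads are not. Setting `copy_ratio=1.5` would instead score single-copy gains.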
19.
BMC Genomics ; 18(1): 745, 2017 Sep 21.
Article in English | MEDLINE | ID: mdl-28934929

ABSTRACT

BACKGROUND: RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes, commonly in the form of random nucleotides, were recently introduced to improve gene expression measures by detecting amplification duplicates, but they are susceptible to errors generated during PCR and sequencing. These errors produce false positive counts, leading to inaccurate transcriptome quantification, especially at low input and single-cell RNA amounts where the total number of molecules present is minuscule. To address this issue, we demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels. RESULTS: We showed experimentally that random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess our method's performance, we applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode, we systematically characterized the presence of sequence errors in random-mer molecular barcodes. We observed that such errors are extensive and become more dominant at low input amounts. CONCLUSIONS: We describe the first study to use transposable molecular barcodes and to apply them to the study of random-mer molecular barcode errors. The extensive errors found in random-mer molecular barcodes may warrant the use of error-correcting barcodes for transcriptome analysis as input amounts decrease.


Subject(s)
DNA Transposable Elements/genetics , Sequence Analysis, RNA/methods , DNA, Complementary/genetics , Gene Expression Profiling
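The barcode abstract argues for error-correcting barcodes over random-mers. The sketch below shows the generic mechanism such barcodes rely on: an observed barcode is snapped to the nearest entry of a known whitelist when exactly one entry lies within a small Hamming distance, and discarded otherwise. It is not the authors' transposable barcode design; the whitelist, the distance cutoff and the function names are assumptions. Correcting single errors this way requires whitelist barcodes that differ from one another at three or more positions.

```python
from typing import Iterable, Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed: str, whitelist: Sequence[str],
                    max_dist: int = 1) -> Optional[str]:
    """Return the unique whitelist barcode within max_dist of observed, else None."""
    best: Optional[str] = None
    best_d = max_dist + 1
    ambiguous = False
    for bc in whitelist:
        d = hamming(observed, bc)
        if d < best_d:
            best, best_d, ambiguous = bc, d, False
        elif d == best_d and best is not None:
            ambiguous = True  # two whitelist entries are equally close
    return None if ambiguous else best


def count_molecules(reads: Iterable[str], whitelist: Sequence[str],
                    max_dist: int = 1) -> int:
    """Count distinct corrected barcodes, dropping reads that cannot be corrected."""
    corrected = set()
    for read in reads:
        bc = correct_barcode(read, whitelist, max_dist)
        if bc is not None:
            corrected.add(bc)
    return len(corrected)
```

With a toy whitelist of AAAAAA, CCCCCC, GGGGGG and TTTTTT, the five reads AAAAAA, AAAGAA, CCCCCC, CCCCCA and AACCAA collapse to two molecules, whereas counting the raw labels as random-mers would report five.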
20.
Anal Chem ; 89(22): 11913-11917, 2017 11 21.
Article in English | MEDLINE | ID: mdl-29083143

ABSTRACT

Digital PCR (dPCR) relies on the analysis of individual partitions to accurately quantify nucleic acid species. The most widely used analysis method requires manual clustering through individual visual inspection. Some automated analysis methods have emerged but do not robustly account for multiplexed targets, low target concentration, and assay noise. In this study, we describe an open source analysis software called Calico that uses "data gridding" to increase the sensitivity of clustering toward small clusters. Our workflow also generates quality score metrics in order to gauge and filter individual assay partitions by how well they were classified. We applied our analysis algorithm to multiplexed droplet digital PCR (ddPCR) data sets in both EvaGreen and probe-based schemes, targeting the oncogenic BRAF V600E and KRAS G12D mutations. We demonstrate an automated clustering sensitivity down to 0.1% mutant fraction and filtering of artifactual assay partitions from low-quality DNA samples. Overall, we demonstrate a vastly improved approach to analyzing ddPCR data that can be applied to clinical use, where automation and reproducibility are critical.


Subject(s)
Polymerase Chain Reaction/methods , Polymerase Chain Reaction/standards , Automation , Cluster Analysis , Humans , Mutation , Proto-Oncogene Proteins B-raf/genetics , Proto-Oncogene Proteins p21(ras)/genetics , Software
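Calico automates the classification of dPCR partitions. Once partitions are classified as positive or negative, a standard dPCR step (not specific to Calico) converts the positive fraction into a concentration with a Poisson correction, because a positive partition may contain more than one template: lambda = -ln(1 - k/n) copies per partition for k positives among n partitions. The sketch below applies that formula; the function names are hypothetical, and the partition volume is instrument-specific and must be supplied by the user.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean template copies per partition."""
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("need 0 <= positive <= total and total > 0")
    if positive == total:
        raise ValueError("all partitions positive: concentration is not estimable")
    return -math.log(1.0 - positive / total)


def concentration_per_ul(positive: int, total: int, partition_volume_nl: float) -> float:
    """Copies per microlitre, given the instrument-specific partition volume in nL."""
    if partition_volume_nl <= 0:
        raise ValueError("partition volume must be positive")
    return copies_per_partition(positive, total) / (partition_volume_nl * 1e-3)


def mutant_fraction(mutant_positive: int, wildtype_positive: int, total: int) -> float:
    """Mutant copies over all copies, with each channel's positives counted independently."""
    lam_mut = copies_per_partition(mutant_positive, total)
    lam_wt = copies_per_partition(wildtype_positive, total)
    if lam_mut + lam_wt == 0:
        raise ValueError("no positive partitions in either channel")
    return lam_mut / (lam_mut + lam_wt)
```

For instance, 5,000 positive droplets out of 10,000 correspond to ln 2, about 0.693 copies per droplet, roughly 39% more than the naive estimate of 0.5. The correction matters most at high occupancy, while at the 0.1% mutant fractions discussed above the limiting factor is instead correct classification of the few mutant-positive partitions.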