Results 1 - 13 of 13
1.
Genome Res ; 31(2): 337-347, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33361113

ABSTRACT

Understanding the changes in diverse molecular pathways underlying the development of breast tumors is critical for improving diagnosis, treatment, and drug development. Here, we used RNA-profiling of canine mammary tumors (CMTs) coupled with a robust analysis framework to model molecular changes in human breast cancer. Our study leveraged a key advantage of the canine model, the frequent presence of multiple naturally occurring tumors at diagnosis, thus providing samples spanning normal tissue and benign and malignant tumors from each patient. We showed that human breast cancer signals, at both expression and mutation level, are evident in CMTs. Profiling multiple tumors per patient enabled by the CMT model allowed us to resolve statistically robust transcription patterns and biological pathways specific to malignant tumors versus those arising in benign tumors or shared with normal tissues. We showed that multiple histological samples per patient are necessary to effectively capture these progression-related signatures, and that carcinoma-specific signatures are predictive of survival for human breast cancer patients. To catalyze and support similar analyses and use of the CMT model by other biomedical researchers, we provide FREYA, a robust data processing pipeline and statistical analyses framework.

2.
Nature ; 498(7453): 220-3, 2013 Jun 13.
Article in English | MEDLINE | ID: mdl-23665959

ABSTRACT

Congenital heart disease (CHD) is the most frequent birth defect, affecting 0.8% of live births. Many cases occur sporadically and impair reproductive fitness, suggesting a role for de novo mutations. Here we compare the incidence of de novo mutations in 362 severe CHD cases and 264 controls by analysing exome sequencing of parent-offspring trios. CHD cases show a significant excess of protein-altering de novo mutations in genes expressed in the developing heart, with an odds ratio of 7.5 for damaging (premature termination, frameshift, splice site) mutations. Similar odds ratios are seen across the main classes of severe CHD. We find a marked excess of de novo mutations in genes involved in the production, removal or reading of histone 3 lysine 4 (H3K4) methylation, or ubiquitination of H2BK120, which is required for H3K4 methylation. There are also two de novo mutations in SMAD2, which regulates H3K27 methylation in the embryonic left-right organizer. The combination of both activating (H3K4 methylation) and inactivating (H3K27 methylation) chromatin marks characterizes 'poised' promoters and enhancers, which regulate expression of key developmental genes. These findings implicate de novo point mutations in several hundreds of genes that collectively contribute to approximately 10% of severe CHD.


Subject(s)
Heart Diseases/congenital , Heart Diseases/genetics , Histones/metabolism , Adult , Case-Control Studies , Child , Chromatin/chemistry , Chromatin/metabolism , DNA Mutational Analysis , Enhancer Elements, Genetic/genetics , Exome/genetics , Female , Genes, Developmental/genetics , Heart Diseases/metabolism , Histones/chemistry , Humans , Lysine/chemistry , Lysine/metabolism , Male , Methylation , Mutation , Odds Ratio , Promoter Regions, Genetic/genetics
3.
Nature ; 485(7397): 237-41, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22495306

ABSTRACT

Multiple studies have confirmed the contribution of rare de novo copy number variations to the risk for autism spectrum disorders. But whereas de novo single nucleotide variants have been identified in affected individuals, their contribution to risk has yet to be clarified. Specifically, the frequency and distribution of these mutations have not been well characterized in matched unaffected controls, and such data are vital to the interpretation of de novo coding mutations observed in probands. Here we show, using whole-exome sequencing of 928 individuals, including 200 phenotypically discordant sibling pairs, that highly disruptive (nonsense and splice-site) de novo mutations in brain-expressed genes are associated with autism spectrum disorders and carry large effects. On the basis of mutation rates in unaffected individuals, we demonstrate that multiple independent de novo single nucleotide variants in the same gene among unrelated probands reliably identifies risk alleles, providing a clear path forward for gene discovery. Among a total of 279 identified de novo coding mutations, there is a single instance in probands, and none in siblings, in which two independent nonsense variants disrupt the same gene, SCN2A (sodium channel, voltage-gated, type II, α subunit), a result that is highly unlikely by chance.


Subject(s)
Autistic Disorder/genetics , Exome/genetics , Exons/genetics , Genetic Predisposition to Disease/genetics , Mutation/genetics , Nerve Tissue Proteins/genetics , Sodium Channels/genetics , Alleles , Codon, Nonsense/genetics , Genetic Heterogeneity , Humans , NAV1.2 Voltage-Gated Sodium Channel , RNA Splice Sites/genetics , Siblings
4.
Clin Cancer Res ; 24(8): 1872-1880, 2018 04 15.
Article in English | MEDLINE | ID: mdl-29330207

ABSTRACT

Purpose: Decisions to continue or suspend therapy with immune checkpoint inhibitors are commonly guided by tumor dynamics seen on serial imaging. However, immunotherapy responses are uniquely challenging to interpret because tumors often shrink slowly or can appear transiently enlarged due to inflammation. We hypothesized that monitoring tumor cell death in real time by quantifying changes in circulating tumor DNA (ctDNA) levels could enable early assessment of immunotherapy efficacy. Experimental Design: We compared longitudinal changes in ctDNA levels with changes in radiographic tumor size and with survival outcomes in 28 patients with metastatic non-small cell lung cancer (NSCLC) receiving immune checkpoint inhibitor therapy. CtDNA was quantified by determining the allele fraction of cancer-associated somatic mutations in plasma using a multigene next-generation sequencing assay. We defined a ctDNA response as a >50% decrease in mutant allele fraction from baseline, with a second confirmatory measurement. Results: Strong agreement was observed between ctDNA response and radiographic response (Cohen's kappa, 0.753). Median time to initial response among patients who achieved responses in both categories was 24.5 days by ctDNA versus 72.5 days by imaging. Time on treatment was significantly longer for ctDNA responders versus nonresponders (median, 205.5 vs. 69 days; P < 0.001). A ctDNA response was associated with superior progression-free survival [hazard ratio (HR), 0.29; 95% CI, 0.09-0.89; P = 0.03], and superior overall survival (HR, 0.17; 95% CI, 0.05-0.62; P = 0.007). Conclusions: A drop in ctDNA level is an early marker of therapeutic efficacy and predicts prolonged survival in patients treated with immune checkpoint inhibitors for NSCLC. Clin Cancer Res; 24(8); 1872-80. ©2018 AACR.


Subject(s)
Biomarkers, Tumor , Circulating Tumor DNA , Lung Neoplasms/genetics , Lung Neoplasms/therapy , Antineoplastic Agents, Immunological/therapeutic use , B7-H1 Antigen/antagonists & inhibitors , Disease Progression , Humans , Immunotherapy , Lung Neoplasms/diagnosis , Lung Neoplasms/immunology , Mutation , Prognosis , Programmed Cell Death 1 Receptor/antagonists & inhibitors , Survival Analysis , Time Factors , Tomography, X-Ray Computed , Treatment Outcome
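
The response definition in abstract 4 (a >50% decrease in mutant allele fraction from baseline, with a second confirmatory measurement) can be sketched as a small check. This is only an illustration: the function name, its parameters, and the reading of "confirmatory" as two consecutive follow-up measurements below the cutoff are assumptions, not details given in the abstract.

```python
def is_ctdna_response(baseline_af, followup_afs, threshold=0.5):
    """Return True if two consecutive follow-up allele fractions each show
    a decrease of more than `threshold` (as a fraction) from baseline.

    Hypothetical helper: treating the confirmatory measurement as the
    next consecutive sample is an assumption made for this sketch.
    """
    if baseline_af <= 0:
        raise ValueError("baseline allele fraction must be positive")
    # A >50% decrease means the follow-up value is below half of baseline.
    cutoff = baseline_af * (1.0 - threshold)
    below = [af < cutoff for af in followup_afs]
    # Require an initial drop plus a confirming drop at the next time point.
    return any(first and second for first, second in zip(below, below[1:]))
```

A single low measurement is not counted as a response under this reading, because nothing confirms it.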
5.
BMC Bioinformatics ; 8: 186, 2007 Jun 07.
Article in English | MEDLINE | ID: mdl-17555595

ABSTRACT

BACKGROUND: Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to the tasks of novel transcript identification, elucidation of transcription factor binding sites, detection of methylated DNA and several other applications in several model organisms. These experiments are being conducted at increasingly finer resolutions as the microarray technology enjoys increasingly greater feature densities. The increased densities naturally lead to increased data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis involves smoothing observed signals by computing pseudomedians within sliding windows, an O(n^2 log n) calculation in each window. This poor time complexity is an issue for tiling array analysis and could prove to be a real bottleneck as tiling microarray experiments become grander in scope and finer in resolution. RESULTS: We therefore implemented Monahan's HLQEST algorithm that reduces the runtime complexity for computing the pseudomedian of n numbers to O(n log n) from O(n^2 log n). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then leveraged the fact that elements within sliding windows remain largely unchanged in overlapping windows (as one slides across genomic space) to further reduce computation by an additional 43%. This was achieved by the application of skip lists to maintaining a sorted list of values from window to window. This sorted list could be maintained with simple O(log n) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets. CONCLUSION: Tiling microarray analyses that rely upon a sliding window pseudomedian calculation can require many hours of computation. We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This result not only speeds the current standard analyses, but also makes possible ones where many iterations of the filter may be required, such as might be required in a bootstrap or parameter estimation setting. Source code and executables are available at http://tiling.gersteinlab.org/pseudomedian/.


Subject(s)
Algorithms , Chromosome Mapping/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Signal Processing, Computer-Assisted
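
For readers unfamiliar with the pseudomedian in abstract 5, the sketch below shows the naive baseline that the authors set out to speed up, not their HLQEST or skip-list implementation. The pseudomedian (Hodges-Lehmann estimator) is the median of all pairwise averages of the values in a window. Building and sorting those n(n+1)/2 averages is what makes the naive version quadratic per window. The function names are illustrative only.

```python
from statistics import median


def pseudomedian(values):
    """Naive pseudomedian: median of all pairwise (Walsh) averages, i <= j."""
    n = len(values)
    if n == 0:
        raise ValueError("pseudomedian of an empty window is undefined")
    # n(n+1)/2 averages, including each value averaged with itself.
    walsh = [(values[i] + values[j]) / 2.0
             for i in range(n) for j in range(i, n)]
    return median(walsh)


def sliding_pseudomedian(signal, window):
    """Smooth `signal` by taking the pseudomedian of every full window."""
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    return [pseudomedian(signal[k:k + window])
            for k in range(len(signal) - window + 1)]
```

Each window here is recomputed from scratch, even though adjacent windows differ by only one value. That redundancy is what the abstract describes exploiting with a sorted structure carried from window to window.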
6.
BMC Genet ; 6 Suppl 1: S130, 2005 Dec 30.
Article in English | MEDLINE | ID: mdl-16451589

ABSTRACT

Alcoholism is a complex disease. As with other common diseases, genetic variants underlying alcoholism have been elusive, possibly due to the small effect from each individual susceptible variant, gene x environment and gene x gene interactions and complications in phenotype definition. We conducted association tests, the family-based association tests (FBAT) and the backward haplotype transmission association (BHTA), on the Collaborative Study of the Genetics of Alcoholism (COGA) data provided by Genetic Analysis Workshop (GAW) 14. Efron's local false discovery rate method was applied to control the proportion of false discoveries. For FBAT, we compared the results based on different types of genetic markers (single-nucleotide polymorphisms (SNPs) versus microsatellites) and different phenotype definitions (clinical diagnoses versus electrophysiological phenotypes). Significant association results were found only between SNPs and clinical diagnoses. In contrast, significant results were found only between microsatellites and electrophysiological phenotypes. In addition, we obtained the association results for SNPs and microsatellites using COGA diagnosis as phenotype based on BHTA. In this case, the results for SNPs and microsatellites are more consistent. Compared to FBAT, more significant markers are detected with BHTA.


Subject(s)
Alcoholism/genetics , Genome-Wide Association Study , Microsatellite Repeats/genetics , Polymorphism, Single Nucleotide/genetics , Alcoholism/diagnosis , Cooperative Behavior , False Positive Reactions , Family , Haplotypes/genetics , Humans , Phenotype
7.
DNA Repair (Amst) ; 26: 44-53, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25547252

ABSTRACT

Efficient DNA double-strand break (DSB) repair is a critical determinant of cell survival in response to DNA damaging agents, and it plays a key role in the maintenance of genomic integrity. Homologous recombination (HR) and non-homologous end-joining (NHEJ) represent the two major pathways by which DSBs are repaired in mammalian cells. We now understand that HR and NHEJ repair are composed of multiple sub-pathways, some of which still remain poorly understood. As such, there is great interest in the development of novel assays to interrogate these key pathways, which could lead to the development of novel therapeutics, and a better understanding of how DSBs are repaired. Furthermore, assays which can measure repair specifically at endogenous chromosomal loci are of particular interest, because of an emerging understanding that chromatin interactions heavily influence DSB repair pathway choice. Here, we present the design and validation of a novel, next-generation sequencing-based approach to study DSB repair at chromosomal loci in cells. We demonstrate that NHEJ repair "fingerprints" can be identified using our assay, which are dependent on the status of key DSB repair proteins. In addition, we have validated that our system can be used to detect dynamic shifts in DSB repair activity in response to specific perturbations. This approach represents a unique alternative to many currently available DSB repair assays, which typically rely on the expression of reporter genes as an indirect read-out for repair. As such, we believe this tool will be useful for DNA repair researchers to study NHEJ repair in a high-throughput and sensitive manner, with the capacity to detect subtle changes in DSB repair patterns, which was not possible previously.


Subject(s)
DNA Breaks, Double-Stranded , DNA End-Joining Repair , DNA Mutational Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Animals , Chromatin/metabolism , DNA/metabolism , DNA-Binding Proteins/metabolism , Genetic Loci , Humans , INDEL Mutation , Mammals , Recombinational DNA Repair
8.
Genomics Proteomics Bioinformatics ; 13(1): 25-35, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25712262

ABSTRACT

We report a significantly enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of high-throughput mass spectrometry-based proteomics research ranging from a single laboratory, a group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography-tandem mass spectrometry (LC-MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results.


Subject(s)
Chromatography, Liquid/methods , Computational Biology/methods , Databases, Protein , Peptide Fragments/analysis , Proteome/analysis , Proteomics/methods , Tandem Mass Spectrometry/methods , Humans
9.
Cell Rep ; 9(1): 16-23, 2014 Oct 09.
Article in English | MEDLINE | ID: mdl-25284784

ABSTRACT

Whole-exome sequencing (WES) studies have demonstrated the contribution of de novo loss-of-function single-nucleotide variants (SNVs) to autism spectrum disorder (ASD). However, challenges in the reliable detection of de novo insertions and deletions (indels) have limited inclusion of these variants in prior analyses. By applying a robust indel detection method to WES data from 787 ASD families (2,963 individuals), we demonstrate that de novo frameshift indels contribute to ASD risk (OR = 1.6; 95% CI = 1.0-2.7; p = 0.03), are more common in female probands (p = 0.02), are enriched among genes encoding FMRP targets (p = 6 × 10^-9), and arise predominantly on the paternal chromosome (p < 0.001). On the basis of mutation rates in probands versus unaffected siblings, we conclude that de novo frameshift indels contribute to risk in approximately 3% of individuals with ASD. Finally, by observing clustering of mutations in unrelated probands, we uncover two ASD-associated genes: KMT2E (MLL5), a chromatin regulator, and RIMS1, a regulator of synaptic vesicle release.


Subject(s)
Child Development Disorders, Pervasive/genetics , Frameshift Mutation , Sequence Deletion , Child , Child Development Disorders, Pervasive/blood , Child Development Disorders, Pervasive/diagnosis , DNA/blood , DNA/genetics , DNA-Binding Proteins/genetics , Female , Fragile X Mental Retardation Protein/genetics , GTP-Binding Proteins/genetics , Humans , Male , Nerve Tissue Proteins/genetics , Pedigree , Phenotype , Sex Factors
11.
Cancer Res ; 72(14): 3492-8, 2012 Jul 15.
Article in English | MEDLINE | ID: mdl-22581825

ABSTRACT

Detection of cell-free tumor DNA in the blood has offered promise as a cancer biomarker, but practical clinical implementations have been impeded by the lack of a sensitive and accurate method for quantitation that is also simple, inexpensive, and readily scalable. Here we present an approach that uses next-generation sequencing to quantify the small fraction of DNA molecules that contain tumor-specific mutations within a background of normal DNA in plasma. Using layers of sequence redundancy designed to distinguish true mutations from sequencer misreads and PCR misincorporations, we achieved a detection sensitivity of approximately 1 variant in 5,000 molecules. In addition, the attachment of modular barcode tags to the DNA fragments to be sequenced facilitated the simultaneous analysis of more than 100 patient samples. As proof-of-principle, we showed the successful use of this method to follow treatment-associated changes in circulating tumor DNA levels in patients with non-small cell lung cancer. Our findings suggest that the deep sequencing approach described here may be applied to the development of a practical diagnostic test that measures tumor-derived DNA levels in blood.


Subject(s)
DNA, Neoplasm/blood , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , Carcinoma, Non-Small-Cell Lung/blood , Carcinoma, Non-Small-Cell Lung/genetics , Cell Line, Tumor , Female , Humans , Lung Neoplasms/blood , Lung Neoplasms/genetics , Male , Mutation , Polymerase Chain Reaction
12.
J Proteome Res ; 7(1): 293-9, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17902638

ABSTRACT

The widespread use of mass spectrometry for protein identification has created a demand for computationally efficient methods of matching mass spectrometry data to protein databases. A search using X!Tandem, a popular and representative program, can require hours or days to complete, particularly when missed cleavages and post-translational modifications are considered. Existing techniques for accelerating X!Tandem by employing parallelism are unsatisfactory for a variety of reasons. The paper describes a parallelization of X!Tandem, called X!!Tandem, that shows excellent speedups on commodity hardware and produces the same results as the original program. Furthermore, the parallelization technique used is unusual and potentially useful for parallelizing other complex programs.


Subject(s)
Proteins/analysis , Software/standards , Tandem Mass Spectrometry , Databases, Protein , Methods , Protein Processing, Post-Translational , Time Factors
13.
Science ; 318(5849): 420-6, 2007 Oct 19.
Article in English | MEDLINE | ID: mdl-17901297

ABSTRACT

Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) approximately 3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.


Subject(s)
Genetic Variation , Genome, Human , Mutation , Chromosome Inversion , Chromosome Mapping , Computational Biology , Female , Gene Fusion , Humans , Mutagenesis, Insertional , Oligonucleotide Array Sequence Analysis , Recombination, Genetic , Repetitive Sequences, Nucleic Acid , Retroelements , Sequence Analysis, DNA , Sequence Deletion
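
The general idea behind paired-end mapping in abstract 13 can be illustrated with a toy classifier. When both ends of a roughly 3-kb fragment are mapped to the reference, a mapped span much larger than the fragment suggests a deletion in the sample, a much smaller span suggests an insertion, and an unexpected relative orientation suggests an inversion. Everything specific below is hypothetical: the `ReadPair` type, the 1,000-bp tolerance, and the labels are assumptions for illustration and do not reproduce the cutoffs or pipeline of the paper.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    start: int                  # leftmost mapped reference coordinate
    end: int                    # rightmost mapped reference coordinate
    expected_orientation: bool  # True if the ends map in the expected orientation


def classify_pair(pair, expected_span=3000, tolerance=1000):
    """Label one mapped pair; `tolerance` is an assumed, illustrative cutoff."""
    if not pair.expected_orientation:
        return "inversion_candidate"
    span = pair.end - pair.start
    if span > expected_span + tolerance:
        # Ends lie farther apart on the reference than in the sample fragment.
        return "deletion_candidate"
    if span < expected_span - tolerance:
        # Ends lie closer together on the reference than in the sample fragment.
        return "insertion_candidate"
    return "concordant"
```

In practice a call would normally rest on several independent pairs supporting the same event rather than on one pair, which this sketch does not model.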