Results 1 - 20 of 48
1.
Cell ; 184(8): 2239-2254.e39, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33831375

ABSTRACT

Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution and provide a pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.


Subject(s)
Genetic Heterogeneity , Neoplasms/genetics , DNA Copy Number Variations , DNA, Neoplasm/chemistry , DNA, Neoplasm/metabolism , Databases, Genetic , Drug Resistance, Neoplasm/genetics , Humans , Neoplasms/pathology , Polymorphism, Single Nucleotide , Whole Genome Sequencing
2.
Cell ; 173(4): 1003-1013.e15, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29681457

ABSTRACT

The majority of newly diagnosed prostate cancers are slow growing, with a long natural life history. Yet a subset can metastasize with lethal consequences. We reconstructed the phylogenies of 293 localized prostate tumors linked to clinical outcome data. Multiple subclones were detected in 59% of patients, and specific subclonal architectures associate with adverse clinicopathological features. Early tumor development is characterized by point mutations and deletions followed by later subclonal amplifications and changes in trinucleotide mutational signatures. Specific genes are selectively mutated prior to or following subclonal diversification, including MTOR, NKX3-1, and RB1. Patients with low-risk monoclonal tumors rarely relapse after primary therapy (7%), while those with high-risk polyclonal tumors frequently do (61%). The presence of multiple subclones in an index biopsy may be necessary, but not sufficient, for relapse of localized prostate cancer, suggesting that evolution-aware biomarkers should be studied in prospective studies of low-risk tumors suitable for active surveillance.


Subject(s)
Prostatic Neoplasms/pathology , Biomarkers, Tumor/blood , High-Throughput Nucleotide Sequencing , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Humans , Male , Neoplasm Grading , Neoplasm Recurrence, Local , Polymorphism, Single Nucleotide , Proportional Hazards Models , Prospective Studies , Prostatic Neoplasms/classification , Prostatic Neoplasms/genetics , Retinoblastoma Binding Proteins/genetics , Retinoblastoma Binding Proteins/metabolism , TOR Serine-Threonine Kinases/genetics , TOR Serine-Threonine Kinases/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism , Ubiquitin-Protein Ligases/genetics , Ubiquitin-Protein Ligases/metabolism
3.
Nature ; 578(7793): 122-128, 2020 02.
Article in English | MEDLINE | ID: mdl-32025013

ABSTRACT

Cancer develops through a process of somatic evolution [1,2]. Sequencing data from a single biopsy represent a snapshot of this process that can reveal the timing of specific genomic aberrations and the changing influence of mutational processes [3]. Here, by whole-genome sequencing analysis of 2,658 cancers as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [4], we reconstruct the life history and evolution of mutational processes and driver mutation sequences of 38 types of cancer. Early oncogenesis is characterized by mutations in a constrained set of driver genes, and specific copy number gains, such as trisomy 7 in glioblastoma and isochromosome 17q in medulloblastoma. The mutational spectrum changes significantly throughout tumour evolution in 40% of samples. A nearly fourfold diversification of driver genes and increased genomic instability are features of later stages. Copy number alterations often occur in mitotic crises, and lead to simultaneous gains of chromosomal segments. Timing analyses suggest that driver mutations often precede diagnosis by many years, if not decades. Together, these results determine the evolutionary trajectories of cancer, and highlight opportunities for early cancer detection.


Subject(s)
Evolution, Molecular , Genome, Human/genetics , Neoplasms/genetics , DNA Repair/genetics , Gene Dosage , Genes, Tumor Suppressor , Genetic Variation , Humans , Mutagenesis, Insertional/genetics
4.
Nat Methods ; 18(2): 144-155, 2021 02.
Article in English | MEDLINE | ID: mdl-33398189

ABSTRACT

Subclonal reconstruction from bulk tumor DNA sequencing has become a pillar of cancer evolution studies, providing insight into the clonality and relative ordering of mutations and mutational processes. We provide an outline of the complex computational approaches used for subclonal reconstruction from single and multiple tumor samples. We identify the underlying assumptions and uncertainties in each step and suggest best practices for analysis and quality assessment. This guide provides a pragmatic resource for the growing user community of subclonal reconstruction methods.


Subject(s)
DNA, Neoplasm/genetics , Neoplasms/genetics , Sequence Analysis, DNA/methods , Algorithms , Humans , Polymorphism, Single Nucleotide
6.
Cell ; 133(7): 1266-76, 2008 Jun 27.
Article in English | MEDLINE | ID: mdl-18585359

ABSTRACT

Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. We determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. We developed a computational system that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and we infer full 8-mer binding profiles for the majority of known animal homeodomains. Our results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success.


Subject(s)
DNA/chemistry , Homeodomain Proteins/chemistry , Animals , Base Sequence , Computational Biology , Conserved Sequence , DNA/metabolism , Evolution, Molecular , Homeodomain Proteins/metabolism , Mice , Models, Molecular , Protein Binding , Transcription Factors/chemistry , Transcription Factors/metabolism
7.
Nucleic Acids Res ; 47(6): 2856-2870, 2019 04 08.
Article in English | MEDLINE | ID: mdl-30698747

ABSTRACT

Stress hormones bind and activate the glucocorticoid receptor (GR) in many tissues including the brain. We identified arginine and glutamate rich 1 (ARGLU1) in a screen for new modulators of glucocorticoid signaling in the CNS. Biochemical studies show that the glutamate rich C-terminus of ARGLU1 coactivates multiple nuclear receptors including the glucocorticoid receptor (GR) and the arginine rich N-terminus interacts with splicing factors and binds to RNA. RNA-seq of neural cells depleted of ARGLU1 revealed significant changes in the expression and alternative splicing of distinct genes involved in neurogenesis. Loss of ARGLU1 is embryonic lethal in mice, and knockdown in zebrafish causes neurodevelopmental and heart defects. Treatment with dexamethasone, a GR activator, also induces changes in the pattern of alternatively spliced genes, many of which were lost when ARGLU1 was absent. Importantly, the genes found to be alternatively spliced in response to glucocorticoid treatment were distinct from those under transcriptional control by GR, suggesting an additional mechanism of glucocorticoid action is present in neural cells. Our results thus show that ARGLU1 is a novel factor for embryonic development that modulates basal transcription and alternative splicing in neural cells with consequences for glucocorticoid signaling.


Subject(s)
Embryonic Development , Glucocorticoids/pharmacology , Intracellular Signaling Peptides and Proteins/physiology , RNA Splicing/genetics , Transcriptional Activation/genetics , Alternative Splicing/drug effects , Alternative Splicing/genetics , Animals , Animals, Genetically Modified , Cells, Cultured , Embryo, Nonmammalian , Embryonic Development/drug effects , Embryonic Development/genetics , Glucocorticoids/metabolism , HEK293 Cells , Humans , Mice , Mice, Inbred C57BL , Neurogenesis/drug effects , Neurogenesis/genetics , RNA Splicing/drug effects , Signal Transduction/drug effects , Signal Transduction/genetics , Stress, Physiological/drug effects , Stress, Physiological/genetics , Trans-Activators/physiology , Transcriptional Activation/drug effects , Zebrafish
8.
Nature ; 499(7457): 172-7, 2013 Jul 11.
Article in English | MEDLINE | ID: mdl-23846655

ABSTRACT

RNA-binding proteins are key regulators of gene expression, yet only a small fraction have been functionally characterized. Here we report a systematic analysis of the RNA motifs recognized by RNA-binding proteins, encompassing 205 distinct genes from 24 diverse eukaryotes. The sequence specificities of RNA-binding proteins display deep evolutionary conservation, and the recognition preferences for a large fraction of metazoan RNA-binding proteins can thus be inferred from their RNA-binding domain sequence. The motifs that we identify in vitro correlate well with in vivo RNA-binding data. Moreover, we can associate them with distinct functional roles in diverse types of post-transcriptional regulation, enabling new insights into the functions of RNA-binding proteins both in normal physiology and in human disease. These data provide an unprecedented overview of RNA-binding proteins and their targets, and constitute an invaluable resource for determining post-transcriptional regulatory mechanisms in eukaryotes.


Subject(s)
Gene Expression Regulation/genetics , Nucleotide Motifs/genetics , RNA-Binding Proteins/metabolism , Autistic Disorder/genetics , Base Sequence , Binding Sites/genetics , Conserved Sequence/genetics , Eukaryotic Cells/metabolism , Humans , Molecular Sequence Data , Protein Structure, Tertiary/genetics , RNA Splicing Factors , RNA Stability/genetics , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics
9.
Methods ; 118-119: 3-15, 2017 04 15.
Article in English | MEDLINE | ID: mdl-27956239

ABSTRACT

RNA-binding proteins (RBPs) participate in diverse cellular processes and have important roles in human development and disease. The human genome, and that of many other eukaryotes, encodes hundreds of RBPs that contain canonical sequence-specific RNA-binding domains (RBDs) as well as numerous other unconventional RNA binding proteins (ucRBPs). ucRBPs physically associate with RNA but lack common RBDs. The degree to which these proteins bind RNA, in a sequence specific manner, is unknown. Here, we provide a detailed description of both the laboratory and data processing methods for RNAcompete, a method we have previously used to analyze the RNA binding preferences of hundreds of RBD-containing RBPs, from diverse eukaryotes. We also determine the RNA-binding preferences for two human ucRBPs, NUDT21 and CNBP, and use this analysis to exemplify the RNAcompete pipeline. The results of our RNAcompete experiments are consistent with independent RNA-binding data for these proteins and demonstrate the utility of RNAcompete for analyzing the growing repertoire of ucRBPs.


Subject(s)
Cleavage And Polyadenylation Specificity Factor/genetics , Microarray Analysis/methods , RNA-Binding Proteins/genetics , RNA/chemistry , Animals , Base Sequence , Binding Sites , Cleavage And Polyadenylation Specificity Factor/metabolism , Cloning, Molecular , DNA Primers/chemistry , DNA Primers/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression , Humans , Protein Binding , Protein Domains , RNA/genetics , RNA/metabolism , RNA-Binding Proteins/metabolism , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Sequence Alignment
10.
Methods ; 126: 18-28, 2017 08 15.
Article in English | MEDLINE | ID: mdl-28651966

ABSTRACT

RNA-binding proteins recognize RNA sequences and structures, but there is currently no systematic and accurate method to derive large (>12-base) motifs de novo that reflect a combination of intrinsic preference for both sequence and structure. To address this absence, we introduce RNAcompete-S, which couples a single-step competitive binding reaction with an excess of random RNA 40-mers to a custom computational pipeline for interrogation of the bound RNA sequences and derivation of SSMs (Sequence and Structure Models). RNAcompete-S confirms that HuR, QKI, and SRSF1 prefer binding sites that are single stranded, and recapitulates known 8-10 bp sequence and structure preferences for Vts1p and RBMY. We also derive an 18-base long SSM for Drosophila SLBP, which to our knowledge has not been previously determined by selections from pure random sequence, and accurately discriminates human replication-dependent histone mRNAs. Thus, RNAcompete-S enables accurate identification of large, intrinsic sequence-structure specificities with a uniform assay.


Subject(s)
Base Sequence/genetics , High-Throughput Nucleotide Sequencing/methods , RNA-Binding Proteins/genetics , Humans , RNA-Binding Proteins/chemistry , Sequence Analysis, RNA/methods
11.
Genome Res ; 24(1): 154-66, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24170600

ABSTRACT

Identifying genes in the genomic context is central to a cell's ability to interpret the genome. Yet, in general, the signals used to define eukaryotic genes are poorly described. Here, we derived simple classifiers that identify where transcription will initiate and terminate using nucleic acid sequence features detectable by the yeast cell, which we integrate into a Unified Model (UM) that models transcription as a whole. The cis-elements that denote where transcription initiates function primarily through nucleosome depletion, and, using a synthetic promoter system, we show that most of these elements are sufficient to initiate transcription in vivo. Hrp1 binding sites are the major characteristic of terminators; these binding sites are often clustered in terminator regions and can terminate transcription bidirectionally. The UM predicts global transcript structure by modeling transcription of the genome using a hidden Markov model whose emissions are the outputs of the initiation and termination classifiers. We validated the novel predictions of the UM with available RNA-seq data and tested it further by directly comparing the transcript structure predicted by the model to the transcription generated by the cell for synthetic DNA segments of random design. We show that the UM identifies transcription start sites more accurately than the initiation classifier alone, indicating that the relative arrangement of promoter and terminator elements influences their function. Our model presents a concrete description of how the cell defines transcript units, explains the existence of nongenic transcripts, and provides insight into genome evolution.


Subject(s)
DNA, Fungal/genetics , Models, Genetic , Saccharomyces cerevisiae/genetics , Transcription Initiation Site , Transcription, Genetic , Binding Sites , Computer Simulation , Genes, Fungal , Genome, Fungal , Nucleosomes/genetics , Promoter Regions, Genetic , Reproducibility of Results , Saccharomyces cerevisiae/metabolism
12.
BMC Bioinformatics ; 16: 156, 2015 May 14.
Article in English | MEDLINE | ID: mdl-25972088

ABSTRACT

BACKGROUND: Tumour samples containing distinct sub-populations of cancer and normal cells present challenges in the development of reproducible biomarkers, as these biomarkers are based on bulk signals from mixed tumour profiles. ISOpure is the only mRNA computational purification method to date that does not require a paired tumour-normal sample, provides a personalized cancer profile for each patient, and has been tested on clinical data. Replacing mixed tumour profiles with ISOpure-preprocessed cancer profiles led to better prognostic gene signatures for lung and prostate cancer. RESULTS: To simplify the integration of ISOpure into standard R-based bioinformatics analysis pipelines, the algorithm has been implemented as an R package. The ISOpureR package performs analogously to the original code in estimating the fraction of cancer cells and the patient cancer mRNA abundance profile from tumour samples in four cancer datasets. CONCLUSIONS: The ISOpureR package estimates the fraction of cancer cells and personalized patient cancer mRNA abundance profile from a mixed tumour profile. This open-source R implementation enables integration into existing computational pipelines, as well as easy testing, modification and extension of the model.


Subject(s)
Algorithms , Computational Biology/methods , Gene Expression Profiling , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , Software , Humans , Male , Models, Theoretical , Prognosis
13.
Nucleic Acids Res ; 41(20): 9438-60, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23945942

ABSTRACT

Despite studies that have investigated the interactions of double-stranded RNA-binding proteins like Staufen with RNA in vitro, how they achieve target specificity in vivo remains uncertain. We performed RNA co-immunoprecipitations followed by microarray analysis to identify Staufen-associated mRNAs in early Drosophila embryos. Analysis of the localization and functions of these transcripts revealed a number of potentially novel roles for Staufen. Using computational methods, we identified two sequence features that distinguish Staufen's target transcripts from non-targets. First, these Drosophila transcripts, as well as those human transcripts bound by human Staufen1 and 2, have 3' untranslated regions (UTRs) that are 3-4-fold longer than unbound transcripts. Second, the 3'UTRs of Staufen-bound transcripts are highly enriched for three types of secondary structures. These structures map with high precision to previously identified Staufen-binding regions in Drosophila bicoid and human ARF1 3'UTRs. Our results provide the first systematic genome-wide analysis showing how a double-stranded RNA-binding protein achieves target specificity.


Subject(s)
3' Untranslated Regions , Cytoskeletal Proteins/metabolism , Drosophila Proteins/metabolism , RNA-Binding Proteins/metabolism , Animals , Drosophila/embryology , Drosophila/genetics , Genome, Insect , Humans , Nucleic Acid Conformation , RNA, Double-Stranded/chemistry , RNA, Double-Stranded/metabolism , RNA, Messenger/analysis , RNA, Messenger/metabolism
14.
Nat Genet ; 37(9): 991-6, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16127451

ABSTRACT

Recent mammalian microarray experiments detected widespread transcription and indicated that there may be many undiscovered multiple-exon protein-coding genes. To explore this possibility, we hybridized labeled cDNA from unamplified, polyadenylation-selected RNA samples from 37 mouse tissues to microarrays encompassing 1.14 million exon probes. We analyzed these data using GenRate, a Bayesian algorithm that uses a genome-wide scoring function in a factor graph to infer genes. At a stringent exon false detection rate of 2.7%, GenRate detected 12,145 gene-length transcripts and confirmed 81% of the 10,000 most highly expressed known genes. Notably, our analysis showed that most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified. GenRate also detected tens of thousands of potential new exons and reconciled discrepancies in current cDNA databases by 'stitching' new transcribed regions into previously annotated genes.


Subject(s)
Computational Biology , DNA, Complementary/chemistry , Databases as Topic , Exons/genetics , Genome , Transcription, Genetic , Algorithms , Animals , Gene Expression Profiling , Humans , Mice , Microarray Analysis , RNA, Messenger/chemistry , RNA, Messenger/metabolism
15.
Nat Biotechnol ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862616

ABSTRACT

Subclonal reconstruction algorithms use bulk DNA sequencing data to quantify parameters of tumor evolution, allowing an assessment of how cancers initiate, progress and respond to selective pressures. We launched the ICGC-TCGA (International Cancer Genome Consortium-The Cancer Genome Atlas) DREAM Somatic Mutation Calling Tumor Heterogeneity and Evolution Challenge to benchmark existing subclonal reconstruction algorithms. This 7-year community effort used cloud computing to benchmark 31 subclonal reconstruction algorithms on 51 simulated tumors. Algorithms were scored on seven independent tasks, leading to 12,061 total runs. Algorithm choice influenced performance substantially more than tumor features but purity-adjusted read depth, copy-number state and read mappability were associated with the performance of most algorithms on most tasks. No single algorithm was a top performer for all seven tasks and existing ensemble strategies were unable to outperform the best individual methods, highlighting a key research need. All containerized methods, evaluation code and datasets are available to support further assessment of the determinants of subclonal reconstruction accuracy and development of improved methods to understand tumor evolution.

16.
Cell Stem Cell ; 30(12): 1658-1673.e10, 2023 12 07.
Article in English | MEDLINE | ID: mdl-38065069

ABSTRACT

Stem cells regulate their self-renewal and differentiation fate outcomes through both symmetric and asymmetric divisions. m6A RNA methylation controls symmetric commitment and inflammation of hematopoietic stem cells (HSCs) through unknown mechanisms. Here, we demonstrate that the nuclear speckle protein SON is an essential m6A target required for murine HSC self-renewal, symmetric commitment, and inflammation control. Global profiling of m6A identified that m6A mRNA methylation of Son increases during HSC commitment. Upon m6A depletion, Son mRNA increases, but its protein is depleted. Reintroduction of SON rescues defects in HSC symmetric commitment divisions and engraftment. Conversely, Son deletion results in a loss of HSC fitness, while overexpression of SON improves mouse and human HSC engraftment potential by increasing quiescence. Mechanistically, we found that SON rescues MYC and suppresses the METTL3-HSC inflammatory gene expression program, including CCL5, through transcriptional regulation. Thus, our findings define a m6A-SON-CCL5 axis that controls inflammation and HSC fate.


Subject(s)
DNA-Binding Proteins , Hematopoietic Stem Cells , Inflammation , RNA Methylation , Animals , Humans , Mice , Cell Differentiation/genetics , Hematopoietic Stem Cells/metabolism , Methylation , Methyltransferases/genetics , Methyltransferases/metabolism , RNA, Messenger/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , RNA Methylation/genetics
17.
Bioinformatics ; 27(22): 3166-72, 2011 Nov 15.
Article in English | MEDLINE | ID: mdl-21965819

ABSTRACT

MOTIVATION: Lung cancer is often discovered long after its onset, making identifying genes important in its initiation and progression a challenge. By the time the tumors are discovered, we only observe the final sum of changes of the few genes that initiated cancer and thousands of genes that they have influenced. Gene interactions and heterogeneity of samples make it difficult to identify genes consistent between different cohorts. Using gene and gene-product interaction networks, we propose a principled approach to identify a small subset of genes whose network neighbors exhibit consistently high expression change (in cancerous tissue versus normal) regardless of their own expression. We hypothesize that these genes can shed light on the larger scale perturbations in the overall landscape of expression levels. RESULTS: We benchmark our method on simulated data, and show that we can recover a true gene list in noisy measurement data. We then apply our method to four non-small cell lung cancer and two pancreatic cancer cohorts, finding several genes that are consistent within all cohorts of the same cancer type. CONCLUSION: Our model is flexible, robust and identifies gene sets that are more consistent across cohorts than several other approaches. Additionally, our method can be applied on a per-patient basis not requiring large cohorts of patients to find genes of influence. Our approach is generally applicable to gene expression studies where the goal is to identify a small set of influential genes that may in turn explain the much larger set of genome-wide expression changes.


Subject(s)
Gene Regulatory Networks , Genes, Neoplasm , Lung Neoplasms/genetics , Protein Interaction Maps , Carcinoma, Non-Small-Cell Lung/genetics , Gene Expression Profiling , Humans , Lung Neoplasms/metabolism
18.
Nat Methods ; 4(12): 1045-9, 2007 Dec.
Article in English | MEDLINE | ID: mdl-18026111

ABSTRACT

We demonstrate that paired expression profiles of microRNAs (miRNAs) and mRNAs can be used to identify functional miRNA-target relationships with high precision. We used a Bayesian data analysis algorithm, GenMiR++, to identify a network of 1,597 high-confidence target predictions for 104 human miRNAs, which was supported by RNA expression data across 88 tissues and cell types, sequence complementarity and comparative genomics data. We experimentally verified our predictions by investigating the result of let-7b downregulation in retinoblastoma using quantitative reverse transcriptase (RT)-PCR and microarray profiling: some of our verified let-7b targets include CDC25A and BCL7A. Compared to sequence-based predictions, our high-scoring GenMiR++ predictions had much more consistent Gene Ontology annotations and were more accurate predictors of which mRNA levels respond to changes in let-7b levels.


Subject(s)
Gene Expression Profiling/methods , Gene Targeting/methods , MicroRNAs/genetics , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, RNA/methods , Base Sequence , Humans , Molecular Sequence Data
19.
Bioinformatics ; 25(8): 1012-8, 2009 Apr 15.
Article in English | MEDLINE | ID: mdl-19088121

ABSTRACT

MOTIVATION: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. RESULTS: We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged among the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.


Subject(s)
Computational Biology/methods , DNA/chemistry , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Binding Sites , DNA/metabolism , Transcription Factors/chemistry
20.
Nat Commun ; 11(1): 6247, 2020 12 07.
Article in English | MEDLINE | ID: mdl-33288765

ABSTRACT

Whole-genome sequencing can be used to estimate subclonal populations in tumours and this intra-tumoural heterogeneity is linked to clinical outcomes. Many algorithms have been developed for subclonal reconstruction, but their variabilities and consistencies are largely unknown. We evaluate sixteen pipelines for reconstructing the evolutionary histories of 293 localized prostate cancers from single samples, and eighteen pipelines for the reconstruction of 10 tumours with multi-region sampling. We show that predictions of subclonal architecture and timing of somatic mutations vary extensively across pipelines. Pipelines show consistent types of biases, with those incorporating SomaticSniper and Battenberg preferentially predicting homogeneous cancer cell populations and those using MuTect tending to predict multiple populations of cancer cells. Subclonal reconstructions using multi-region sampling confirm that single-sample reconstructions systematically underestimate intra-tumoural heterogeneity, predicting on average fewer than half of the cancer cell populations identified by multi-region sequencing. Overall, these biases suggest caution in interpreting specific architectures and subclonal variants.


Subject(s)
Algorithms , Genetic Heterogeneity , Mutation , Prostatic Neoplasms/genetics , Whole Genome Sequencing/methods , Biomarkers, Tumor/genetics , Clonal Evolution , Clone Cells/metabolism , Computational Biology/methods , DNA Copy Number Variations , Humans , Male , Models, Genetic , Polymorphism, Single Nucleotide , Prostatic Neoplasms/classification , Prostatic Neoplasms/pathology