Results 1 - 20 of 42
1.
Immunity ; 48(4): 812-830.e14, 2018 04 17.
Article in English | MEDLINE | ID: mdl-29628290

ABSTRACT

We performed an extensive immunogenomic analysis of more than 10,000 tumors comprising 33 diverse cancer types by utilizing data compiled by TCGA. Across cancer types, we identified six immune subtypes (wound healing, IFN-γ dominant, inflammatory, lymphocyte depleted, immunologically quiet, and TGF-β dominant) characterized by differences in macrophage or lymphocyte signatures, Th1:Th2 cell ratio, extent of intratumoral heterogeneity, aneuploidy, extent of neoantigen load, overall cell proliferation, expression of immunomodulatory genes, and prognosis. Specific driver mutations correlated with lower (CTNNB1, NRAS, or IDH1) or higher (BRAF, TP53, or CASP8) leukocyte levels across all cancers. Multiple control modalities of the intracellular and extracellular networks (transcription, microRNAs, copy number, and epigenetic processes) were involved in tumor-immune cell interactions, both across and within immune subtypes. Our immunogenomics pipeline to characterize these heterogeneous tumors and the resulting data are intended to serve as a resource for future targeted studies to further advance the field.


Subject(s)
Genomics/methods , Neoplasms , Adolescent , Adult , Aged , Aged, 80 and over , Child , Female , Humans , Interferon-gamma/genetics , Interferon-gamma/immunology , Macrophages/immunology , Male , Middle Aged , Neoplasms/classification , Neoplasms/genetics , Neoplasms/immunology , Prognosis , Th1-Th2 Balance/physiology , Transforming Growth Factor beta/genetics , Transforming Growth Factor beta/immunology , Wound Healing/genetics , Wound Healing/immunology , Young Adult
2.
Article in English | MEDLINE | ID: mdl-38466528

ABSTRACT

We identified a progenitor cell population highly enriched in samples from invasive and chemo-resistant carcinomas, characterized by a well-defined multigene signature including APOD, DCN, and LUM. This cell population has previously been labeled as consisting of inflammatory cancer-associated fibroblasts (iCAFs). The same signature characterizes naturally occurring fibro-adipogenic progenitors (FAPs) as well as stromal cells abundant in normal adipose tissue. Our analysis of human gene expression databases provides evidence that adipose stromal cells (ASCs) are recruited by tumors and undergo differentiation into CAFs during cancer progression to invasive and chemotherapy-resistant stages.

3.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38662553

ABSTRACT

SUMMARY: Existing clustering methods for characterizing cell populations from single-cell RNA sequencing are constrained by several limitations stemming from the fact that clusters often cannot be homogeneous, particularly for transitioning populations. On the other hand, dominant cell populations within samples can be identified independently by their strong gene co-expression signatures using methods unrelated to partitioning. Here, we introduce a clustering method, CASCC (co-expression-assisted single-cell clustering), designed to improve biological accuracy using gene co-expression features identified using an unsupervised adaptive attractor algorithm. CASCC outperformed other methods as evidenced by multiple evaluation metrics, and our results suggest that CASCC can improve the analysis of single-cell transcriptomics, enabling potential new discoveries related to underlying biological mechanisms. AVAILABILITY AND IMPLEMENTATION: The CASCC R package is publicly available at https://github.com/LingyiC/CASCC and https://zenodo.org/doi/10.5281/zenodo.10648327.


Subject(s)
Algorithms , RNA-Seq , Single-Cell Analysis , Software , Single-Cell Analysis/methods , Cluster Analysis , RNA-Seq/methods , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Gene Expression Analysis
5.
PLoS Comput Biol ; 17(7): e1009228, 2021 07.
Article in English | MEDLINE | ID: mdl-34283835

ABSTRACT

During the last ten years, many research results have been referring to a particular type of cancer-associated fibroblasts associated with poor prognosis, invasiveness, metastasis and resistance to therapy in multiple cancer types, characterized by a gene expression signature with prominent presence of genes COL11A1, THBS2 and INHBA. Identifying the underlying biological mechanisms responsible for their creation may facilitate the discovery of targets for potential pan-cancer therapeutics. Using a novel computational approach for single-cell gene expression data analysis identifying the dominant cell populations in a sequence of samples from patients at various stages, we conclude that these fibroblasts are produced by a pan-cancer cellular transition originating from a particular type of adipose-derived stromal cells naturally present in the stromal vascular fraction of normal adipose tissue, having a characteristic gene expression signature. Focusing on a rich pancreatic cancer dataset, we provide a detailed description of the continuous modification of the gene expression profiles of cells as they transition from APOD-expressing adipose-derived stromal cells to COL11A1-expressing cancer-associated fibroblasts, identifying the key genes that participate in this transition. These results also provide an explanation to the well-known fact that the adipose microenvironment contributes to cancer progression.


Subject(s)
Biomarkers, Tumor/genetics , Cancer-Associated Fibroblasts/metabolism , Collagen Type XI/genetics , Neoplasm Invasiveness/genetics , Adipose Tissue/metabolism , Adipose Tissue/pathology , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Cancer-Associated Fibroblasts/pathology , Carcinoma, Pancreatic Ductal/genetics , Carcinoma, Pancreatic Ductal/pathology , Computational Biology , Databases, Factual , Databases, Genetic , Disease Progression , Female , Gene Expression Regulation, Neoplastic , Head and Neck Neoplasms/genetics , Head and Neck Neoplasms/pathology , Humans , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Mesenchymal Stem Cells/metabolism , Mesenchymal Stem Cells/pathology , Neoplasm Invasiveness/pathology , Neoplasm Invasiveness/prevention & control , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/pathology , Single-Cell Analysis , Stromal Cells/metabolism , Stromal Cells/pathology , Transcriptome , Tumor Microenvironment/genetics
6.
Bioinformatics ; 36(11): 3588-3589, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32108864

ABSTRACT

SUMMARY: We developed 2DImpute, an imputation method for correcting false zeros (known as dropouts) in single-cell RNA-sequencing (scRNA-seq) data. It features preventing excessive correction by predicting the false zeros and imputing their values by making use of the interrelationships between both genes and cells in the expression matrix. We showed that 2DImpute outperforms several leading imputation methods by applying it on datasets from various scRNA-seq protocols. AVAILABILITY AND IMPLEMENTATION: The R package of 2DImpute is freely available at GitHub (https://github.com/zky0708/2DImpute). CONTACT: d.anastassiou@columbia.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA-Seq , Software , Sequence Analysis, RNA , Single-Cell Analysis , Exome Sequencing
7.
PLoS Comput Biol ; 9(2): e1002920, 2013.
Article in English | MEDLINE | ID: mdl-23468608

ABSTRACT

Mining gene expression profiles has proven valuable for identifying signatures serving as surrogates of cancer phenotypes. However, the similarities of such signatures across different cancer types have not been strong enough to conclude that they represent a universal biological mechanism shared among multiple cancer types. Here we present a computational method for generating signatures using an iterative process that converges to one of several precise attractors defining signatures representing biomolecular events, such as cell transdifferentiation or the presence of an amplicon. By analyzing rich gene expression datasets from different cancer types, we identified several such biomolecular events, some of which are universally present in all tested cancer types in nearly identical form. Although the method is unsupervised, we show that it often leads to attractors with strong phenotypic associations. We present several such multi-cancer attractors, focusing on three that are prominent and sharply defined in all cases: a mesenchymal transition attractor strongly associated with tumor stage, a mitotic chromosomal instability attractor strongly associated with tumor grade, and a lymphocyte-specific attractor.


Subject(s)
Computational Biology/methods , Models, Biological , Neoplasms/genetics , Algorithms , Data Mining , Databases, Genetic , Epithelial-Mesenchymal Transition , Gene Expression Profiling/methods , Genome/genetics , Humans , Kaplan-Meier Estimate , Kinetochores , Mitosis/genetics , Neoplasms/metabolism , Neoplasms/pathology , Oncogenes , Phenotype , Prognosis
8.
Cancer Res ; 84(5): 648-649, 2024 03 04.
Article in English | MEDLINE | ID: mdl-38437636

ABSTRACT

Cancer aggressiveness has been linked with obesity, and studies have shown that adipose tissue can enhance cancer progression. In this issue of Cancer Research, Hosni and colleagues discover a paracrine mechanism mediated by adipocyte precursor cells through which urothelial carcinomas become resistant to erdafitinib, a recently approved therapy inhibiting fibroblast growth factor receptors (FGFR). They identified neuregulin 1 (NRG1) secreted by adipocyte precursor cells as an activator of HER3 signaling that enables resistance. The NRG1-mediated FGFR inhibitor resistance was amenable to intervention with pertuzumab, an antibody blocking the NRG1/HER3 axis. To investigate the nature of the resistance-associated NRG1-expressing cells in human patients, the authors analyzed published single-cell RNA sequencing data and observed that such cells appear in a cluster assigned as inflammatory cancer-associated fibroblasts (iCAF). Notably, the gene signature corresponding to these CAFs is highly similar to that shared by adipose stromal cells (ASC) in fat tissue and fibro-adipogenic progenitors (FAP) in skeletal muscle of cancer-free individuals. Because fibroblasts with the ASC/FAP signature are enriched in various carcinomas, it is possible that the paracrine signaling conferred by NRG1 is a pan-cancer mechanism of FGFR inhibitor resistance and tumor aggressiveness. See related article by Hosni et al., p. 725.


Subject(s)
Cancer-Associated Fibroblasts , Carcinoma, Transitional Cell , Humans , Adipocytes , Adipose Tissue , Stromal Cells
9.
Nat Biotechnol ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862616

ABSTRACT

Subclonal reconstruction algorithms use bulk DNA sequencing data to quantify parameters of tumor evolution, allowing an assessment of how cancers initiate, progress and respond to selective pressures. We launched the ICGC-TCGA (International Cancer Genome Consortium-The Cancer Genome Atlas) DREAM Somatic Mutation Calling Tumor Heterogeneity and Evolution Challenge to benchmark existing subclonal reconstruction algorithms. This 7-year community effort used cloud computing to benchmark 31 subclonal reconstruction algorithms on 51 simulated tumors. Algorithms were scored on seven independent tasks, leading to 12,061 total runs. Algorithm choice influenced performance substantially more than tumor features but purity-adjusted read depth, copy-number state and read mappability were associated with the performance of most algorithms on most tasks. No single algorithm was a top performer for all seven tasks and existing ensemble strategies were unable to outperform the best individual methods, highlighting a key research need. All containerized methods, evaluation code and datasets are available to support further assessment of the determinants of subclonal reconstruction accuracy and development of improved methods to understand tumor evolution.

10.
BMC Bioinformatics ; 14: 270, 2013 Sep 08.
Article in English | MEDLINE | ID: mdl-24010487

ABSTRACT

BACKGROUND: DNA pooling constitutes a cost effective alternative in genome wide association studies. In DNA pooling, equimolar amounts of DNA from different individuals are mixed into one sample and the frequency of each allele in each position is observed in a single genotype experiment. The identification of haplotype frequencies from pooled data in addition to single locus analysis is of separate interest within these studies as haplotypes could increase statistical power and provide additional insight. RESULTS: We developed a method for maximum-parsimony haplotype frequency estimation from pooled DNA data based on the sparse representation of the DNA pools in a dictionary of haplotypes. Extensions to scenarios where data is noisy or even missing are also presented. The resulting method is first applied to simulated data based on the haplotypes and their associated frequencies of the AGT gene. We further evaluate our methodology on datasets consisting of SNPs from the first 7Mb of the HapMap CEU population. Noise and missing data were further introduced in the datasets in order to test the extensions of the proposed method. Both HIPPO and HAPLOPOOL were also applied to these datasets to compare performances. CONCLUSIONS: We evaluate our methodology on scenarios where pooling is more efficient relative to individual genotyping; that is, in datasets that contain pools with a small number of individuals. We show that in such scenarios our methodology outperforms state-of-the-art methods such as HIPPO and HAPLOPOOL.
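The pooling model in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: it shows only the forward relationship between a dictionary of haplotypes, their frequencies in a pool, and the per-SNP allele frequencies that a pooled genotyping experiment would observe. The function names and the example pool are hypothetical.

```python
from itertools import product

def haplotype_dictionary(n_snps):
    """All 2**n_snps binary haplotypes (1 = minor allele), in lexicographic order."""
    return [tuple(bits) for bits in product((0, 1), repeat=n_snps)]

def pooled_allele_frequencies(haplotypes, frequencies):
    """Expected minor-allele frequency at each SNP for a pool: the
    frequency-weighted sum of the haplotypes present in the pool."""
    n_snps = len(haplotypes[0])
    return [
        sum(freq * hap[j] for hap, freq in zip(haplotypes, frequencies))
        for j in range(n_snps)
    ]

# Hypothetical pool of 2 individuals (4 chromosomes) over 3 SNPs:
# two copies of 000, one copy of 011, one copy of 110.
dictionary = haplotype_dictionary(3)
counts = {(0, 0, 0): 2, (0, 1, 1): 1, (1, 1, 0): 1}
frequencies = [counts.get(h, 0) / 4 for h in dictionary]
observed = pooled_allele_frequencies(dictionary, frequencies)
```

Here only 3 of the 8 dictionary entries have nonzero frequency. Estimation runs in the opposite direction: given `observed`, many frequency vectors reproduce it, and a maximum-parsimony approach prefers one that uses the fewest haplotypes.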


Subject(s)
DNA/chemistry , Gene Frequency/genetics , Genomics/methods , Haplotypes/genetics , Algorithms , DNA/genetics , Databases, Genetic , HapMap Project , Humans , Polymorphism, Single Nucleotide/genetics
11.
Ann Hum Genet ; 76(4): 312-25, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22607042

ABSTRACT

Many large genome-wide association studies include nuclear families with more than one child (trio families), allowing for analysis of differences between siblings (sib pair analysis). Statistical power can be increased when haplotypes are used instead of genotypes. Currently, haplotype inference in families with more than one child can be performed either using the familial information or statistical information derived from the population samples but not both. Building on our recently proposed tree-based deterministic framework (TDS) for trio families, we augment its applicability to general nuclear families. We impose a minimum recombinant approach locally and independently on each multiple children family, while resorting to the population-derived information to solve the remaining ambiguities. Thus our framework incorporates all available information (familial and population) in a given study. We demonstrate that using all the constraints in our approach we can have gains in the accuracy as opposed to breaking the multiple children families to separate trios and resorting to a trio inference algorithm or phasing each family in isolation. We believe that our proposed framework could be the method of choice for haplotype inference in studies that include nuclear families with multiple children. Our software (tds2.0) is downloadable from www.ee.columbia.edu/~anastas/tds.


Subject(s)
Haplotypes , Models, Genetic , Nuclear Family , Algorithms , Humans , Monte Carlo Method , Pedigree , Siblings
12.
BMC Genet ; 13: 94, 2012 Oct 30.
Article in English | MEDLINE | ID: mdl-23110720

ABSTRACT

BACKGROUND: Typically, the first phase of a genome wide association study (GWAS) includes genotyping across hundreds of individuals and validation of the most significant SNPs. Allelotyping of pooled genomic DNA is a common approach to reduce the overall cost of the study. Knowledge of haplotype structure can provide additional information to single locus analyses. Several methods have been proposed for estimating haplotype frequencies in a population from pooled DNA data. RESULTS: We introduce a technique for haplotype frequency estimation in a population from pooled DNA samples focusing on datasets containing a small number of individuals per pool (2 or 3 individuals) and a large number of markers. We compare our method with the publicly available state-of-the-art algorithms HIPPO and HAPLOPOOL on datasets of varying number of pools and marker sizes. We demonstrate that our algorithm provides improvements in terms of accuracy and computational time over competing methods for large number of markers while demonstrating comparable performance for smaller marker sizes. Our method is implemented in the "Tree-Based Deterministic Sampling Pool" (TDSPool) package which is available for download at http://www.ee.columbia.edu/~anastas/tdspool. CONCLUSIONS: Using a tree-based deterministic sampling technique we present an algorithm for haplotype frequency estimation from pooled data. Our method demonstrates superior performance in datasets with large number of markers and could be the method of choice for haplotype frequency estimation in such datasets.


Subject(s)
Algorithms , Gene Frequency , Haplotypes , DNA , Databases, Genetic , Gene Pool , Genetic Markers , Genome-Wide Association Study , Humans , Models, Genetic
13.
Hum Genet ; 129(2): 161-76, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21076979

ABSTRACT

The human leukocyte antigen (HLA) class II genes HLA-DRB1, -DQA1 and -DQB1 are the strongest genetic factors for type 1 diabetes (T1D). Additional loci in the major histocompatibility complex (MHC) are difficult to identify due to the region's high gene density and complex linkage disequilibrium (LD). To facilitate the association analysis, two novel algorithms were implemented in this study: one for phasing the multi-allelic HLA genotypes in trio families, and one for partitioning the HLA strata in conditional testing. Screening and replication were performed on two large and independent datasets: the Wellcome Trust Case-Control Consortium (WTCCC) dataset of 2,000 cases and 1,504 controls, and the T1D Genetics Consortium (T1DGC) dataset of 2,300 nuclear families. After imputation, the two datasets have 1,941 common SNPs in the MHC, of which 22 were successfully tested and replicated based on the statistical testing stratifying on the detailed DRB1 and DQB1 genotypes. Further conditional tests using the combined dataset confirmed eight novel SNP associations around 31.3 Mb on chromosome 6 (rs3094663, p = 1.66 × 10^-11 and rs2523619, p = 2.77 × 10^-10 conditional on the DR/DQ genotypes). A subsequent LD analysis established TCF19, POU5F1, CCHCR1 and PSORS1C1 as potential causal genes for the observed association.


Subject(s)
Diabetes Mellitus, Type 1/genetics , Polymorphism, Single Nucleotide , Transcription Factors/genetics , Case-Control Studies , Female , Humans , Intracellular Signaling Peptides and Proteins/genetics , Male , Octamer Transcription Factor-3/genetics , Proteins/genetics
14.
BMC Cancer ; 11: 529, 2011 Dec 30.
Article in English | MEDLINE | ID: mdl-22208948

ABSTRACT

BACKGROUND: The biological mechanisms underlying cancer cell motility and invasiveness remain unclear, although it has been hypothesized that they involve some type of epithelial-mesenchymal transition (EMT). METHODS: We used xenograft models of human cancer cells in immunocompromised mice, profiling the harvested tumors separately with species-specific probes and computationally analyzing the results. RESULTS: Here we show that human cancer cells express in vivo a precise multi-cancer invasion-associated gene expression signature that prominently includes many EMT markers, among them the transcription factor Slug, fibronectin, and α-SMA. We found that human, but not mouse, cells express the signature and Slug is the only upregulated EMT-inducing transcription factor. The signature is also present in samples from many publicly available cancer gene expression datasets, suggesting that it is produced by the cancer cells themselves in multiple cancer types, including nonepithelial cancers such as neuroblastoma. Furthermore, we found that the presence of the signature in human xenografted cells was associated with a downregulation of adipocyte markers in the mouse tissue adjacent to the invasive tumor, suggesting that the signature is triggered by contextual microenvironmental interactions when the cancer cells encounter adipocytes, as previously reported. CONCLUSIONS: The known, precise and consistent gene composition of this cancer mesenchymal transition signature, particularly when combined with simultaneous analysis of the adjacent microenvironment, provides unique opportunities for shedding light on the underlying mechanisms of cancer invasiveness as well as identifying potential diagnostic markers and targets for metastasis-inhibiting therapeutics.


Subject(s)
Epithelial-Mesenchymal Transition/genetics , Neoplasms/metabolism , Transcription Factors/metabolism , Animals , Cell Line, Tumor , Collagen Type XI/metabolism , Gene Expression Profiling , Humans , Mice , Microarray Analysis , Neoplasm Invasiveness/genetics , Neoplasms/genetics , Reverse Transcriptase Polymerase Chain Reaction/methods , Snail Family Transcription Factors , Species Specificity
15.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2271-2280, 2021.
Article in English | MEDLINE | ID: mdl-32070995

ABSTRACT

Bulk samples of the same patient are heterogeneous in nature, comprising different subpopulations (subclones) of cancer cells. Cells in a tumor subclone are characterized by a unique mutational genotype profile. Resolving tumor heterogeneity by estimating the genotypes, cellular proportions and the number of subclones present in the tumor can help in understanding cancer progression and treatment. We present a novel method, ChaClone2, to efficiently deconvolve the observed variant allele fractions (VAFs), with consideration for possible effects from copy number aberrations at the mutation loci. Our method describes a state-space formulation of the feature allocation model, deconvolving the observed VAFs from samples of the same patient into three matrices: subclonal total and variant copy numbers for mutated genes, and proportions of subclones in each sample. We describe an efficient sequential Monte Carlo (SMC) algorithm to estimate these matrices. Extensive simulation shows that ChaClone2 yields better accuracy than other state-of-the-art methods for addressing similar problems, and that it scales to large datasets. In addition, ChaClone2 allows the model parameter estimates to be refined whenever new mutation data from freshly sequenced genomic locations become available. MATLAB code and datasets are available to download at: https://github.com/moyanre/method2.


Subject(s)
Computational Biology/methods , DNA Copy Number Variations/genetics , Mutation/genetics , Neoplasms/genetics , Algorithms , Bayes Theorem , Genetic Heterogeneity , Humans , Monte Carlo Method , Stochastic Processes
16.
Bioinformatics ; 25(11): 1445-6, 2009 Jun 01.
Article in English | MEDLINE | ID: mdl-19297347

ABSTRACT

SUMMARY: We present a visualization tool applied on genome-wide association data, revealing disease-associated haplotypes, epistatically interacting loci, as well as providing visual signatures of multivariate correlations of genetic markers with respect to a phenotype. AVAILABILITY: Freely available on the web at: (http://www.ee.columbia.edu/~anastas/sdplots).


Subject(s)
Computational Biology/methods , Computer Graphics , Phenotype , Polymorphism, Single Nucleotide/genetics , Computer Graphics/standards , Genome-Wide Association Study , Haplotypes , Software , User-Computer Interface
17.
BMC Genet ; 11: 78, 2010 Aug 23.
Article in English | MEDLINE | ID: mdl-20727218

ABSTRACT

BACKGROUND: In genome-wide association studies, thousands of individuals are genotyped in hundreds of thousands of single nucleotide polymorphisms (SNPs). Statistical power can be increased when haplotypes, rather than three-valued genotypes, are used in analysis, so the problem of haplotype phase inference (phasing) is particularly relevant. Several phasing algorithms have been developed for data from unrelated individuals, based on different models, some of which have been extended to father-mother-child "trio" data. RESULTS: We introduce a technique for phasing trio datasets using a tree-based deterministic sampling scheme. We have compared our method with publicly available algorithms PHASE v2.1, BEAGLE v3.0.2 and 2SNP v1.7 on datasets of varying number of markers and trios. We have found that the computational complexity of PHASE makes it prohibitive for routine use; on the other hand 2SNP, though the fastest method for small datasets, was significantly inaccurate. We have shown that our method outperforms BEAGLE in terms of speed and accuracy for small to intermediate dataset sizes in terms of number of trios for all marker sizes examined. Our method is implemented in the "Tree-Based Deterministic Sampling" (TDS) package, available for download at http://www.ee.columbia.edu/~anastas/tds. CONCLUSIONS: Using a tree-based deterministic sampling technique, we present an intuitive and conceptually simple phasing algorithm for trio data. The trade-off between speed and accuracy achieved by our algorithm makes it a strong candidate for routine use on trio datasets.


Subject(s)
Algorithms , Genome-Wide Association Study/methods , Haplotypes , Humans , Models, Genetic , Polymorphism, Single Nucleotide
18.
Sci Rep ; 10(1): 17199, 2020 10 14.
Article in English | MEDLINE | ID: mdl-33057153

ABSTRACT

Analysis of large gene expression datasets from biopsies of cancer patients can identify co-expression signatures representing particular biomolecular events in cancer. Some of these signatures involve genomically co-localized genes resulting from the presence of copy number alterations (CNAs), for which analysis of the expression of the underlying genes provides valuable information about their combined role as oncogenes or tumor suppressor genes. Here we focus on the discovery and interpretation of such signatures that are present in multiple cancer types due to driver amplifications and deletions in particular regions of the genome after doing a comprehensive analysis combining both gene expression and CNA data from The Cancer Genome Atlas.


Subject(s)
DNA Copy Number Variations/genetics , Neoplasms/genetics , Oncogenes/genetics , Data Analysis , Gene Dosage/genetics , Gene Expression/genetics , Genomics/methods , Humans
19.
Nat Biotechnol ; 38(1): 97-107, 2020 01.
Article in English | MEDLINE | ID: mdl-31919445

ABSTRACT

Tumor DNA sequencing data can be interpreted by computational methods that analyze genomic heterogeneity to infer evolutionary dynamics. A growing number of studies have used these approaches to link cancer evolution with clinical progression and response to therapy. Although the inference of tumor phylogenies is rapidly becoming standard practice in cancer genome analyses, standards for evaluating them are lacking. To address this need, we systematically assess methods for reconstructing tumor subclonality. First, we elucidate the main algorithmic problems in subclonal reconstruction and develop quantitative metrics for evaluating them. Then we simulate realistic tumor genomes that harbor all known clonal and subclonal mutation types and processes. Finally, we benchmark 580 tumor reconstructions, varying tumor read depth, tumor type and somatic variant detection. Our analysis provides a baseline for the establishment of gold-standard methods to analyze tumor heterogeneity.


Subject(s)
Algorithms , Neoplasms/pathology , Clone Cells , Computer Simulation , DNA Copy Number Variations/genetics , Gene Dosage , Genome , Humans , Mutation/genetics , Neoplasms/genetics , Polymorphism, Single Nucleotide/genetics , Reference Standards
20.
Bioinformatics ; 24(1): 46-55, 2008 Jan 01.
Article in English | MEDLINE | ID: mdl-18024972

ABSTRACT

MOTIVATION: Conserved motifs often represent biological significance, providing insight on biological aspects such as gene transcription regulation, biomolecular secondary structure, presence of non-coding RNAs and evolution history. With the increasing number of sequenced genomic data, faster and more accurate tools are needed to automate the process of motif discovery. RESULTS: We propose a deterministic sequential Monte Carlo (DSMC) motif discovery technique based on the position weight matrix (PWM) model to locate conserved motifs in a given set of nucleotide sequences, and extend our model to search for instances of the motif with insertions/deletions. We show that the proposed method can be used to align the motif where there are insertions and deletions found in different instances of the motif, which cannot be satisfactorily done using other multiple alignment and motif discovery algorithms. AVAILABILITY: MATLAB code is available at http://www.ee.columbia.edu/~kcliang
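For readers unfamiliar with the position weight matrix (PWM) model named in this abstract, the following Python sketch shows the basic idea: estimate per-column base probabilities from aligned motif instances, then score each window of a sequence by its log-odds against a background. This is a generic illustration under assumed inputs, not the DSMC algorithm of the paper; the function names, the pseudocount, the uniform background, and the example sequences are all hypothetical.

```python
import math

BASES = "ACGT"

def build_pwm(instances, pseudocount=1.0):
    """Column-wise base probabilities from equal-length aligned motif instances."""
    width = len(instances[0])
    pwm = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for seq in instances:
            counts[seq[j]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def log_odds(pwm, window, background=0.25):
    """Log-likelihood ratio of a window under the PWM versus a uniform background."""
    return sum(math.log(col[base] / background) for col, base in zip(pwm, window))

def best_match(pwm, sequence):
    """Start index and score of the highest-scoring window in the sequence."""
    width = len(pwm)
    scores = [(log_odds(pwm, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    score, start = max(scores)
    return start, score

# Hypothetical aligned instances of a 4-base motif.
pwm = build_pwm(["TATA", "TATA", "TTTA", "TATA"])
start, score = best_match(pwm, "GGCGTATAGCC")
```

This fixed-width scan cannot place a motif instance that contains an insertion or deletion, which is the case the abstract's extended model is designed to handle.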


Subject(s)
Algorithms , DNA/genetics , Pattern Recognition, Automated/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Base Sequence , Computer Simulation , Models, Genetic , Models, Statistical , Molecular Sequence Data , Monte Carlo Method