Search | Cancer Prevention and Control

Inferring copy number and genotype in tumour exome data.

Amarasinghe, Kaushalya C; Li, Jason; Hunter, Sally M; Ryland, Georgina L; Cowin, Prue A; Campbell, Ian G; Halgamuge, Saman K.

BMC Genomics ; 15: 732, 2014 Aug 28.

Article in English | MEDLINE | ID: mdl-25167919

ABSTRACT

BACKGROUND: Using whole exome sequencing to predict aberrations in tumours is a cost-effective alternative to whole genome sequencing; however, it is predominantly used for variant detection and infrequently used for detection of somatic copy number variation. RESULTS: We propose a new method to infer copy number and genotypes using whole exome data from paired tumour/normal samples. Our algorithm uses two Hidden Markov Models to predict copy number and genotypes and computationally resolves polyploidy/aneuploidy, normal cell contamination and signal baseline shift. Our method explicitly detects chromosome arm-level events, which are commonly found in tumour samples. The methods are combined into a package named ADTEx (Aberration Detection in Tumour Exome). We applied our algorithm to a cohort of 17 in-house generated and 18 TCGA paired ovarian cancer/normal exomes and evaluated the performance by comparing against the copy number variations and genotypes predicted using Affymetrix SNP 6.0 data of the same samples. Further, we carried out a comparison study to show that ADTEx outperformed its competitors in terms of precision and F-measure. CONCLUSIONS: Our proposed method, ADTEx, uses both depth of coverage ratios and B allele frequencies calculated from whole exome sequencing data to predict copy number variations along with their genotypes. ADTEx is implemented as a user-friendly software package using Python and the R statistical language. Source code and sample data are freely available under the GNU license (GPLv3) at http://adtex.sourceforge.net/.
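The two input signals named in the abstract above, depth of coverage ratios and B allele frequencies, can be illustrated with a minimal sketch. This is not the ADTEx implementation: the function names, the library-size normalisation and any input values are assumptions made purely for illustration.

```python
import math


def doc_ratios(tumour_depth, normal_depth):
    """Tumour/normal depth-of-coverage ratio for each targeted region.

    Assumption: each sample is normalised by its total depth so that
    differences in sequencing yield cancel out. ADTEx may normalise
    differently.
    """
    t_total = sum(tumour_depth)
    n_total = sum(normal_depth)
    if t_total == 0 or n_total == 0:
        raise ValueError("both samples need non-zero total depth")
    ratios = []
    for t, n in zip(tumour_depth, normal_depth):
        if n == 0:
            ratios.append(math.nan)  # region not covered in the normal sample
        else:
            ratios.append((t / t_total) / (n / n_total))
    return ratios


def b_allele_frequency(ref_count, alt_count):
    """Fraction of reads at a SNP position that support the B (alternate) allele."""
    total = ref_count + alt_count
    return alt_count / total if total else math.nan
```

A ratio near 1 suggests no change relative to the normal sample, while a B allele frequency far from 0.5 at a germline heterozygous site hints at allelic imbalance; how the two signals are combined is the job of the Hidden Markov Models described in the abstract.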

Subjects

DNA Copy Number Variations; Exome; Genotype; Neoplasms/genetics; Algorithms; Chromosome Aberrations; Computational Biology/methods; Female; Genomics/methods; Genotyping Techniques; High-Throughput Nucleotide Sequencing; Humans; Loss of Heterozygosity; Ovarian Neoplasms/genetics; Polymorphism, Single Nucleotide; Polyploidy; Reproducibility of Results; Sensitivity and Specificity

Bioinformatics pipelines for targeted resequencing and whole-exome sequencing of human and mouse genomes: a virtual appliance approach for instant deployment.

Li, Jason; Doyle, Maria A; Saeed, Isaam; Wong, Stephen Q; Mar, Victoria; Goode, David L; Caramia, Franco; Doig, Ken; Ryland, Georgina L; Thompson, Ella R; Hunter, Sally M; Halgamuge, Saman K; Ellul, Jason; Dobrovic, Alexander; Campbell, Ian G; Papenfuss, Anthony T; McArthur, Grant A; Tothill, Richard W.

PLoS One ; 9(4): e95217, 2014.

Article in English | MEDLINE | ID: mdl-24752294

ABSTRACT

Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development in open source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines still remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses that are supported in TREVA include: somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated genes analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed on the cloud (cloud computing), enabling instant access without investment overheads for additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/.

Subjects

Computational Biology/methods; Exome/genetics; Genome, Human/genetics; Sequence Analysis, DNA; User-Computer Interface; Animals; Humans; Melanoma/genetics; Mice; Mutation/genetics

CoNVEX: copy number variation estimation in exome sequencing data using HMM.

Amarasinghe, Kaushalya C; Li, Jason; Halgamuge, Saman K.

BMC Bioinformatics ; 14 Suppl 2: S2, 2013.

Article in English | MEDLINE | ID: mdl-23368785

ABSTRACT

BACKGROUND: Copy number variation (CNV) is one of the main types of genetic variation in cancer. Whole exome sequencing (WES) is a popular alternative to whole genome sequencing (WGS) for studying disease-specific genomic variations. However, finding CNV in cancer samples using WES data has not been fully explored. RESULTS: We present a new method, called CoNVEX, to estimate copy number variation in whole exome sequencing data. It uses the ratio of tumour and matched normal average read depths at each exonic region to predict copy gain or loss. The useful signal in WES data is obscured by the intrinsic noise of the data itself, which limits its reliability as a source for CNV detection. Here, we propose a method that uses the discrete wavelet transform (DWT) to reduce noise. The identification of copy number gains/losses of each targeted region is performed by a Hidden Markov Model (HMM). CONCLUSION: HMMs are frequently used to identify CNV in data produced by various technologies, including Array Comparative Genomic Hybridization (aCGH) and WGS. Here, we propose an HMM to detect CNV in cancer exome data. We used modified data from the 1000 Genomes Project to evaluate the performance of the proposed method. Using these data, we have shown that CoNVEX significantly outperforms the existing methods in terms of precision. Overall, CoNVEX achieved a sensitivity of more than 92% and a precision of more than 50%.
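The two stages described in the abstract above, wavelet denoising followed by HMM segmentation, can be sketched roughly as follows. This is not the CoNVEX code: it assumes a single-level Haar transform with soft thresholding, a three-state HMM with Gaussian emissions over log2 depth ratios, and hypothetical parameter values (`MEANS`, `SIGMA`, `STAY`) chosen only for illustration.

```python
import math

STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -1.0, "neutral": 0.0, "gain": 0.58}  # assumed log2-ratio means
SIGMA = 0.3  # assumed emission standard deviation
STAY = 0.9   # assumed probability of remaining in the same state


def haar_denoise(signal, threshold):
    """One-level Haar DWT, soft-threshold the detail coefficients, then invert."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(x + y) / s for x, y in pairs]
    detail = [(x - y) / s for x, y in pairs]
    # Soft thresholding shrinks small, noise-like details towards zero.
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def log_gauss(x, mean, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def viterbi(log_ratios):
    """Most likely state sequence for a series of log2 depth ratios."""
    if not log_ratios:
        return []
    move = (1.0 - STAY) / (len(STATES) - 1)
    log_trans = {a: {b: math.log(STAY if a == b else move) for b in STATES}
                 for a in STATES}
    score = {s: -math.log(len(STATES)) + log_gauss(log_ratios[0], MEANS[s], SIGMA)
             for s in STATES}
    back = []
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans[p][s])
            pointers[s] = prev
            new_score[s] = (score[prev] + log_trans[prev][s]
                            + log_gauss(x, MEANS[s], SIGMA))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

In this sketch the denoised log2 ratios would be passed to `viterbi`, and consecutive regions sharing a state would be merged into a single gain or loss call. A practical implementation would typically use a multi-level transform and estimate the HMM parameters from the data rather than fixing them.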

Subjects

DNA Copy Number Variations; Exome; Neoplasms/genetics; Comparative Genomic Hybridization; Exons; Genomics/methods; Humans; Markov Chains; Models, Statistical

Genome classification by gene distribution: an overlapping subspace clustering approach.

Li, Jason; Halgamuge, Saman K; Tang, Sen-Lin.

BMC Evol Biol ; 8: 116, 2008 Apr 23.

Article in English | MEDLINE | ID: mdl-18430250

ABSTRACT

BACKGROUND: Genomes of lower organisms show a large amount of horizontal gene transfer, which complicates their evolutionary study. Bacteriophage genomes are a typical example. One recent approach that addresses this problem is the unsupervised clustering of genomes based on gene order and genome position, which helps to reveal species relationships that may not be apparent from traditional phylogenetic methods. RESULTS: We propose the use of an overlapping subspace clustering algorithm for such genome classification problems. The advantage of subspace clustering over traditional clustering is that it can associate clusters with gene arrangement patterns, preserving genomic information in the clusters produced. Additionally, overlapping capability is desirable for the discovery of multiple conserved patterns within a single genome, such as those acquired from different species via horizontal gene transfers. The proposed method involves a novel strategy to vectorize genomes based on their gene distribution. A number of existing subspace clustering and biclustering algorithms were evaluated to identify the best framework upon which to develop our algorithm; we extended a generic subspace clustering algorithm called HARP to incorporate overlapping capability. The proposed algorithm was assessed and applied on bacteriophage genomes. The phage grouping results are consistent overall with the Phage Proteomic Tree and showed common genomic characteristics among the TP901-like, Sfi21-like and sk1-like phage groups. Among 441 phage genomes, we identified four significantly conserved distribution patterns structured by the terminase, portal, integrase, holin and lysin genes. We also observed a subgroup of Sfi21-like phages with a distinctive divergent genome organization and identified nine new phage members of the Sfi21-like genus: Staphylococcus 71, phiPVL108, Listeria A118, 2389, Lactobacillus phi AT3, A2, Clostridium phi3626, Geobacillus GBSV1, and Listeria monocytogenes PSA. CONCLUSION: The method described in this paper can assist evolutionary study by objectively classifying genomes based on their resemblance in gene order, gene content and gene positions. The method is suitable for application to genomes with high genetic exchange and various conserved gene arrangements, as demonstrated through our application on phages.

Subjects

Cluster Analysis; Genome; Models, Genetic; Proteomics/methods; Algorithms; Bacteriophages/genetics; Gene Order; Genome, Viral; Pattern Recognition, Automated; Peptide Library

An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data.

Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K.

Bioinformatics ; 19(16): 2131-40, 2003 Nov 01.

Article in English | MEDLINE | ID: mdl-14594719

ABSTRACT

MOTIVATION: Current Self-Organizing Map (SOM) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates the merits of hierarchical clustering with the robustness against noise known from self-organizing approaches. RESULTS: The proposed algorithm, applied to DNA microarray data sets of two types of cancer, has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm, tested on leukemia microarray data containing three leukemia types, was able to determine three major clusters and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, so it is labelled as an uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data automatically derived two clusters, which is consistent with the number of classes in the data (cancerous and normal). AVAILABILITY: Java software of the dynamic SOM tree algorithm is available upon request for academic use. SUPPLEMENTARY INFORMATION: A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf

Subjects

Algorithms; Biomarkers, Tumor/genetics; Colonic Neoplasms/classification; Colonic Neoplasms/genetics; Gene Expression Profiling/methods; Leukemia/classification; Leukemia/genetics; Oligonucleotide Array Sequence Analysis/methods; Cluster Analysis; Gene Expression Regulation, Neoplastic/genetics; Genetic Markers/genetics; Genetic Testing/methods; Humans; Reproducibility of Results; Sensitivity and Specificity
