ABSTRACT
Formalin-fixed, paraffin-embedded (FFPE) material tends to yield degraded DNA and is thus suboptimal for use in many downstream applications. We describe an integrated analysis of genotype, loss of heterozygosity (LOH), and copy number for DNA derived from FFPE tissues using oligonucleotide microarrays containing over 500K single nucleotide polymorphisms. A prequalifying PCR test predicted the performance of FFPE DNA on the microarrays better than age of FFPE sample. Although genotyping efficiency and reliability were reduced for FFPE DNA when compared with fresh samples, closer examination revealed methods to improve performance at the expense of variable reduction in resolution. Important steps were also identified that enable equivalent copy number and LOH profiles from paired FFPE and fresh frozen tumor samples. In conclusion, we have shown that the Mapping 500K arrays can be used with FFPE-derived samples to produce genotype, copy number, and LOH predictions, and we provide guidelines and suggestions for application of these samples to this integrated system.
Subject(s)
Genome, Human; Loss of Heterozygosity; Neoplasms/genetics; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Carcinoma, Endometrioid/genetics; Chromosome Mapping; Colorectal Neoplasms/genetics; Colorectal Neoplasms/pathology; DNA, Neoplasm/genetics; DNA, Neoplasm/isolation & purification; Female; Formaldehyde; Gene Dosage; Genotype; Humans; Microsatellite Repeats; Neoplasms/pathology; Oligonucleotide Array Sequence Analysis; Ovarian Neoplasms/genetics; Paraffin Embedding; Polymerase Chain Reaction; Tissue Fixation
ABSTRACT
BACKGROUND: DNA copy number aberration (CNA) is one of the key characteristics of cancer cells. Recent studies demonstrated the feasibility of utilizing high density single nucleotide polymorphism (SNP) genotyping arrays to detect CNA. Compared with the two-color array-based comparative genomic hybridization (array-CGH), the SNP arrays offer much higher probe density and lower signal-to-noise ratio at the single SNP level. To accurately identify small segments of CNA from SNP array data, segmentation methods that are sensitive to CNA while resistant to noise are required. RESULTS: We have developed a highly sensitive algorithm for the edge detection of copy number data which is especially suitable for the SNP array-based copy number data. The method consists of an over-sensitive edge-detection step and a test-based forward-backward edge selection step. CONCLUSION: Using simulations constructed from real experimental data, the method shows high sensitivity and specificity in detecting small copy number changes in focused regions. The method is implemented in an R package FASeg, which includes data processing and visualization utilities, as well as libraries for processing Affymetrix SNP array data.
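The two-stage idea described in this abstract (flag edges generously, then prune them with a test) can be illustrated with a minimal Python sketch. This is not the FASeg implementation: the windowed mean-difference score, the Welch-style t statistic, the backward-only pruning pass, and every name and threshold below (`candidate_edges`, `backward_select`, `half_window`, `min_jump`, `t_min`) are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import mean, pvariance


def candidate_edges(x, half_window=2, min_jump=0.2):
    """Over-sensitive step: flag local peaks of |right mean - left mean|."""
    n = len(x)
    score = [0.0] * (n + 1)  # one extra slot so score[i + 1] is always valid
    for i in range(half_window, n - half_window + 1):
        left = mean(x[i - half_window:i])
        right = mean(x[i:i + half_window])
        score[i] = abs(right - left)
    return [
        i
        for i in range(half_window, n - half_window + 1)
        if score[i] > min_jump and score[i] >= score[i - 1] and score[i] > score[i + 1]
    ]


def edge_t(x, start, edge, end):
    """Welch-style statistic comparing the segments on either side of an edge."""
    a, b = x[start:edge], x[edge:end]
    se = sqrt(pvariance(a) / len(a) + pvariance(b) / len(b))
    return abs(mean(b) - mean(a)) / (se or 1e-12)  # guard against zero variance


def backward_select(x, edges, t_min=3.0):
    """Pruning step: repeatedly drop the weakest edge until all pass t_min."""
    edges = sorted(edges)
    while edges:
        bounds = [0] + edges + [len(x)]
        stats = [
            edge_t(x, bounds[k], bounds[k + 1], bounds[k + 2])
            for k in range(len(edges))
        ]
        weakest = min(range(len(edges)), key=stats.__getitem__)
        if stats[weakest] >= t_min:
            break
        del edges[weakest]
    return edges


# Synthetic log-ratio profile (made up): a 6-probe gain on a noisy baseline.
profile = [(1.0 if 21 <= i < 27 else 0.0) + 0.3 * ((i % 3) - 1) for i in range(45)]
raw = candidate_edges(profile)          # many candidates, most of them noise
kept = backward_select(profile, raw)    # only well-supported edges survive
print(len(raw), kept)
```

The first stage deliberately over-calls, so that small segments are not missed; specificity is recovered in the second stage, where each edge must be supported by the data on both sides.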
Subject(s)
Algorithms; Chromosome Breakage; Gene Amplification/genetics; Gene Deletion; Oligonucleotide Array Sequence Analysis/methods; Polymorphism, Single Nucleotide/genetics; Cell Line, Tumor; Genome, Human/genetics; Humans
ABSTRACT
MOTIVATION: The identification of signatures of positive selection can provide important insights into recent evolutionary history in human populations. Current methods mostly rely on allele frequency determination or focus on one or a small number of candidate chromosomal regions per study. With the availability of large-scale genotype data, efficient approaches for an unbiased whole genome scan are becoming necessary. METHODS: We have developed a new method, the whole genome long-range haplotype test (WGLRH), which uses genome-wide distributions to test for recent positive selection. Adapted from the long-range haplotype (LRH) test, the WGLRH test uses patterns of linkage disequilibrium (LD) to identify regions with extremely low historic recombination. Common haplotypes with significantly longer than expected ranges of LD given their frequencies are identified as putative signatures of recent positive selection. In addition, we have also determined the ancestral alleles of SNPs by genotyping chimpanzee and gorilla DNA, and have identified SNPs where the non-ancestral alleles have risen to extremely high frequencies in human populations, termed 'flipped SNPs'. Combining the haplotype test and the flipped SNPs determination, the WGLRH test serves as an unbiased genome-wide screen for regions under putative selection, and is potentially applicable to the study of other human populations. RESULTS: Using WGLRH and high-density oligonucleotide arrays interrogating 116 204 SNPs, we rapidly identified putative regions of positive selection in three populations (Asian, Caucasian, African-American), and extended these observations to a fourth population, Yoruba, with data obtained from the International HapMap consortium. We mapped significant regions to annotated genes. 
While some regions overlap with genes previously suggested to be under positive selection, many of the genes have not been previously implicated in natural selection and offer intriguing possibilities for further study. AVAILABILITY: The programs for the WGLRH algorithm are freely available and can be downloaded at http://www.affymetrix.com/support/supplement/WGLRH_program.zip.
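The "flipped SNP" screen mentioned in this abstract can be sketched in a few lines of Python. The abstract gives neither a frequency cutoff nor the rule used to call the ancestral allele, so both are assumptions here: the ancestral allele is taken to be the allele shared by chimpanzee and gorilla, and `min_derived_freq=0.9` is a hypothetical threshold. The SNP identifiers and genotypes are made up, and this is not the WGLRH program.

```python
from collections import Counter


def ancestral_allele(chimp, gorilla):
    """Assumed rule: accept the outgroup allele only if both outgroups agree."""
    if chimp is not None and chimp == gorilla:
        return chimp
    return None


def derived_frequency(human_alleles, ancestral):
    """Fraction of observed human alleles that differ from the ancestral one."""
    counts = Counter(human_alleles)
    return 1.0 - counts[ancestral] / sum(counts.values())


def flipped_snps(snps, min_derived_freq=0.9):
    """Return IDs of SNPs whose non-ancestral allele is at very high frequency."""
    flagged = []
    for snp_id, rec in snps.items():
        anc = ancestral_allele(rec["chimp"], rec["gorilla"])
        if anc is None or not rec["human"]:
            continue  # ancestral state ambiguous or no human data
        if derived_frequency(rec["human"], anc) >= min_derived_freq:
            flagged.append(snp_id)
    return flagged


# Hypothetical records: outgroup alleles plus observed human alleles.
example = {
    "snp1": {"chimp": "A", "gorilla": "A", "human": "G" * 19 + "A"},
    "snp2": {"chimp": "C", "gorilla": "C", "human": "C" * 15 + "T" * 5},
    "snp3": {"chimp": "A", "gorilla": "G", "human": "G" * 20},
}
print(flipped_snps(example))
```

SNPs whose outgroup alleles disagree are skipped rather than guessed, which keeps the screen conservative.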
Subject(s)
Biological Evolution; Chromosome Mapping/methods; Genetic Variation/genetics; Genetics, Population; Genome, Human/genetics; Genomic Imprinting/genetics; Selection, Genetic; Algorithms; Animals; Evolution, Molecular; Humans; Polymorphism, Single Nucleotide/genetics; Sequence Analysis, DNA/methods; Software
ABSTRACT
We have developed a robust algorithm for copy number analysis of the human genome using high-density oligonucleotide microarrays containing 116,204 single-nucleotide polymorphisms. The advantages of this algorithm include the improvement of signal-to-noise (S/N) ratios and the use of an optimized reference. The raw S/N ratios were improved by accounting for the length and GC content of the PCR products using quadratic regressions. The use of constitutional DNA, when available, gives the lowest SD values (0.16 +/- 0.03) and also enables allele-based copy number detection in cancer genomes, which can unmask otherwise concealed allelic imbalances. In the absence of constitutional DNA, optimized selection of multiple normal references with the highest S/N ratios, in combination with the data regressions, dramatically improves SD values from 0.67 +/- 0.12 to 0.18 +/- 0.03. These improvements allow for highly reliable comparison of data across different experimental conditions, detection of allele-based copy number changes, and more accurate estimations of the range and magnitude of copy number aberrations. This algorithm has been implemented in a software package called Copy Number Analyzer for Affymetrix GeneChip Mapping 100K arrays (CNAG). Overall, these enhancements make CNAG a useful tool for high-resolution detection of copy number alterations which can help in the understanding of the pathogenesis of cancers and other diseases as well as in exploring the complexities of the human genome.
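The regression step described in this abstract, correcting raw log ratios for PCR fragment length and GC content with quadratic terms, can be sketched as below. The abstract does not give the exact model used in CNAG, so the additive form `b0 + b1*L + b2*L^2 + b3*GC + b4*GC^2`, the units (length in kb, GC as a fraction), and all function names are assumptions. The fit uses ordinary least squares via the normal equations in plain Python.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def features(length_kb, gc):
    """Assumed design row: intercept plus quadratic terms in length and GC."""
    return [1.0, length_kb, length_kb ** 2, gc, gc ** 2]


def fit_quadratic(lengths_kb, gcs, log_ratios):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    rows = [features(l, g) for l, g in zip(lengths_kb, gcs)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, log_ratios)) for i in range(k)]
    return solve(xtx, xty)


def correct(lengths_kb, gcs, log_ratios):
    """Subtract the fitted length/GC trend; the intercept is removed as well."""
    beta = fit_quadratic(lengths_kb, gcs, log_ratios)
    return [
        y - sum(b * f for b, f in zip(beta, features(l, g)))
        for l, g, y in zip(lengths_kb, gcs, log_ratios)
    ]
```

Because the intercept is subtracted together with the trend, the corrected values are centered on zero. A real pipeline would likely fit the trend on probes believed to be copy-neutral so that genuine gains and losses are not absorbed into the correction; that choice is outside the scope of this sketch.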
Subject(s)
Algorithms; Gene Dosage; Oligonucleotide Array Sequence Analysis/methods; Alleles; Cell Line, Tumor; Genome, Human; Genotype; Humans; Loss of Heterozygosity; Lung Neoplasms/genetics; Polymorphism, Single Nucleotide; Reference Values; Reproducibility of Results; Signal Processing, Computer-Assisted
ABSTRACT
The cause of mental retardation in one-third to one-half of all affected individuals is unknown. Microscopically detectable chromosomal abnormalities are the most frequently recognized cause, but gain or loss of chromosomal segments that are too small to be seen by conventional cytogenetic analysis has been found to be another important cause. Array-based methods offer a practical means of performing a high-resolution survey of the entire genome for submicroscopic copy-number variants. We studied 100 children with idiopathic mental retardation and normal results of standard chromosomal analysis, by use of whole-genome sampling analysis with Affymetrix GeneChip Human Mapping 100K arrays. We found de novo deletions as small as 178 kb in eight cases, de novo duplications as small as 1.1 Mb in two cases, and unsuspected mosaic trisomy 9 in another case. This technology can detect at least twice as many potentially pathogenic de novo copy-number variants as conventional cytogenetic analysis can in people with mental retardation.
Subject(s)
Chromosome Aberrations; Intellectual Disability/diagnosis; Oligonucleotide Array Sequence Analysis; Child; Gene Dosage; Genome, Human; Humans; Sequence Deletion
ABSTRACT
Mutation of the human genome ranges from single base-pair changes to whole-chromosome aneuploidy. Karyotyping, fluorescence in situ hybridization, and comparative genome hybridization are currently used to detect chromosome abnormalities of clinical significance. These methods, although powerful, suffer from limitations in speed, ease of use, and resolution, and they do not detect copy-neutral chromosomal aberrations--for example, uniparental disomy (UPD). We have developed a high-throughput approach for assessment of DNA copy-number changes, through use of high-density synthetic oligonucleotide arrays containing 116,204 single-nucleotide polymorphisms, spaced at an average distance of 23.6 kb across the genome. Using this approach, we analyzed samples that failed conventional karyotypic analysis, and we detected amplifications and deletions across a wide range of sizes (1.3-145.9 Mb), identified chromosomes containing anonymous chromatin, and used genotype data to determine the molecular origin of two cases of UPD. Furthermore, our data provided independent confirmation for a case that had been misinterpreted by karyotype analysis. The high resolution of our approach provides more-precise breakpoint mapping, which allows subtle phenotypic heterogeneity to be distinguished at a molecular level. The accurate genotype information provided on these arrays enables the identification of copy-neutral loss-of-heterozygosity events, and the minimal requirement of DNA (250 ng per array) allows rapid analysis of samples without the need for cell culture. This technology overcomes many limitations currently encountered in routine clinical diagnostic laboratories tasked with accurate and rapid diagnosis of chromosomal abnormalities.
Subject(s)
Chromosome Aberrations; Chromosome Mapping; Genome, Human; Oligonucleotide Array Sequence Analysis; Polymorphism, Single Nucleotide/genetics; Chromosomes, Human; DNA/analysis; Humans
ABSTRACT
We mapped histone H3 lysine 4 di- and trimethylation and lysine 9/14 acetylation across the nonrepetitive portions of human chromosomes 21 and 22 and compared patterns of lysine 4 dimethylation for several orthologous human and mouse loci. Both chromosomes show punctate sites enriched for modified histones. Sites showing trimethylation correlate with transcription starts, while those showing mainly dimethylation occur elsewhere in the vicinity of active genes. Punctate methylation patterns are also evident at the cytokine and IL-4 receptor loci. The Hox clusters present a strikingly different picture, with broad lysine 4-methylated regions that overlay multiple active genes. We suggest these regions represent active chromatin domains required for the maintenance of Hox gene expression. Methylation patterns at orthologous loci are strongly conserved between human and mouse even though many methylated sites do not show sequence conservation notably higher than background. This suggests that the DNA elements that direct the methylation represent only a small fraction of the region or lie at some distance from the site.
Subject(s)
Chromatin/genetics; Chromosomes, Human, Pair 21/genetics; Chromosomes, Human, Pair 22/genetics; Histones/genetics; Homeodomain Proteins/genetics; Acetylation; Animals; Chromatin/metabolism; Chromosome Mapping/methods; Chromosomes, Human, Pair 21/metabolism; Chromosomes, Human, Pair 22/metabolism; Genome; Histones/metabolism; Homeodomain Proteins/metabolism; Humans; Lysine/metabolism; Methylation; Mice; Receptors, Interleukin-4/genetics
ABSTRACT
Sites of transcription of polyadenylated and nonpolyadenylated RNAs for 10 human chromosomes were mapped at 5-base pair resolution in eight cell lines. Unannotated, nonpolyadenylated transcripts comprise the major proportion of the transcriptional output of the human genome. Of all transcribed sequences, 19.4, 43.7, and 36.9% were observed to be polyadenylated, nonpolyadenylated, and bimorphic, respectively. Half of all transcribed sequences are found only in the nucleus and for the most part are unannotated. Overall, the transcribed portions of the human genome are predominantly composed of interlaced networks of both poly A+ and poly A- annotated transcripts and unannotated transcripts of unknown function. This organization has important implications for interpreting genotype-phenotype associations, regulation of gene expression, and the definition of a gene.