ABSTRACT
Detailed, comprehensive knowledge of the structures of individual long-range telomere-terminal haplotypes is needed to understand their impact on telomere function, and to delineate the population structure and evolution of subtelomere regions. However, the abundance of large evolutionarily recent segmental duplications and high levels of large structural variation have complicated both the mapping and sequence characterization of human subtelomere regions. Here, we use high-throughput optical mapping of large single DNA molecules in nanochannel arrays for 154 human genomes from 26 populations to present a comprehensive look at human subtelomere structure and variation. The results catalog many novel long-range subtelomere haplotypes and determine the frequencies and contexts of specific subtelomeric duplicons on each chromosome arm, helping to clarify the currently ambiguous nature of many specific subtelomere structures as represented in the current reference sequence (HG38). The organization and content of some duplicons in subtelomeres appear to show both chromosome arm-specific and population-specific trends. Based upon these trends, we estimate a timeline for the spread of these duplication blocks.
Subjects
Human Genome , Population/genetics , Telomere/genetics , Molecular Evolution , Haplotypes , Humans , Nanopore Sequencing/methods
ABSTRACT
BACKGROUND: Human subtelomeric DNA regulates the length and stability of adjacent telomeres that are critical for cellular function, and contains many gene/pseudogene families. Large evolutionarily recent segmental duplications and associated structural variation in human subtelomeres have made complete sequencing and assembly of these regions difficult to impossible for many loci, complicating or precluding a wide range of genetic analyses to investigate their function. RESULTS: We present a hybrid assembly method, NanoPore Guided REgional Assembly Tool (NPGREAT), which combines Linked-Read data with mapped ultralong nanopore reads spanning subtelomeric segmental duplications to potentially overcome these difficulties. Linked-Read sets of DNA sequences identified by matches with 1-copy subtelomere sequence adjacent to segmental duplications are assembled and extended into the segmental duplication regions using Regional Extension of Assemblies using Linked-Reads (REXTAL). Mapped telomere-containing ultralong nanopore reads are then used to provide contiguity and correct orientation for matching REXTAL sequence contigs, as well as to identify and correct any misassemblies. Our method was tested on a subset of representative subtelomeres with ultralong nanopore read coverage in the haploid human cell line CHM13. A 10X Linked-Read dataset from CHM13 was combined with ultralong nanopore reads from the same genome to provide improved subtelomere assemblies. Comparison of nanopore-only assemblies using SHASTA with our NPGREAT assemblies in the distal-most subtelomere regions showed that NPGREAT produced higher-quality and more complete assemblies than SHASTA alone when these regions had low ultralong nanopore coverage (such as cases where large segmental duplications were immediately adjacent to (TTAGGG)n tracts).
CONCLUSION: In genomic regions with large segmental duplications adjacent to telomeres, NPGREAT offers an alternative economical approach to improving assembly accuracy and coverage using linked-read datasets when more expensive HiFi datasets of 10-20 kb reads are unavailable.
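As a rough illustration of what selecting "telomere-containing" reads can involve, the Python sketch below flags a read that ends in a run of TTAGGG repeats (or begins with CCCTAA, for the opposite strand). It is not taken from NPGREAT: the function names, the `min_repeats` threshold and the exact-match rule are assumptions made for this example, and real nanopore reads would call for error-tolerant matching.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T/N sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def terminal_repeat_count(read: str, motif: str = "TTAGGG") -> int:
    """Count perfect tandem copies of `motif` at the 3' end of `read`.

    Up to len(motif) - 1 trailing bases are ignored, so a read that stops
    partway through a repeat unit is still counted.
    """
    k = len(motif)
    best = 0
    for offset in range(k):
        end = len(read) - offset
        count = 0
        # Walk backwards one repeat unit at a time while the motif matches.
        while end >= k and read[end - k:end] == motif:
            count += 1
            end -= k
        best = max(best, count)
    return best


def is_telomere_read(read: str, min_repeats: int = 4) -> bool:
    """Hypothetical filter: True if either strand ends in >= min_repeats copies."""
    read = read.upper()
    forward = terminal_repeat_count(read)
    reverse = terminal_repeat_count(reverse_complement(read))
    return max(forward, reverse) >= min_repeats
```

For example, `is_telomere_read("ACGTACGT" + "TTAGGG" * 5 + "TT")` returns `True`, because the two trailing bases are skipped before the five repeat units are counted.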
Subjects
Nanopores , Humans , Genomics , Telomere/genetics , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
BACKGROUND: Telomeric DNA is typically composed of G-rich tandem repeat motifs and maintained by telomerase (Greider CW, Blackburn EH; Cell 51:887-898; 1987). In eukaryotes lacking telomerase, a variety of DNA repair and DNA recombination based pathways for telomere maintenance have evolved in organisms normally dependent upon telomerase for telomere elongation (Webb CJ, Wu Y, Zakian VA; Cold Spring Harb Perspect Biol 5:a012666; 2013); these are collectively called Alternative Lengthening of Telomeres (ALT) pathways. By measuring (TTAGGG)n tract lengths from the same large DNA molecules that were optically mapped, we simultaneously analyzed telomere length dynamics and subtelomere-linked structural changes at a large number of specific subtelomeric loci in the ALT-positive cell lines U2OS, SK-MEL-2 and Saos-2. RESULTS: Our results revealed locus-specific ALT telomere features. For example, while each subtelomere included examples of single molecules with terminal (TTAGGG)n tracts as well as examples of recombinant telomeric single molecules, the ratio of these molecules was subtelomere-specific, ranging from 33:1 (19p) to 1:25 (19q) in U2OS. The Saos-2 cell line shows a similar percentage of recombinant telomeres. The frequency of recombinant subtelomeres in SK-MEL-2 (11%) is about half that of U2OS and Saos-2 (24% and 19%, respectively). Terminal (TTAGGG)n tract lengths and heterogeneity levels, the frequencies of telomere signal-free ends, and the frequency and size of retained internal telomere-like sequences (ITSs) at recombinant telomere fusion junctions all varied according to the specific subtelomere involved in a particular cell line.
Very large linear extrachromosomal telomere repeat (ECTR) DNA molecules were found in all three cell lines; these are in principle capable of templating synthesis of new long telomere tracts via break-induced repair (BIR) long-tract DNA synthesis mechanisms and contributing to the very long telomere tract length and heterogeneity characteristic of ALT cells. Many of the longest telomere tracts (both end-telomeres and linear ECTRs) displayed punctate CRISPR/Cas9-dependent (TTAGGG)n labeling patterns indicative of interspersion of stretches of non-canonical telomere repeats. CONCLUSION: Identifying individual subtelomeres and characterizing linked telomere (TTAGGG)n tract lengths and structural changes using our new single-molecule methodologies reveals, in unprecedented molecular detail, the structural consequences of telomere damage, repair and recombination mechanisms in human ALT cells, as well as significant differences among ALT-positive cell lines.
Subjects
Telomere Homeostasis , Telomere/chemistry , Tumor Cell Line , DNA/chemistry , Humans , Nucleic Acid Repetitive Sequences
ABSTRACT
Telomeres and tumor suppressor protein TP53 (p53) function in genome protection, but a direct role of p53 at telomeres has not yet been described. Here, we have identified non-canonical p53-binding sites within the human subtelomeres that suppress the accumulation of DNA damage at telomeric repeat DNA. These non-canonical subtelomeric p53-binding sites conferred transcription enhancer-like functions that include an increase in local histone H3K9 and H3K27 acetylation and stimulation of subtelomeric transcripts, including telomere repeat-containing RNA (TERRA). p53 suppressed formation of telomere-associated γH2AX and prevented telomere DNA degradation in response to DNA damage stress. Our findings indicate that p53 provides a direct chromatin-associated protection to human telomeres, as well as other fragile genomic sites. We propose that p53-associated chromatin modifications enhance local DNA repair or protection to provide a previously unrecognized tumor suppressor function of p53.
Subjects
Carrier Proteins/metabolism , DNA Damage/genetics , Telomere/metabolism , Tumor Suppressor Protein p53/metabolism , Carrier Proteins/genetics , HCT116 Cells , Humans , Protein Binding , Telomere/genetics , Tumor Suppressor Protein p53/genetics
ABSTRACT
We have developed a novel method that enables global subtelomere and haplotype-resolved analysis of telomere lengths at the single-molecule level. An in vitro CRISPR/Cas9 RNA-directed nickase system directs the specific labeling of human (TTAGGG)n DNA tracts in genomes that have also been barcoded using a separate nickase enzyme that recognizes a 7-bp motif genome-wide. High-throughput imaging and analysis of large DNA single molecules from genomes labeled in this fashion using a nanochannel array system permits mapping through subtelomere repeat element (SRE) regions to unique chromosomal DNA while simultaneously measuring the (TTAGGG)n tract length at the end of each large telomere-terminal DNA segment. The methodology also permits subtelomere and haplotype-resolved analyses of SRE organization and variation, providing a window into the population dynamics and potential functions of these complex and structurally variant telomere-adjacent DNA regions. At its current stage of development, the assay can be used to identify and characterize telomere length distributions of 30-35 discrete telomeres simultaneously and accurately. The assay's utility is demonstrated using early versus late passage and senescent human diploid fibroblasts, documenting the anticipated telomere attrition on a global telomere-by-telomere basis as well as identifying subtelomere-specific biases for critically short telomeres. Similarly, we present the first global single-telomere-resolved analyses of two cancer cell lines.
Subjects
Chromosome Mapping/methods , Single Molecule Imaging/methods , Telomere/genetics , CRISPR-Cas Systems , Tumor Cell Line , Deoxyribonuclease I/metabolism , Humans , Nanotechnology , Telomere Shortening
ABSTRACT
Accurate maps and DNA sequences for human subtelomere regions, along with detailed knowledge of subtelomere variation and long-range telomere-terminal haplotypes in individuals, are critical for understanding telomere function and its roles in human biology. Here, we use a highly automated whole genome mapping technology in nano-channel arrays to analyze large terminal human chromosome segments extending from chromosome-specific subtelomere sequences through subtelomeric repeat regions to terminal (TTAGGG)n repeat tracts. We establish detailed maps for subtelomere gap regions in the human reference sequence, detect many new large subtelomeric variants and demonstrate the feasibility of long-range haplotyping through segmentally duplicated subtelomere regions. These features make the method a uniquely valuable new tool for improving the quality of genome assemblies in complex DNA regions. Based on single molecule mapping of telomere-terminal DNA fragments, we provide proof of principle for a novel method to estimate telomere lengths linked to distinguishable telomeric haplotypes; this single-telomere genotyping method may ultimately enable delineation of human cis elements involved in telomere length regulation.
Subjects
Chromosome Mapping/methods , Haplotypes , Telomere/genetics , Automation , DNA , Feasibility Studies , Genetic Variation , Humans , Nucleic Acid Repetitive Sequences
ABSTRACT
We have developed a new, sequence-specific DNA labeling strategy that will dramatically improve DNA mapping in complex and structurally variant genomic regions, as well as facilitate high-throughput automated whole-genome mapping. The method uses the Cas9 D10A protein, which contains a nuclease disabling mutation in one of the two nuclease domains of Cas9, to create a guide RNA-directed DNA nick in the context of an in vitro-assembled CRISPR-CAS9-DNA complex. Fluorescent nucleotides are then incorporated adjacent to the nicking site with a DNA polymerase to label the guide RNA-determined target sequences. This labeling strategy is very powerful in targeting repetitive sequences as well as in barcoding genomic regions and structural variants not amenable to current labeling methods that rely on uneven distributions of restriction site motifs in the DNA. Importantly, it renders the labeled double-stranded DNA available in long intact stretches for high-throughput analysis in nanochannel arrays as well as for lower throughput targeted analysis of labeled DNA regions using alternative methods for stretching and imaging the labeled long DNA molecules. Thus, this method will dramatically improve both automated high-throughput genome-wide mapping as well as targeted analyses of complex regions containing repetitive and structurally variant DNA.
Subjects
Bacterial Proteins/chemistry , CRISPR-Cas Systems , Chromosome Mapping/methods , Clustered Regularly Interspaced Short Palindromic Repeats , DNA/chemistry , Endonucleases/chemistry , In Situ Nick-End Labeling/methods , Amino Acid Substitution , Bacterial Proteins/genetics , CRISPR-Associated Protein 9 , Bacterial Artificial Chromosomes/chemistry , Bacterial Artificial Chromosomes/metabolism , DNA/genetics , Deoxyribonuclease I/chemistry , Deoxyribonuclease I/genetics , Endonucleases/genetics , Fluorescent Dyes/chemistry , Human Genome , HIV-1/chemistry , HIV-1/genetics , Humans , Mutation , Plasmids/chemistry , Plasmids/metabolism , Tertiary Protein Structure , Kinetoplastida Guide RNA/chemistry , Kinetoplastida Guide RNA/genetics
ABSTRACT
Mapping genome-wide data to human subtelomeres has been problematic due to the incomplete assembly and challenges of low-copy repetitive DNA elements. Here, we provide updated human subtelomere sequence assemblies that were extended by filling telomere-adjacent gaps using clone-based resources. A bioinformatic pipeline incorporating multiread mapping for annotation of the updated assemblies using short-read data sets was developed and implemented. Annotation of subtelomeric sequence features as well as mapping of CTCF and cohesin binding sites using ChIP-seq data sets from multiple human cell types confirmed that CTCF and cohesin bind within 3 kb of the start of terminal repeat tracts at many, but not all, subtelomeres. CTCF and cohesin co-occupancy were also enriched near internal telomere-like sequence (ITS) islands and the nonterminal boundaries of subtelomere repeat elements (SREs) in transformed lymphoblastoid cell lines (LCLs) and human embryonic stem cell (ES) lines, but were not significantly enriched in the primary fibroblast IMR90 cell line. Subtelomeric CTCF and cohesin sites predicted by ChIP-seq using our bioinformatics pipeline (but not predicted when only uniquely mapping reads were considered) were consistently validated by ChIP-qPCR. The colocalized CTCF and cohesin sites in SRE regions are candidates for mediating long-range chromatin interactions in the transcript-rich SRE region. A public browser for the integrated display of short-read sequence-based annotations relative to key subtelomere features such as the start of each terminal repeat tract, SRE identity and organization, and subtelomeric gene models was established.
Subjects
Cell Cycle Proteins/genetics , Non-Histone Chromosomal Proteins/genetics , Human Genome , Repressor Proteins/genetics , Telomere/genetics , Terminal Repeat Sequences , Base Sequence , CCCTC-Binding Factor , Cell Line , Embryonic Stem Cells/metabolism , Fibroblasts/metabolism , Humans , Molecular Sequence Annotation/methods , Molecular Sequence Data , Protein Binding , Repressor Proteins/metabolism , Cohesins
ABSTRACT
Telomere-repeat-encoding RNA (referred to as TERRA) has been identified as a potential component of yeast and mammalian telomeres. We show here that TERRA RNA interacts with several telomere-associated proteins, including telomere repeat factors 1 (TRF1) and 2 (TRF2), subunits of the origin recognition complex (ORC), heterochromatin protein 1 (HP1), histone H3 trimethyl K9 (H3 K9me3), and members of the DNA-damage-sensing pathway. siRNA depletion of TERRA caused an increase in telomere dysfunction-induced foci, aberrations in metaphase telomeres, and a loss of histone H3 K9me3 and ORC at telomere repeat DNA. Previous studies found that TRF2 amino-terminal GAR domain recruited ORC to telomeres. We now show that TERRA RNA can interact directly with the TRF2 GAR and ORC1 to form a stable ternary complex. We conclude that TERRA facilitates TRF2 interaction with ORC and plays a central role in telomere structural maintenance and heterochromatin formation.
Subjects
Heterochromatin/metabolism , Origin Recognition Complex/metabolism , Nuclear RNA/metabolism , Telomere/metabolism , Telomeric Repeat Binding Protein 2/metabolism , Binding Sites , Chromobox Protein Homolog 5 , Non-Histone Chromosomal Proteins/metabolism , Chromosome Aberrations , DNA Methylation , HCT116 Cells , Histones/metabolism , Humans , Tertiary Protein Structure , RNA Interference , Telomere/ultrastructure , Telomeric Repeat Binding Protein 1/metabolism , Telomeric Repeat Binding Protein 2/genetics , Time Factors , Transfection
ABSTRACT
The contribution of human subtelomeric DNA and chromatin organization to telomere integrity and chromosome end protection is not yet understood in molecular detail. Here, we show by ChIP-Seq that most human subtelomeres contain a CTCF- and cohesin-binding site within ~1-2 kb of the TTAGGG repeat tract and adjacent to CpG islands implicated in TERRA transcription control. ChIP-Seq also revealed that RNA polymerase II (RNAPII) was enriched at sites adjacent to the CTCF sites and extending towards the telomere repeat tracts. Mutation of CTCF-binding sites in plasmid-borne promoters reduced transcriptional activity in an orientation-dependent manner. Depletion of CTCF by shRNA led to a decrease in TERRA transcription, and a loss of cohesin and RNAPII binding to the subtelomeres. Depletion of either CTCF or cohesin subunit Rad21 caused telomere-induced DNA damage foci (TIF) formation, and destabilized TRF1 and TRF2 binding to the TTAGGG-proximal subtelomere DNA. These findings indicate that CTCF and cohesin are integral components of most human subtelomeres, and important for the regulation of TERRA transcription and telomere end protection.
Subjects
Cell Cycle Proteins/metabolism , Chromatin/genetics , Non-Histone Chromosomal Proteins/metabolism , DNA-Binding Proteins/genetics , Gene Expression Regulation , Repressor Proteins/metabolism , Telomere/genetics , Transcription Factors/genetics , Genetic Transcription , CCCTC-Binding Factor , Cell Cycle Proteins/genetics , Cultured Cells , Chromatin Immunoprecipitation , Non-Histone Chromosomal Proteins/genetics , CpG Islands/genetics , Electrophoretic Mobility Shift Assay , Immunofluorescence , Humans , Luciferases/metabolism , Neoplasms/genetics , Neoplasms/pathology , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Phosphoproteins/genetics , Phosphoproteins/metabolism , Genetic Promoter Regions/genetics , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , Messenger RNA/genetics , Real-Time Polymerase Chain Reaction , Repressor Proteins/genetics , Reverse Transcriptase Polymerase Chain Reaction , Cohesins
ABSTRACT
BACKGROUND: The ring chromosome 20 syndrome (R20) is a rare genetic disorder associated with a refractory electroclinical epilepsy syndrome and variably expressed comorbidities of intellectual disability and dysmorphism. METHODS: To understand the structure and composition of the ring chromosome 20 (r(20)) in this patient cohort, blood specimens from 28 affected individuals were analysed by cytogenetic, fluorescence in situ hybridisation, and/or high resolution whole genome single nucleotide polymorphism array analysis. RESULTS: These studies revealed two distinct groups of patients. Group 1 (N=21) was mosaic for the r(20) and a normal cell line with no detectable deletions or duplications of chromosome 20 in either cell line. The mosaic nature of these rings suggests a postzygotic origin with formation of the ring by fusion of the telomeric regions with no apparent loss of subtelomeric or telomeric DNA. Group 2 (N=7) had non-mosaic ring chromosomes with a deletion at one or both ends of the chromosome, near the ring fusion point. The non-mosaic nature of these rings is consistent with a meiotic origin. The age of onset of seizures was significantly lower in the non-mosaic patients (group 2, median age of onset 2.1 years) than in the mosaic patients (group 1, median age of onset 6.0 years). Patients from group 2 had more extensive comorbidities. CONCLUSIONS: These studies demonstrate that r(20) is molecularly heterogeneous and formed by two distinct mechanisms, which, in turn, produce different phenotypic spectrums.
Subjects
Human Chromosome Pair 20/genetics , Ring Chromosomes , Age of Onset , Cultured Cells , Chromosome Banding , Chromosome Deletion , Humans , Fluorescence In Situ Hybridization , Single Nucleotide Polymorphism/genetics , Seizures/epidemiology , Seizures/genetics , Seizures/pathology , Syndrome
ABSTRACT
In humans, the telomere consists of tandem 5'TTAGGG3' DNA repeats on both ends of all 46 chromosomes. Telomere shortening has been linked to aging and age-related diseases. Similarly, telomere length changes have been associated with chemical exposure, molecular-level DNA damage, and tumor development. Telomere elongation has been associated with tumor development caused by chemical exposure and molecular-level DNA damage. The methods used to study these effects mostly rely on average telomere length as a biomarker. The mechanisms regulating subtelomere-specific and haplotype-specific telomere lengths in humans remain understudied and poorly understood, primarily because of technical limitations in obtaining these data for all chromosomes. Recent studies have shown that it is the short telomeres that are crucial in preserving chromosome stability. The identity and frequency of specific critically short telomeres are potentially a useful biomarker for studying aging, age-related diseases, and cancer. Here, we briefly review the role of telomere length, its measurement, and our recent single-molecule telomere length measurement assay. With this assay, one can measure individual telomere lengths as well as identify their physically linked subtelomeric DNA. This assay can also positively detect telomere loss and characterize novel subtelomeric variants, haplotypes, and previously uncharacterized recombined subtelomeres. We also discuss its applications in aging cells and cancer cells, highlighting the utility of the single-molecule telomere length assay.
Subjects
High-Throughput Screening Assays , Telomere , Humans , Nanotechnology
ABSTRACT
Genomic regions of high segmental duplication content and/or structural variation have led to gaps and misassemblies in the human reference sequence, and are refractory to assembly from whole-genome short-read datasets. Human subtelomere regions are highly enriched in both segmental duplication content and structural variations, and as a consequence are both impossible to assemble accurately and highly variable from individual to individual. Recently, we developed a pipeline for improved region-specific assembly called Regional Extension of Assemblies Using Linked-Reads (REXTAL). In this study, we evaluate REXTAL and genome-wide assembly (Supernova) approaches on 10X Genomics linked-reads data sets partitioned and barcoded using the Gel Bead in Emulsion (GEM) microfluidic method. Our results describe the accuracy and relative performance of these two approaches using the reference-based assessment module of QUAST. We show that REXTAL dramatically outperforms the Supernova whole genome assembler in subtelomeric segmental duplication regions, and results in highly accurate assemblies. Nearly all of the REXTAL "misassemblies" identified using default QUAST parameters simply pinpoint locations of tandem repeat arrays in the reference sequence where the repeat array length differs from that in the cognate REXTAL assembly by 1000 bp.
Subjects
Chromosome Structures/genetics , Genomics/methods , Sequence Alignment/methods , DNA Sequence Analysis/methods , Human Genome/genetics , Humans
ABSTRACT
Two brothers, with dissimilar clinical features, were each found to have different abnormalities of chromosome 20 by subtelomere fluorescence in situ hybridization (FISH). The proband had deletion of 20p subtelomere and duplication of 20q subtelomere, while his brother was found to have a duplication of 20p subtelomere and deletion of 20q subtelomere. Parental cytogenetic studies were initially thought to be normal, both by G-banding and by subtelomere FISH analysis. Since chromosome 20 is a metacentric chromosome and an inversion was suspected, we used anchored FISH to assist in identifying a possible inversion. This approach employed concomitant hybridization of a FISH probe to the short (p) arm of chromosome 20 with the 20q subtelomere probe. We identified a cytogenetically non-visible, mosaic pericentric inversion of one of the maternal chromosome 20 homologs, providing a mechanistic explanation for the chromosomal abnormalities present in these brothers. Array comparative genomic hybridization (CGH) with both a custom-made BAC and cosmid-based subtelomere specific array (TEL array) and a commercially available SNP-based array confirmed and further characterized these rearrangements, identifying this as the largest pericentric inversion of chromosome 20 described to date. TEL array data indicate that the 20p breakpoint is defined by BAC RP11-978M13, approximately 900 kb from the pter; SNP array data reveal this breakpoint to occur within BAC RP11-978M13. The 20q breakpoint is defined by BAC RP11-93B14, approximately 1.7 Mb from the qter, by TEL array; SNP array data refine this breakpoint to within a gap between BACs on the TEL array (i.e., between RP11-93B14 and proximal BAC RP11-765G16).
Subjects
Human Chromosome Pair 20/genetics , Child , Chromosome Aberrations , Chromosome Banding , Chromosome Inversion , Bacterial Artificial Chromosomes , Comparative Genomic Hybridization , Cosmids , Family Health , Female , Gene Deletion , Humans , Fluorescence In Situ Hybridization , Infant , Male , Mothers , Phenotype , Telomere/ultrastructure
ABSTRACT
Human subtelomere regions are highly enriched in large segmental duplications and structural variants, leading to many gaps and misassemblies in these regions. We develop a novel method, NPGREAT (NanoPore Guided REgional Assembly Tool), which combines Nanopore ultralong read datasets and short-read assemblies derived from 10x linked-reads to efficiently assemble these subtelomere regions into a single continuous sequence. We show that with the use of ultralong Nanopore reads as a guide, the highly accurate shorter linked-read sequence contigs are correctly oriented, ordered, spaced and extended. In the rare cases where a linked-read sequence contig contains inaccurately assembled segments, the use of Nanopore reads allows for detection and correction of this error. We tested NPGREAT on four representative subtelomeres of the NA12878 human genome (10p, 16p, 19q and 20p). The results demonstrate that the final computed assembly of each subtelomere is accurate and complete.
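The ordering, orientation and spacing step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the NPGREAT implementation: the `Alignment` record, the `layout_contigs` function and the coordinates are hypothetical, and the contig-to-read alignments are assumed to have been computed beforehand by some aligner.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record: where one contig aligns on the guide read."""
    contig: str
    read_start: int  # 0-based start on the long read
    read_end: int    # exclusive end on the long read
    strand: str      # "+" or "-"


def layout_contigs(
    alignments: List[Alignment],
) -> List[Tuple[str, bool, Optional[int]]]:
    """Order contigs along the guide read.

    Returns one (contig, needs_reverse_complement, gap_to_next) tuple per
    contig. A negative gap means the two contigs overlap on the read; the
    last contig has no successor, so its gap is None.
    """
    ordered = sorted(alignments, key=lambda a: a.read_start)
    layout = []
    for i, aln in enumerate(ordered):
        if i + 1 < len(ordered):
            gap = ordered[i + 1].read_start - aln.read_end
        else:
            gap = None
        # A contig aligned to the minus strand must be flipped before joining.
        layout.append((aln.contig, aln.strand == "-", gap))
    return layout


if __name__ == "__main__":
    demo = [
        Alignment("contig_B", 5000, 9000, "-"),
        Alignment("contig_A", 0, 4000, "+"),
        Alignment("contig_C", 8800, 12000, "+"),
    ]
    for entry in layout_contigs(demo):
        print(entry)
```

In this toy layout the guide read places `contig_A` first, leaves a 1000-base gap that would have to be filled from the read itself, and shows `contig_B` (reverse-complemented) overlapping `contig_C` by 200 bases.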
ABSTRACT
Human subtelomere regions contain numerous gene-rich segments and are susceptible to germline rearrangements. The availability of diagnostic test kits to detect subtelomeric rearrangements has resulted in the diagnosis of numerous abnormalities with clinical implications including congenital heart abnormalities and mental retardation. Several of these have been described as clinically recognizable syndromes (e.g., deletion of 1p, 3p, 5q, 6p, 9q, and 22q). Given this, fine-mapping of subtelomeric breakpoints is of increasing importance to the assessment of genotype-phenotype correlations in these recognized syndromes as well as to the identification of additional syndromes. We developed a BAC and cosmid-based DNA array (TEL array) with high-resolution coverage of 10 Mb-sized subtelomeric regions, and used it to analyze 42 samples from unrelated patients with subtelomeric rearrangements whose breakpoints were previously either unmapped or mapped at a lower resolution than that achievable with the TEL array. Six apparently recurrent subtelomeric breakpoint loci were localized to genomic regions containing segmental duplication, copy number variation, and sequence gaps. Small (1 Mb or less) candidate gene regions for clinical phenotypes in separate patients were identified for 3p, 6q, 9q, and 10p deletions as well as for a 19q duplication. In addition to fine-mapping nearly all of the expected breakpoints, several previously unidentified rearrangements were detected.
Subjects
Chromosome Deletion , Chromosome Mapping/methods , Gene Duplication , Nucleic Acid Hybridization , Telomere/genetics , Chromosome Breakage , Bacterial Artificial Chromosomes/chemistry , Human Chromosome Pair 10 , Human Chromosome Pair 9 , Cytogenetic Analysis , Female , Haplotypes , Humans , Male , Nucleic Acid Hybridization/methods , Oligonucleotide Array Sequence Analysis
ABSTRACT
It is currently impossible to get complete de-novo assembly of segmentally duplicated genome regions using genome-wide short-read datasets. Here, we devise a new computational method called Regional Extension of Assemblies Using Linked-Reads (REXTAL) for improved region-specific assembly of segmental duplication-containing DNA, leveraging genomic short-read datasets generated from large DNA molecules partitioned and barcoded using the "Gel Bead in Emulsion" (GEM) microfluidic method (Zheng et al., 2016). We show that using REXTAL, it is possible to extend assembly of single-copy diploid DNA into adjacent, otherwise inaccessible subtelomere segmental duplication regions and other subtelomeric gap regions. Moreover, REXTAL is computationally more efficient for the directed assembly of such regions from multiple genomes (e.g., for the comparison of structural variation) than genome-wide assembly approaches.
ABSTRACT
PURPOSE: Cancer results from complex interactions of multiple variables at the biologic, individual, and social levels. Compared to other levels, social effects that occur geospatially in neighborhoods are not as well-studied, and empiric methods to assess these effects are limited. We propose a novel Neighborhood-Wide Association Study (NWAS), analogous to genome-wide association studies (GWAS), that utilizes high-dimensional computing approaches from biology to comprehensively and empirically identify neighborhood factors associated with disease. METHODS: Pennsylvania Cancer Registry data were linked to U.S. Census data. In a successively more stringent multiphase approach, we evaluated the association between neighborhood (n = 14,663 census variables) and prostate cancer aggressiveness (PCA) with n = 6,416 aggressive (Stage≥3/Gleason grade≥7) cases vs. n = 70,670 non-aggressive (Stage<3/Gleason grade<7) cases in White men. Analyses accounted for age, year of diagnosis, spatial correlation, and multiple testing. We used generalized estimating equations in Phase 1 and Bayesian mixed effects models in Phase 2 to calculate odds ratios (OR) and confidence/credible intervals (CI). In Phase 3, principal components analysis grouped correlated variables. RESULTS: We identified 17 new neighborhood variables associated with PCA. These variables represented income, housing, employment, immigration, access to care, and social support. The top hits, or most significant variables, related to transportation (OR = 1.05; CI = 1.001-1.09) and poverty (OR = 1.07; CI = 1.01-1.12). CONCLUSIONS: This study introduces the application of high-dimensional, computational methods to large-scale, publicly available geospatial data. Although NWAS requires further testing, it is hypothesis-generating and addresses gaps in geospatial analysis related to empiric assessment.
Further, NWAS could have broad implications for many diseases and future precision medicine studies focused on multilevel risk factors of disease.
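To make the reported quantities concrete, the Python sketch below computes an unadjusted odds ratio with a Wald confidence interval from a 2x2 table, plus a Bonferroni-style per-test threshold for 14,663 variables. It is a deliberate simplification: the study used generalized estimating equations and Bayesian mixed effects models, the abstract does not say which multiple-testing correction was applied (Bonferroni is an assumption here), and the counts in the usage note are made up.

```python
import math
from typing import Tuple


def odds_ratio_ci(
    exposed_cases: int,
    unexposed_cases: int,
    exposed_controls: int,
    unexposed_controls: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald interval (z = 1.96 gives ~95%)."""
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests
```

With hypothetical counts, `odds_ratio_ci(20, 80, 10, 90)` gives an odds ratio of 2.25, and `bonferroni_threshold(0.05, 14663)` is roughly 3.4e-6, which shows how strict a per-variable cutoff becomes when thousands of census variables are screened at once.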
Subjects
Health Services Accessibility, Income, Neoplasm Invasiveness/pathology, Prostatic Neoplasms/diagnosis, Residence Characteristics, Social Support, Aged, Humans, Male, Middle Aged, Neoplasm Grading, Poverty, Prostatic Neoplasms/pathology, Risk Factors, Severity of Illness Index
ABSTRACT
Background: Multilevel frameworks suggest neighborhood circumstances influence biology; however, this relationship is not well studied. Telomere length (TL) shortening has been associated with individual-level and neighborhood-level exposures and disease and may provide insights into underlying biologic mechanisms linking neighborhood with biology. To support neighborhood-biology investigations, we sought to determine the independent effect of neighborhood exposures on TL using standard multilevel linear regression models and quantile regression, a nonlinear, social science method applicable for testing the biologic hypothesis that extremes of the TL distribution are related to poor outcomes. Methods: In a multicenter, cross-sectional study, blood TL was measured in 1,488 individuals from 127 census tracts in three U.S. regions using terminal restriction fragment assays. Multilevel linear and quantile regression models were adjusted for individual-level race, education, perceived stress, and depression. Neighborhood exposures included population density, urban/residential crowding, residential stability/mobility, and socioeconomic status. Results: TL was not associated with any neighborhood variable using linear models, but quantile regression revealed inverse associations between population density and urban crowding at the lower tails of the TL distribution [5th (population density P = 0.03; urban crowding P = 0.002), 50th (both P < 0.001), 75th percentiles (both P < 0.001)]. TL was related to residential stability at the upper tail (95th percentile P = 0.006). Conclusions: Findings support the use of nonlinear statistical methods in TL research and suggest that neighborhood exposures can result in biological effects. Impact: TL may serve as an underlying example of a biologic mechanism that can link neighborhood with biology, thus supporting multilevel investigations in future studies. Cancer Epidemiol Biomarkers Prev; 26(4); 553-60. 
©2017 AACR. See all the articles in this CEBP Focus section, "Geospatial Approaches to Cancer Control and Population Sciences."
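The contrast drawn above between linear and quantile regression can be made concrete with the loss function that quantile regression minimizes. The sketch below is a simplified, intercept-only illustration, not the multilevel, covariate-adjusted models used in the study. The function names and the toy telomere lengths are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss minimized by quantile regression at quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def fit_constant_quantile(values, tau):
    """Intercept-only quantile regression: the observed value with the lowest total pinball loss."""
    def total_loss(c):
        return sum(pinball_loss(v - c, tau) for v in values)
    return min(sorted(values), key=total_loss)


# Hypothetical telomere lengths in kb, for illustration only.
toy_tl = [5.1, 5.8, 6.0, 6.3, 6.4, 6.6, 6.9, 7.2, 7.5, 8.0, 9.4]

low_tail = fit_constant_quantile(toy_tl, 0.05)   # lower tail of the distribution
median = fit_constant_quantile(toy_tl, 0.50)     # center of the distribution
high_tail = fit_constant_quantile(toy_tl, 0.95)  # upper tail of the distribution
mean = sum(toy_tl) / len(toy_tl)                 # what an intercept-only linear model estimates
```

Because the loss weights positive and negative residuals asymmetrically, each value of tau targets a different part of the distribution. This is why an exposure can show an association at the tails even when a model of the mean shows none.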
Subjects
Residence Characteristics/classification, Telomere Shortening, Telomere/physiology, Urban Population, Adult, Biomarkers/blood, Cross-Sectional Studies, Female, Humans, Linear Models, Male, Middle Aged, Social Class, Surveys and Questionnaires, United States
ABSTRACT
BACKGROUND: Leukocyte telomere length (LTL) has been associated with age, self-reported race/ethnicity, gender, education, and psychosocial factors, including perceived stress and depression. However, inconsistencies in associations of LTL with disease and other phenotypes exist across studies. Population characteristics, including race/ethnicity, laboratory methods, and statistical approaches in LTL have not been comprehensively studied and could explain inconsistent LTL associations. METHODS: LTL was measured using Southern blot in 1510 participants from a multi-ethnic, multi-center study combining data from 3 centers with different population characteristics and laboratory processing methods. Main associations between LTL and psychosocial factors and between LTL and race/ethnicity were evaluated and then compared across generalized estimating equations (GEE) and linear regression models. Statistical models were adjusted for factors typically associated with LTL (age, gender, cancer status) and also accounted for factors related to center differences, including laboratory methods (i.e., DNA extraction). Associations between LTL and psychosocial factors were also evaluated within race/ethnicity subgroups (Non-Hispanic Whites, African Americans, and Hispanics). RESULTS: Beyond adjustment for age, gender, and cancer status, additional adjustments for DNA extraction and clustering by center were needed given their effects on LTL measurements. In adjusted GEE models, longer LTL was associated with African American race (beta (β) (standard error (SE)) = 0.09 (0.04), p-value = 0.04) and Hispanic ethnicity (β (SE) = 0.06 (0.01), p-value = 0.02) compared to Non-Hispanic Whites. Longer LTL was also associated with less than a high school education compared to having greater than a high school education (β (SE) = 0.06 (0.02), p-value = 0.04). LTL was inversely related to perceived stress (β (SE) = -0.02 (0.003), p < 0.001). 
In subgroup analyses, there was a negative association with LTL in African Americans with a high school education versus those with greater than a high school education (β (SE) = -0.11 (0.03), p-value < 0.001). CONCLUSIONS: Laboratory methods and population characteristics that differ by center can influence telomere length associations in multicenter settings, but these effects could be addressed through statistical adjustments. Proper evaluation of potential sources of bias can allow for combined multicenter analyses and may resolve some inconsistencies in reporting of LTL associations. Further, biologic effects on LTL may differ under certain psychosocial and racial/ethnic circumstances and could impact future health disparity studies.