Results 1 - 20 of 38
1.
Am J Hum Genet ; 111(8): 1700-1716, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-38991590

ABSTRACT

The secreted mucins MUC5AC and MUC5B are large glycoproteins that play critical defensive roles in pathogen entrapment and mucociliary clearance. Their respective genes contain polymorphic and degenerate protein-coding variable number tandem repeats (VNTRs) that make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5,761-5,762 amino acids [aa]); however, seven haplotypes have expanded VNTRs (6,291-7,019 aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5,249-6,325 aa) with cysteine-rich domain and VNTR copy-number variation. We group MUC5AC alleles into three phylogenetic clades: H1 (46%, ∼5,654 aa), H2 (33%, ∼5,742 aa), and H3 (7%, ∼6,325 aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium and Tajima's D analyses reveal that East Asians carry exceptionally large blocks with an excess of rare variation (p < 0.05) at MUC5AC. To validate this result, we use Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observe a signature of positive selection in H1 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium (p < 0.05), consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein-coding VNTRs for improved disease associations.


Subjects
Alleles, Genetic Variation, Haplotypes, Minisatellite Repeats, Mucin 5AC, Mucin-5B, Phylogeny, Humans, Mucin-5B/genetics, Animals, Mucin 5AC/genetics, Mucin 5AC/metabolism, Minisatellite Repeats/genetics, DNA Copy Number Variations, Primates/genetics
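The abstract of entry 1 reports a deviation from Hardy-Weinberg equilibrium for H3 alleles. As a rough illustration of what such a test checks, here is a minimal sketch, not the authors' analysis. The function name `hwe_chi_square` and the genotype counts are hypothetical, and it assumes a biallelic coding (for example, H3 versus non-H3) tested with a chi-square statistic on 1 degree of freedom.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg equilibrium at a biallelic locus.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Sum (O - E)^2 / E over the three genotype classes, skipping empty classes
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


CRITICAL_1DF_005 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05

balanced = hwe_chi_square(25, 50, 25)    # counts match expectation exactly
het_excess = hwe_chi_square(10, 80, 10)  # made-up heterozygote excess
```

A statistic above the critical value with more heterozygotes than expected is the pattern the abstract describes as consistent with heterozygote advantage; the test alone does not establish selection.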
2.
bioRxiv ; 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38562829

ABSTRACT

The secreted mucins MUC5AC and MUC5B play critical defensive roles in airway pathogen entrapment and mucociliary clearance by encoding large glycoproteins with variable number tandem repeats (VNTRs). These polymorphic and degenerate protein coding VNTRs make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5761-5762aa); however, seven haplotypes have expanded VNTRs (6291-7019aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5249-6325aa) with cysteine-rich domain and VNTR copy number variation. We grouped MUC5AC alleles into three phylogenetic clades: H1 (46%, ~5654aa), H2 (33%, ~5742aa), and H3 (7%, ~6325aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium (LD) and Tajima's D analyses reveal that East Asians carry exceptionally large MUC5AC LD blocks with an excess of rare variation (p<0.05). To validate this result, we used Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observed signatures of positive selection in H1 and H2 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Africans and Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium, consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein coding VNTRs for improved disease associations.

3.
EMBO Mol Med ; 14(7): e15608, 2022 07 07.
Article in English | MEDLINE | ID: mdl-35698786

ABSTRACT

The highly conserved Elongator complex is a translational regulator that plays a critical role in neurodevelopment, neurological diseases, and brain tumors. Numerous clinically relevant variants have been reported in the catalytic Elp123 subcomplex, while no missense mutations in the accessory subcomplex Elp456 have been described. Here, we identify ELP4 and ELP6 variants in patients with developmental delay, epilepsy, intellectual disability, and motor dysfunction. We determine the structures of human and murine Elp456 subcomplexes and locate the mutated residues. We show that patient-derived mutations in Elp456 affect the tRNA modification activity of Elongator in vitro as well as in human and murine cells. Modeling the pathogenic variants in mice recapitulates the clinical features of the patients and reveals neuropathology that differs from the one caused by previously characterized Elp123 mutations. Our study demonstrates a direct correlation between Elp4 and Elp6 mutations, reduced Elongator activity, and neurological defects. Foremost, our data indicate previously unrecognized differences of the Elp123 and Elp456 subcomplexes for individual tRNA species, in different cell types and in different key steps during the neurodevelopment of higher organisms.


Subjects
Transfer RNA, Saccharomyces cerevisiae Proteins, Animals, Mice, Protein Subunits/chemistry, Protein Subunits/genetics, Protein Subunits/metabolism, Transfer RNA/chemistry, Transfer RNA/genetics, Transfer RNA/metabolism, Saccharomyces cerevisiae Proteins/chemistry, Saccharomyces cerevisiae Proteins/metabolism
4.
Science ; 376(6588): eabj6965, 2022 04.
Article in English | MEDLINE | ID: mdl-35357917

ABSTRACT

Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide estimate from 5.4 to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previously unresolved T2T-CHM13 SD sequence (68.3 Mbp) better represents human copy number variation. Comparing long-read assemblies from human (n = 12) and nonhuman primate (n = 5) genomes, we systematically reconstruct the evolution and structural haplotype diversity of biomedically relevant and duplicated genes. This analysis reveals patterns of structural heterozygosity and evolutionary differences in SD organization between humans and other primates.


Subjects
DNA Copy Number Variations, Gene Duplication, Human Genome, Genomic Segmental Duplications, Molecular Evolution, GTPase-Activating Proteins/genetics, Humans, Single Nucleotide Polymorphism, Proto-Oncogene Proteins/genetics
5.
Genome Res ; 30(9): 1291-1305, 2020 09.
Article in English | MEDLINE | ID: mdl-32801147

ABSTRACT

Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.


Subjects
Genetic Variation, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Alleles, Animals, Cell Line, Chromosome Duplication, Neoplasm DNA, Satellite DNA, Drosophila/genetics, Human Genome, Haplotypes, Humans, Reproducibility of Results, Software
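The abstract of entry 5 names homopolymer compression as one of the techniques HiCanu uses. The idea, as generally described, is to collapse each run of identical bases to a single base so that run-length errors no longer cause mismatches between reads. The following is a minimal sketch of that idea only; it is not HiCanu's implementation, and the function name `homopolymer_compress` is hypothetical.

```python
from itertools import groupby


def homopolymer_compress(seq):
    """Collapse every run of identical characters to a single character."""
    # groupby yields one (base, run) pair per maximal run of equal bases
    return "".join(base for base, _run in groupby(seq))


# Two toy reads (made up) that differ only in homopolymer run lengths
read_a = "AAACCGTTTA"
read_b = "ACCCGGTA"
compressed_a = homopolymer_compress(read_a)
compressed_b = homopolymer_compress(read_b)
```

After compression the two toy reads become identical, which is why overlaps computed in compressed space are insensitive to run-length differences.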
6.
Genes (Basel) ; 11(2)2020 02 18.
Article in English | MEDLINE | ID: mdl-32085667

ABSTRACT

POTE (prostate, ovary, testis, and placenta expressed) genes belong to a primate-specific gene family expressed in prostate, ovary, and testis as well as in several cancers including breast, prostate, and lung cancers. Due to their tumor-specific expression, POTEs are potential oncogenes, therapeutic targets, and biomarkers for these malignancies. This gene family maps within human and primate segmental duplications with a copy number ranging from two to 14 in different species. Due to the high sequence identity among the gene copies, specific efforts are needed to assemble these loci in order to correctly define the organization and evolution of the gene family. Using single-molecule, real-time (SMRT) sequencing, in silico analyses, and molecular cytogenetics, we characterized the structure, copy number, and chromosomal distribution of the POTE genes, as well as their expression in normal and disease tissues, and provided a comparative analysis of the POTE organization and gene structure in primate genomes. We were able, for the first time, to de novo sequence and assemble a POTE tandem duplication in marmoset that is misassembled and collapsed in the reference genome, thus revealing the presence of a second POTE copy. Taken together, our findings provide comprehensive insights into the evolutionary dynamics of the primate-specific POTE gene family, involving gene duplications, deletions, and long interspersed nuclear element (LINE) transpositions to explain the actual repertoire of these genes in human and primate genomes.


Subjects
Multigene Family, Ovary/chemistry, Placenta/chemistry, Primates/genetics, Prostate/chemistry, Testis/chemistry, Animals, Chromosome Mapping, Computer Simulation, Molecular Evolution, Female, Gene Expression Profiling, Gene Expression Regulation, Humans, Male, Pregnancy, Single Molecule Imaging, Tissue Distribution
7.
Nat Genet ; 52(2): 146-159, 2020 02.
Article in English | MEDLINE | ID: mdl-32060489

ABSTRACT

In many repeat diseases, such as Huntington's disease (HD), ongoing repeat expansions in affected tissues contribute to disease onset, progression and severity. Inducing contractions of expanded repeats by exogenous agents is not yet possible. Traditional approaches would target proteins driving repeat mutations. Here we report a compound, naphthyridine-azaquinolone (NA), that specifically binds slipped-CAG DNA intermediates of expansion mutations, a previously unsuspected target. NA efficiently induces repeat contractions in HD patient cells as well as en masse contractions in medium spiny neurons of HD mouse striatum. Contractions are specific for the expanded allele, independent of DNA replication, require transcription across the coding CTG strand, and arise by blocking repair of CAG slip-outs. NA-induced contractions depend on active expansions driven by MutSβ. NA injections in HD mouse striatum reduce mutant HTT protein aggregates, a biomarker of HD pathogenesis and severity. Repeat-structure-specific DNA ligands are a novel avenue to contract expanded repeats.


Subjects
Huntingtin Protein/genetics, Huntington Disease/genetics, Naphthyridines/pharmacology, Quinolones/pharmacology, Trinucleotide Repeat Expansion/drug effects, Animals, Corpus Striatum/drug effects, DNA/metabolism, DNA Mismatch Repair/drug effects, DNA Replication/drug effects, Animal Disease Models, Humans, Huntingtin Protein/metabolism, Huntington Disease/drug therapy, Huntington Disease/pathology, Male, Mice, Transgenic Mice, Microsatellite Instability, Mutation, Ribonucleases/metabolism, TATA-Box Binding Protein/genetics, Genetic Transcription
8.
Nat Commun ; 11(1): 255, 2020 01 14.
Article in English | MEDLINE | ID: mdl-31937769

ABSTRACT

Copy number variants (CNVs) are suggested to have a widespread impact on the human genome and phenotypes. To understand the role of CNVs across human diseases, we examine the CNV genomic landscape of 100,028 unrelated individuals of European ancestry, using SNP and CGH array datasets. We observe an average CNV burden of ~650 kb, identifying a total of 11,314 deletion, 5,625 duplication, and 2,746 homozygous deletion CNV regions (CNVRs). In all, 13.7% are unreported, 58.6% overlap with at least one gene, and 32.8% interrupt coding exons. These CNVRs are significantly more likely to overlap OMIM genes (2.94-fold), GWAS loci (1.52-fold), and non-coding RNAs (1.44-fold), compared with random distribution (P < 1 × 10⁻³). We uncover CNV associations with four major disease categories, including autoimmune, cardio-metabolic, oncologic, and neurological/psychiatric diseases, and identify several drug-repurposing opportunities. Our results demonstrate robust frequency definition for large-scale rare variant association studies, identify CNVs associated with major disease categories, and illustrate the pleiotropic impact of CNVs in human disease.


Subjects
DNA Copy Number Variations, Genetic Predisposition to Disease/genetics, Human Genome/genetics, White People/genetics, Comparative Genomic Hybridization, Genetic Databases, Genetic Loci, Genetic Predisposition to Disease/ethnology, Genome-Wide Association Study, Humans, Molecular Sequence Annotation, Single Nucleotide Polymorphism
9.
Clin Genet ; 97(2): 338-346, 2020 02.
Article in English | MEDLINE | ID: mdl-31674007

ABSTRACT

The genotype-first approach has been successfully applied and has elucidated several subtypes of autism spectrum disorder (ASD). However, it requires very large cohorts because of the extensive genetic heterogeneity. We investigate the alternate possibility of whether phenotype-specific genes can be identified from a small group of patients with specific phenotype(s). To identify novel genes associated with ASD and abnormal head circumference using a phenotype-to-genotype approach, we performed whole-exome sequencing on 67 families with ASD and abnormal head circumference. Clinically relevant pathogenic or likely pathogenic variants account for 23.9% of patients with microcephaly or macrocephaly, and 81.25% of those variants or genes are head-size associated. Significantly, recurrent pathogenic mutations were identified in two macrocephaly genes (PTEN, CHD8) in this small cohort. De novo mutations in several candidate genes (UBN2, BIRC6, SYNE1, and KCNMA1) were detected, as well as one new candidate gene (TNPO3) implicated in ASD and related neurodevelopmental disorders. We identify genotype-phenotype correlations for head-size-associated ASD genes and novel candidate genes for further investigation. Our results also suggest a phenotype-to-genotype strategy would accelerate the elucidation of genotype-phenotype relationships for ASD by using phenotype-restricted cohorts.


Subjects
Autism Spectrum Disorder/genetics, Genetic Association Studies/methods, Genetic Predisposition to Disease/genetics, Head/growth & development, Autism Spectrum Disorder/blood, Autism Spectrum Disorder/complications, Cohort Studies, Cytoskeletal Proteins/genetics, DNA-Binding Proteins/genetics, Female, Genotype, Head/anatomy & histology, Humans, INDEL Mutation, Inhibitor of Apoptosis Proteins/genetics, Large-Conductance Calcium-Activated Potassium Channel alpha Subunits/genetics, Male, Megalencephaly/complications, Megalencephaly/genetics, Microcephaly/complications, Microcephaly/genetics, Nerve Tissue Proteins/genetics, PTEN Phosphohydrolase/genetics, Phenotype, Single Nucleotide Polymorphism, Transcription Factors/genetics, Exome Sequencing, beta Karyopherins/genetics
10.
Ann Hum Genet ; 84(2): 125-140, 2020 03.
Article in English | MEDLINE | ID: mdl-31711268

ABSTRACT

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.


Subjects
Biomarkers/analysis, Genetic Variation, Human Genome, Haploidy, Hydatidiform Mole/genetics, DNA Sequence Analysis/methods, Single-Cell Analysis/methods, Female, High-Throughput Nucleotide Sequencing, Humans, Molecular Sequence Annotation, Pregnancy
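The abstract of entry 10 compares assemblies by NG50. Under the usual definition, NG50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of an assumed genome size. The sketch below illustrates that definition with made-up contig lengths; the function name `ng50` is hypothetical and this is not the authors' evaluation code. The last line only rechecks the fold change between the two NG50 values quoted in the abstract.

```python
def ng50(contig_lengths, genome_size):
    """Contig length at which the cumulative sum reaches half the genome size."""
    half = genome_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None  # contigs cover less than half of the assumed genome size


# Toy contig lengths (hypothetical) with an assumed genome size of 100
toy_ng50 = ng50([40, 30, 20, 10], genome_size=100)

# Ratio of the pericentromeric NG50 values reported in the abstract (kbp)
fold_change = 480.6 / 191.5
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses a fixed genome size, so the two assemblies are compared against the same target.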
11.
Am J Hum Genet ; 105(5): 947-958, 2019 11 07.
Article in English | MEDLINE | ID: mdl-31668704

ABSTRACT

Human-specific duplications at chromosome 16p11.2 mediate recurrent pathogenic 600 kbp BP4-BP5 copy-number variations, which are among the most common genetic causes of autism. These copy-number polymorphic duplications are under positive selection and include three to eight copies of BOLA2, a gene involved in the maturation of cytosolic iron-sulfur proteins. To investigate the potential advantage provided by the rapid expansion of BOLA2, we assessed hematological traits and anemia prevalence in 379,385 controls and individuals who have lost or gained copies of BOLA2: 89 chromosome 16p11.2 BP4-BP5 deletion carriers and 56 reciprocal duplication carriers in the UK Biobank. We found that the 16p11.2 deletion is associated with anemia (18/89 carriers, 20%, p = 4e-7, OR = 5), particularly iron-deficiency anemia. We observed similar enrichments in two clinical 16p11.2 deletion cohorts, which included 6/63 (10%) and 7/20 (35%) unrelated individuals with anemia, microcytosis, low serum iron, or low blood hemoglobin. Upon stratification by BOLA2 copy number, our data showed an association between low BOLA2 dosage and the above phenotypes (8/15 individuals with three copies, 53%, p = 1e-4). In parallel, we analyzed hematological traits in mice carrying the 16p11.2 orthologous deletion or duplication, as well as Bola2+/- and Bola2-/- animals. The Bola2-deficient mice and the mice carrying the deletion showed early evidence of iron deficiency, including a mild decrease in hemoglobin, lower plasma iron, microcytosis, and an increased red blood cell zinc-protoporphyrin-to-heme ratio. Our results indicate that BOLA2 participates in iron homeostasis in vivo, and its expansion has a potential adaptive role in protecting against iron deficiency.


Subjects
Anemia/genetics, Autistic Disorder/genetics, Chromosome Duplication/genetics, Human Chromosome Pair 16/genetics, Homeostasis/genetics, Proteins/genetics, Animals, Chromosome Deletion, Chromosome Disorders/genetics, DNA Copy Number Variations/genetics, Female, Genotype, Heterozygote, Humans, Iron, Male, Phenotype
12.
Nat Methods ; 16(1): 88-94, 2019 01.
Article in English | MEDLINE | ID: mdl-30559433

ABSTRACT

We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33-79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8%) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.


Subjects
Computational Biology, Genomic Segmental Duplications, DNA Sequence Analysis/methods, Human Genome, Humans, Molecular Sequence Annotation
13.
Res Comput Mol Biol ; 10229: 117-133, 2017 May.
Article in English | MEDLINE | ID: mdl-28808695

ABSTRACT

While the rise of single-molecule sequencing systems has enabled an unprecedented rise in the ability to assemble complex regions of the genome, long segmental duplications in the genome still remain a challenging frontier in assembly. Segmental duplications are at the same time both gene rich and prone to large structural rearrangements, making the resolution of their sequences important in medical and evolutionary studies. Duplicated sequences that are collapsed in mammalian de novo assemblies are rarely identical; after a sequence is duplicated, it begins to acquire paralog-specific variants. In this paper, we study the problem of resolving the variations in multicopy long segmental duplications by developing and utilizing algorithms for polyploid phasing. We develop two algorithms: the first is targeted at maximizing the likelihood of observing the reads given the underlying haplotypes using discrete matrix completion. The second algorithm is based on correlation clustering and exploits an assumption, which is often satisfied in these duplications, that each paralog has a sizable number of paralog-specific variants. We develop a detailed simulation methodology, and demonstrate the superior performance of the proposed algorithms on an array of simulated datasets. We measure the likelihood score as well as reconstruction accuracy, i.e., what fraction of the reads are clustered correctly. On both performance metrics, we find that our algorithms dominate existing algorithms on more than 93% of the datasets. While the discrete matrix completion performs better on likelihood score, the correlation clustering algorithm performs better on reconstruction accuracy due to the stronger regularization inherent in the algorithm. We also show that our correlation-clustering algorithm can reconstruct on average 7.0 haplotypes in 10-copy duplication datasets whereas existing algorithms reconstruct fewer than 1 copy on average.

14.
Proc Natl Acad Sci U S A ; 112(52): E7223-9, 2015 Dec 29.
Article in English | MEDLINE | ID: mdl-26668394

ABSTRACT

NK-lysin is an antimicrobial peptide and effector protein in the host innate immune system. It is coded by a single gene in humans and most other mammalian species. In this study, we provide evidence for the existence of four NK-lysin genes in a repetitive region on cattle chromosome 11. The NK2A, NK2B, and NK2C genes are tandemly arrayed as three copies in ∼30-35-kb segments, located 41.8 kb upstream of NK1. All four genes are functional, albeit with differential tissue expression. NK1, NK2A, and NK2B exhibited the highest expression in intestine Peyer's patch, whereas NK2C was expressed almost exclusively in lung. The four peptide products were synthesized ex vivo, and their antimicrobial effects against both Gram-positive and Gram-negative bacteria were confirmed with a bacteria-killing assay. Transmission electron microscopy indicated that bovine NK-lysins exhibited their antimicrobial activities by lytic action in the cell membranes. In summary, the single NK-lysin gene in other mammals has expanded to a four-member gene family by tandem duplications in cattle; all four genes are transcribed, and the synthetic peptides corresponding to the core regions are biologically active and likely contribute to innate immunity in ruminants.


Subjects
Cattle/genetics, Gene Dosage, Multigene Family, Proteolipids/genetics, Amino Acid Sequence, Animals, Base Sequence, Mammalian Chromosomes/genetics, Escherichia coli/drug effects, Escherichia coli/growth & development, Escherichia coli/ultrastructure, Gene Expression Profiling, Gene Order, Transmission Electron Microscopy, Molecular Sequence Data, Organ Specificity/genetics, Peptides/pharmacology, Phylogeny, Proteolipids/classification, Proteolipids/pharmacology, Amino Acid Sequence Homology, Nucleic Acid Sequence Homology
15.
Genes Immun ; 16(1): 24-34, 2015.
Article in English | MEDLINE | ID: mdl-25338678

ABSTRACT

Germline variation at immunoglobulin (IG) loci is critical for pathogen-mediated immunity, but establishing complete haplotype sequences in these regions has been problematic because of complex sequence architecture and diploid source DNA. We sequenced BAC clones from the effectively haploid human hydatidiform mole cell line, CHM1htert, across the light chain IG loci, kappa (IGK) and lambda (IGL), creating single haplotype representations of these regions. The IGL haplotype generated here is 1.25 Mb of contiguous sequence, including four novel IGLV alleles, one novel IGLC allele, and an 11.9-kb insertion. The CH17 IGK haplotype consists of two 644 kb proximal and 466 kb distal contigs separated by a large gap of unknown size; these assemblies added 49 kb of unique sequence extending into this gap. Our analysis also resulted in the characterization of seven novel IGKV alleles and a 16.7-kb region exhibiting signatures of interlocus sequence exchange between distal and proximal IGKV gene clusters. Genetic diversity in IGK/IGL was compared with that of the IG heavy chain (IGH) locus within the same haploid genome, revealing threefold (IGK) and sixfold (IGL) higher diversity in the IGH locus, potentially associated with increased levels of segmental duplication and the telomeric location of IGH.


Subjects
Immunoglobulin Light Chain Genes, Hydatidiform Mole/genetics, Tumor Cell Line, Bacterial Artificial Chromosomes, Female, Immunoglobulin Heavy Chain Genes, Humans, Molecular Sequence Data, Single Nucleotide Polymorphism, Pregnancy
16.
Genome Res ; 24(12): 2066-76, 2014 12.
Article in English | MEDLINE | ID: mdl-25373144

ABSTRACT

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.


Subjects
Human Genome, Haplotypes, Hydatidiform Mole/genetics, Alleles, Chromosome Mapping, Bacterial Artificial Chromosomes, Computational Biology/methods, Female, Genomics/methods, Heterozygote, High-Throughput Nucleotide Sequencing, Humans, Single Nucleotide Polymorphism, Pregnancy, Repetitive Nucleic Acid Sequences, Genomic Segmental Duplications, DNA Sequence Analysis
17.
Nat Commun ; 5: 4954, 2014 Sep 18.
Article in English | MEDLINE | ID: mdl-25232744

ABSTRACT

Next-generation sequencing recently revealed that recurrent disruptive mutations in a few genes may account for 1% of sporadic autism cases. Coupling these novel genetic data to empirical assays of protein function can illuminate crucial molecular networks. Here we demonstrate the power of the approach, performing the first functional analyses of TBR1 variants identified in sporadic autism. De novo truncating and missense mutations disrupt multiple aspects of TBR1 function, including subcellular localization, interactions with co-regulators and transcriptional repression. Missense mutations inherited from unaffected parents did not disturb function in our assays. We show that TBR1 homodimerizes, that it interacts with FOXP2, a transcription factor implicated in speech/language disorders, and that this interaction is disrupted by pathogenic mutations affecting either protein. These findings support the hypothesis that de novo mutations in sporadic autism have severe functional consequences. Moreover, they uncover neurogenetic mechanisms that bridge different neurodevelopmental disorders involving language deficits.


Subjects
Autism Spectrum Disorder/genetics, Mutation, T-Box Domain Proteins/genetics, Amino Acid Sequence, Tumor Cell Line, Child, Preschool Child, Dimerization, Female, Forkhead Transcription Factors/metabolism, HEK293 Cells, High-Throughput Nucleotide Sequencing, Humans, Language Disorders/genetics, Male, Molecular Sequence Data, Site-Directed Mutagenesis, Missense Mutation, Protein Interaction Mapping, Amino Acid Sequence Homology, Two-Hybrid System Techniques
18.
PLoS One ; 8(10): e75949, 2013.
Article in English | MEDLINE | ID: mdl-24124524

ABSTRACT

Standard methods of DNA sequence analysis assume that sequences evolve independently, yet this assumption may not be appropriate for segmental duplications that exchange variants via interlocus gene conversion (IGC). Here, we use high quality multiple sequence alignments from well-annotated segmental duplications to systematically identify IGC signals in the human reference genome. Our analysis combines two complementary methods: (i) a paralog quartet method that uses DNA sequence simulations to identify a statistical excess of sites consistent with inter-paralog exchange, and (ii) the alignment-based method implemented in the GENECONV program. One-quarter (25.4%) of the paralog families in our analysis harbor clear IGC signals by the quartet approach. Using GENECONV, we identify 1477 gene conversion tracks that cumulatively span 1.54 Mb of the genome. Our analyses confirm the previously reported high rates of IGC in subtelomeric regions and Y-chromosome palindromes, and identify multiple novel IGC hotspots, including the pregnancy specific glycoproteins and the neuroblastoma breakpoint gene families. Although the duplication history of a paralog family is described by a single tree, we show that IGC has introduced incredible site-to-site variation in the evolutionary relationships among paralogs in the human genome. Our findings indicate that IGC has left significant footprints in patterns of sequence diversity across segmental duplications in the human genome, out-pacing the contributions of single base mutation by orders of magnitude. Collectively, the IGC signals we report comprise a catalog that will provide a critical reference for interpreting observed patterns of DNA sequence variation across duplicated genomic regions, including targets of recent adaptive evolution in humans.


Subjects
Genomic Segmental Duplications/genetics , DNA Sequence Analysis/methods , Human Genome/genetics , Humans , Mutation
19.
Am J Hum Genet ; 92(4): 530-46, 2013 Apr 04.
Article in English | MEDLINE | ID: mdl-23541343

ABSTRACT

The immunoglobulin heavy-chain locus (IGH) encodes variable (IGHV), diversity (IGHD), joining (IGHJ), and constant (IGHC) genes and is responsible for antibody heavy-chain biosynthesis, which is vital to the adaptive immune response. Programmed V-(D)-J somatic rearrangement and the complex duplicated nature of the locus have impeded attempts to reconcile its genomic organization based on traditional B-lymphocyte derived genetic material. As a result, sequence descriptions of germline variation within IGHV are lacking, haplotype inference using traditional linkage disequilibrium methods has been difficult, and the human genome reference assembly is missing several expressed IGHV genes. By using a hydatidiform mole BAC clone resource, we present the most complete haplotype of IGHV, IGHD, and IGHJ gene regions derived from a single chromosome, representing an alternate assembly of ∼1 Mbp of high-quality finished sequence. From this we add 101 kbp of previously uncharacterized sequence, including functional IGHV genes, and characterize four large germline copy-number variants (CNVs). In addition to this germline reference, we identify and characterize eight CNV-containing haplotypes from a panel of nine diploid genomes of diverse ethnic origin, discovering previously unmapped IGHV genes and an additional 121 kbp of insertion sequence. We genotype four of these CNVs by using PCR in 425 individuals from nine human populations. We find that all four are highly polymorphic and show considerable evidence of stratification (Fst = 0.3-0.5), with the greatest differences observed between African and Asian populations. These CNVs exhibit weak linkage disequilibrium with SNPs from two commercial arrays in most of the populations tested.


Subjects
DNA Copy Number Variations/genetics , Gene Fusion/genetics , Immunoglobulin Heavy Chain Genes , Haplotypes/genetics , Hydatidiform Mole/genetics , Immunoglobulin Heavy Chains/genetics , Immunoglobulin Variable Region/genetics , Alleles , Bacterial Artificial Chromosomes , Female , Population Genetics , Genotype , Humans , Molecular Sequence Data , Pregnancy , DNA Sequence Analysis , V(D)J Recombination
20.
Science ; 338(6114): 1619-22, 2012 Dec 21.
Article in English | MEDLINE | ID: mdl-23160955

ABSTRACT

Exome sequencing studies of autism spectrum disorders (ASDs) have identified many de novo mutations but few recurrently disrupted genes. We therefore developed a modified molecular inversion probe method enabling ultra-low-cost candidate gene resequencing in very large cohorts. To demonstrate the power of this approach, we captured and sequenced 44 candidate genes in 2446 ASD probands. We discovered 27 de novo events in 16 genes, 59% of which are predicted to truncate proteins or disrupt splicing. We estimate that recurrent disruptive mutations in six genes (CHD8, DYRK1A, GRIN2B, TBR1, PTEN, and TBL1XR1) may contribute to 1% of sporadic ASDs. Our data support associations between specific genes and reciprocal subphenotypes (CHD8-macrocephaly and DYRK1A-microcephaly) and replicate the importance of a β-catenin-chromatin-remodeling network to ASD etiology.


Subjects
Pervasive Child Development Disorders/genetics , Genetic Association Studies , Mutation , DNA Sequence Analysis/methods , Cephalometry , Child , Preschool Child , Chromatin Assembly and Disassembly , Cohort Studies , DNA Probes , DNA-Binding Proteins/genetics , Exome , Female , Genetic Predisposition to Disease , Humans , Male , Megalencephaly/genetics , Microcephaly/genetics , Nuclear Proteins/genetics , PTEN Phosphohydrolase/genetics , Protein Serine-Threonine Kinases/genetics , Protein-Tyrosine Kinases/genetics , Cytoplasmic and Nuclear Receptors/genetics , N-Methyl-D-Aspartate Receptors/genetics , Repressor Proteins/genetics , T-Box Domain Proteins/genetics , Transcription Factors/genetics , beta Catenin/genetics , beta Catenin/metabolism , Dyrk Kinases