1.
Am J Hum Genet ; 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38991590

ABSTRACT

The secreted mucins MUC5AC and MUC5B are large glycoproteins that play critical defensive roles in pathogen entrapment and mucociliary clearance. Their respective genes contain polymorphic and degenerate protein-coding variable number tandem repeats (VNTRs) that make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5,761-5,762 amino acids [aa]); however, seven haplotypes have expanded VNTRs (6,291-7,019 aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5,249-6,325 aa) with cysteine-rich domain and VNTR copy-number variation. We group MUC5AC alleles into three phylogenetic clades: H1 (46%, ∼5,654 aa), H2 (33%, ∼5,742 aa), and H3 (7%, ∼6,325 aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium and Tajima's D analyses reveal that East Asians carry exceptionally large blocks with an excess of rare variation (p < 0.05) at MUC5AC. To validate this result, we use Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observe a signature of positive selection in H1 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium (p < 0.05), consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein-coding VNTRs for improved disease associations.
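The abstract above reports a deviation from Hardy-Weinberg equilibrium for H3 alleles but does not say which test was applied. As a minimal sketch only, assuming a simple chi-square goodness-of-fit test with one degree of freedom at a biallelic locus (H3 versus all other haplogroups), the check could look like the following. The function name `hwe_chi_square` and the genotype counts are hypothetical and are not taken from the study.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts at a biallelic locus.
    Assumes both alleles are present, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, expected


# Hypothetical counts (NOT from the paper): other/other, other/H3, H3/H3
chi2, p_value, expected = hwe_chi_square(400, 100, 0)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
print("expected counts:", [round(e, 1) for e in expected])
```

With these illustrative counts, heterozygotes exceed their expected number (100 observed against 90 expected) and the test rejects equilibrium at the 0.05 level, which is the pattern a heterozygote advantage would produce. For small expected counts an exact test is usually preferred over the chi-square approximation.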

2.
bioRxiv ; 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38562829

ABSTRACT

The secreted mucins MUC5AC and MUC5B play critical defensive roles in airway pathogen entrapment and mucociliary clearance by encoding large glycoproteins with variable number tandem repeats (VNTRs). These polymorphic and degenerate protein coding VNTRs make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5761-5762aa); however, seven haplotypes have expanded VNTRs (6291-7019aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5249-6325aa) with cysteine-rich domain and VNTR copy number variation. We grouped MUC5AC alleles into three phylogenetic clades: H1 (46%, ~5654aa), H2 (33%, ~5742aa), and H3 (7%, ~6325aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium (LD) and Tajima's D analyses reveal that East Asians carry exceptionally large MUC5AC LD blocks with an excess of rare variation (p<0.05). To validate this result, we used Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observed signatures of positive selection in H1 and H2 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Africans and Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium, consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein coding VNTRs for improved disease associations.

3.
Bioinformatics ; 39(Suppl 1): i279-i287, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37387146

ABSTRACT

MOTIVATION: Low-copy repeats (LCRs) or segmental duplications are long segments of duplicated DNA that cover > 5% of the human genome. Existing tools for variant calling using short reads exhibit low accuracy in LCRs due to ambiguity in read mapping and extensive copy number variation. Variants in more than 150 genes overlapping LCRs are associated with risk for human diseases. METHODS: We describe a short-read variant calling method, ParascopyVC, that performs variant calling jointly across all repeat copies and utilizes reads independent of mapping quality in LCRs. To identify candidate variants, ParascopyVC aggregates reads mapped to different repeat copies and performs polyploid variant calling. Subsequently, paralogous sequence variants that can differentiate repeat copies are identified using population data and used for estimating the genotype of variants for each repeat copy. RESULTS: On simulated whole-genome sequence data, ParascopyVC achieved higher precision (0.997) and recall (0.807) than three state-of-the-art variant callers (best precision = 0.956 for DeepVariant and best recall = 0.738 for GATK) in 167 LCR regions. Benchmarking of ParascopyVC using the Genome in a Bottle high-confidence variant calls for the HG002 genome showed that it achieved a very high precision of 0.991 and a high recall of 0.909 across LCR regions, significantly better than FreeBayes (precision = 0.954 and recall = 0.822), GATK (precision = 0.888 and recall = 0.873) and DeepVariant (precision = 0.983 and recall = 0.861). ParascopyVC demonstrated consistently higher accuracy (mean F1 = 0.947) than other callers (best F1 = 0.908) across seven human genomes. AVAILABILITY AND IMPLEMENTATION: ParascopyVC is implemented in Python and is freely available at https://github.com/tprodanov/ParascopyVC.


Subjects
DNA Copy Number Variations , Segmental Duplications, Genomic , Humans , Whole Genome Sequencing , Benchmarking , Genome, Human
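The record above quotes precision and recall for each caller on HG002 but gives F1 only as a mean across seven genomes. As a small worked example, the per-caller F1 on HG002 can be derived from the quoted figures with the standard harmonic-mean definition, F1 = 2PR / (P + R). The variable names below are illustrative; the numbers are copied from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) on HG002 across LCR regions, as quoted in the abstract
hg002 = {
    "ParascopyVC": (0.991, 0.909),
    "FreeBayes": (0.954, 0.822),
    "GATK": (0.888, 0.873),
    "DeepVariant": (0.983, 0.861),
}

f1_by_caller = {name: f1_score(p, r) for name, (p, r) in hg002.items()}

# Print callers from highest to lowest F1
for name, f1 in sorted(f1_by_caller.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1 = {f1:.3f}")
```

This gives roughly 0.948 for ParascopyVC, 0.918 for DeepVariant, 0.883 for FreeBayes and 0.880 for GATK. These are single-genome values for HG002, so they are not expected to match the seven-genome means reported in the abstract exactly.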
4.
BMC Med Genomics ; 11(Suppl 1): 13, 2018 Feb 13.
Article in English | MEDLINE | ID: mdl-29504914

ABSTRACT

BACKGROUND: Cystic fibrosis (CF) is one of the most common life-threatening genetic disorders. Around 2000 variants in the CFTR gene have been identified, some proportion of which are known to be pathogenic, and 300 disease-causing mutations have been characterized in detail by the CFTR2 database, which complicates analysis of the gene with conventional methods. METHODS: We conducted next-generation sequencing (NGS) in a cohort of 89 adult patients negative for p.Phe508del homozygosity. Complete clinical and demographic information was available for 84 patients. RESULTS: By combining MLPA with NGS, we identified disease-causing alleles in all the CF patients. Importantly, in 10% of cases, standard bioinformatics pipelines were inefficient in identifying causative mutations. Class IV-V mutations were observed in 38 (45%) cases, predominantly those with pancreatic-sufficient CF disease; the rest of the patients had Class I-III mutations. Diabetes was seen only in patients homozygous for Class I-III mutations. We found that 12% of the patients were heterozygous for more than two pathogenic CFTR mutations. Two patients were observed with the p.[Arg1070Gln, Ser466*] complex allele, which was associated with milder pulmonary obstruction (FVC 107 and 109% versus 67%, CI 95%: 63-72%; FEV 90 and 111% versus 47%, CI 95%: 37-48%). The p.[Phe508del, Leu467Phe] complex allele, observed in four patients (5%), is reported for the first time. CONCLUSION: NGS can be more informative than standard methods. Combined with its equivalent diagnostic performance, it can therefore be implemented in clinical practice, although careful validation is still required.


Subjects
Biomarkers/analysis , Cystic Fibrosis Transmembrane Conductance Regulator/deficiency , Cystic Fibrosis/genetics , Cystic Fibrosis/pathology , Genetic Association Studies , High-Throughput Nucleotide Sequencing/methods , Mutation , Adult , Cohort Studies , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Female , Humans , Male , Middle Aged , Young Adult