Results 1 - 20 of 354
1.
Nature ; 2021 Apr 07.
Article in English | MEDLINE | ID: mdl-33828295

ABSTRACT

The complete assembly of each human chromosome is essential for understanding human biology and evolution. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.

2.
Am J Hum Genet ; 108(3): 383-385, 2021 03 04.
Article in English | MEDLINE | ID: mdl-33667390

ABSTRACT

This article is based on the address given by the author at the 2020 virtual meeting of the American Society of Human Genetics (ASHG) on October 26, 2020. The video of the original address can be found at the ASHG website. Photo credit: Clare McLean.


Subjects
Medical Genetics/history , Human Genetics/history , Awards and Prizes , 21st Century History , Humans , United States
3.
Am J Hum Genet ; 2021 Mar 26.
Article in English | MEDLINE | ID: mdl-33789087

ABSTRACT

Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.

4.
Am J Med Genet A ; 185(4): 1039-1046, 2021 04.
Article in English | MEDLINE | ID: mdl-33439542

ABSTRACT

Since the introduction of next-generation sequencing, an increasing number of disorders have been discovered to have genetic etiology. To address diverse clinical questions and coordinate research activities that arise with the identification of these rare disorders, we developed the Human Disease Genes website series (HDG website series): an international digital library that records detailed information on the clinical phenotype of novel genetic variants in the human genome (https://humandiseasegenes.info/). Each gene website is moderated by a dedicated team of clinicians and researchers, focused on specific genes, and provides up-to-date (including unpublished) clinical information. The HDG website series is expanding rapidly with 424 genes currently adopted by 325 moderators from across the globe. On average, a gene website has detailed phenotypic information of 14.4 patients. There are multiple examples of added value, one being the ARID1B gene website, which was recently utilized in research to collect clinical information of 81 new patients. Additionally, several gene websites have more data available than currently published in the literature. In conclusion, the HDG website series provides an easily accessible, open and up-to-date clinical data resource for patients with pathogenic variants of individual genes. This is a valuable resource not only for clinicians dealing with rare genetic disorders such as developmental delay and autism, but also for other professionals working in diagnostics and basic research. Since the HDG website series is a dynamic platform, its data also include the phenotype of yet unpublished patients curated by professionals providing higher quality clinical detail to improve management of these rare disorders.

5.
Nat Biotechnol ; 2020 Dec 07.
Article in English | MEDLINE | ID: mdl-33288906

ABSTRACT

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
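
Editorial note: the contiguity figure quoted above (contig N50) is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The sketch below is only an illustration of that standard definition, not the authors' pipeline; the function name `n50` and the optional `genome_size` argument (which turns the statistic into NG50) are assumptions introduced here.

```python
from typing import Iterable, Optional


def n50(contig_lengths: Iterable[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return the N50 of a set of contig lengths.

    If genome_size is given, half of that size is used as the target
    instead of half of the total assembly length, which yields NG50.
    Returns None when the contigs never reach the target.
    """
    lengths = sorted(contig_lengths, reverse=True)
    target = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point halves.
        if 2 * running >= target:
            return length
    return None


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths
    print(n50(toy_contigs))
    print(n50(toy_contigs, genome_size=80))
```

Because NG50 is measured against an assumed genome size rather than the assembly's own total, it stays comparable between assemblies of different total length.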

6.
J Autism Dev Disord ; 2020 Nov 11.
Article in English | MEDLINE | ID: mdl-33175317

ABSTRACT

Self-injurious behaviors (SIB) are elevated in autism spectrum disorder (ASD) and related genetic disorders, but the genetic and biological mechanisms that contribute to SIB in ASD are poorly understood. This study examined rates and predictors of SIB in 112 individuals with disruptive mutations to ASD-risk genes. Current SIB were reported in 30% of participants and associated with poorer cognitive and adaptive skills. History of severe abdominal pain predicted higher rates of SIB and SIB severity after controlling for age and adaptive behavior; individuals with a history of severe abdominal pain were eight times more likely to exhibit SIB than those with no history. Future research is needed to examine associations between genetic risk, pain, and SIB in this population.

7.
Front Immunol ; 11: 2136, 2020.
Article in English | MEDLINE | ID: mdl-33072076

ABSTRACT

An incomplete ascertainment of genetic variation within the highly polymorphic immunoglobulin heavy chain locus (IGH) has hindered our ability to define genetic factors that influence antibody-mediated processes. Due to locus complexity, standard high-throughput approaches have failed to accurately and comprehensively capture IGH polymorphism. As a result, the locus has only been fully characterized two times, severely limiting our knowledge of human IGH diversity. Here, we combine targeted long-read sequencing with a novel bioinformatics tool, IGenotyper, to fully characterize IGH variation in a haplotype-specific manner. We apply this approach to eight human samples, including a haploid cell line and two mother-father-child trios, and demonstrate the ability to generate high-quality assemblies (>98% complete and >99% accurate), genotypes, and gene annotations, identifying 2 novel structural variants and 15 novel IGH alleles. We show multiplexing allows for scaling of the approach without impacting data quality, and that our genotype call sets are more accurate than short-read (>35% increase in true positives and >97% decrease in false-positives) and array/imputation-based datasets. This framework establishes a desperately needed foundation for leveraging IG genomic data to study population-level variation in antibody-mediated immunity, critical for bettering our understanding of disease risk, and responses to vaccines and therapeutics.

8.
Genome Res ; 30(11): 1680-1693, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33093070

ABSTRACT

Rhesus macaque is an Old World monkey that shared a common ancestor with humans ∼25 Myr ago and is an important animal model for human disease studies. A deep understanding of its genetics is therefore required for both biomedical and evolutionary studies. Among structural variants, inversions represent a driving force in speciation and play an important role in disease predisposition. Here we generated a genome-wide map of inversions between human and macaque, combining single-cell strand sequencing with cytogenetics. We identified 375 total inversions between 859 bp and 92 Mbp, increasing by eightfold the number of previously reported inversions. Among these, 19 inversions flanked by segmental duplications overlap with recurrent copy number variants associated with neurocognitive disorders. Evolutionary analyses show that in 17 out of 19 cases, the Hominidae orientation of these disease-associated regions is always derived. This suggests that duplicated sequences likely played a fundamental role in generating inversions in humans and great apes, creating architectures that nowadays predispose these regions to disease-associated genetic instability. Finally, we identified 861 genes mapping at 156 inversion breakpoints, with some showing evidence of differential expression in human and macaque cell lines, thus highlighting candidates that might have contributed to the evolution of species-specific features. This study depicts the most accurate fine-scale map of inversions between human and macaque using a two-pronged integrative approach combining single-cell strand sequencing and cytogenetics, and represents a valuable resource toward understanding of the biology and evolution of primate species.

9.
Autism Res ; 13(10): 1659-1669, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32918531

ABSTRACT

Approximately one-fourth of autism spectrum disorder (ASD) cases are associated with a disruptive genetic variant. Many of these ASD genotypes have been described previously, and are characterized by unique constellations of medical, psychiatric, developmental, and behavioral features. Development of precision medicine care for affected individuals has been challenging due to the phenotypic heterogeneity that exists even within each genetic subtype. In the present study, we identify developmental milestones that predict cognitive and adaptive outcomes for five of the most common ASD genotypes. Sixty-five youth with a known pathogenic variant involving ADNP, CHD8, DYRK1A, GRIN2B, or SCN2A genes participated in cognitive and adaptive testing. Exploratory linear regressions were used to identify developmental milestones that predicted cognitive and adaptive outcomes within each gene group. We hypothesized that the earliest and most predictive milestones would vary across gene groups, but would be consistent across outcomes within each genetic subtype. Within the ADNP group, age of walking predicted cognitive outcomes, while age of first words predicted adaptive behaviors. Age of phrases predicted adaptive functioning in the CHD8 group, but cognitive outcomes were not clearly associated with early developmental milestones. Verbal milestones were the strongest predictors of cognitive and adaptive outcomes for individuals with mutations to DYRK1A, GRIN2B, or SCN2A. These trends inform decisions about treatment planning and long-term expectations for affected individuals, and they add to the growing body of research linking molecular genetic function to brain development and phenotypic outcomes. LAY SUMMARY: Researchers have found many genetic causes of autism including mutations to ADNP, CHD8, DYRK1A, GRIN2B, and SCN2A genes. We found that each genetic cause had different early developmental milestones that explained the overall functioning of the children when they were older. Depending on the genetic cause, the age that a child first starts walking and/or talking may help to better understand and support a child's development who has a mutation to one of the above genes. Autism Res 2020, 13: 1659-1669. © 2020 International Society for Autism Research and Wiley Periodicals LLC.

10.
Genome Biol ; 21(1): 202, 2020 08 10.
Article in English | MEDLINE | ID: mdl-32778141

ABSTRACT

BACKGROUND: The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human-ape gene families, nuclear pore interacting protein (NPIP). RESULTS: Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis. CONCLUSIONS: LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution.

11.
Genome Res ; 30(9): 1291-1305, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32801147

ABSTRACT

Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
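
Editorial note: the abstract above pairs a per-base consensus accuracy of 99.999% with QV50. That pairing follows from the Phred scale, QV = -10 · log10(error rate). The sketch below only illustrates the conversion; the function names are hypothetical and it is not taken from HiCanu.

```python
import math


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (e.g. 0.99999) to a Phred-scaled quality value."""
    error_rate = 1.0 - accuracy
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("accuracy must be in [0, 1); a perfect score has no finite QV")
    return -10.0 * math.log10(error_rate)


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value back to per-base accuracy."""
    return 1.0 - 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    # Floating-point rounding means the printed values are approximate.
    print(round(accuracy_to_qv(0.99999), 3))  # close to 50
    print(round(accuracy_to_qv(0.999), 3))    # close to 30
    print(qv_to_accuracy(40))                 # close to 0.9999
```

Each additional 10 QV points corresponds to a tenfold drop in the expected per-base error rate.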

12.
Am J Hum Genet ; 107(3): 445-460, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32750315

ABSTRACT

Tandem repeats are proposed to contribute to human-specific traits, and more than 40 tandem repeat expansions are known to cause neurological disease. Here, we characterize a human-specific 69 bp variable number tandem repeat (VNTR) in the last intron of WDR7, which exhibits striking variability in both copy number and nucleotide composition, as revealed by long-read sequencing. In addition, greater repeat copy number is significantly enriched in three independent cohorts of individuals with sporadic amyotrophic lateral sclerosis (ALS). Each unit of the repeat forms a stem-loop structure with the potential to produce microRNAs, and the repeat RNA can aggregate when expressed in cells. We leveraged its remarkable sequence variability to align the repeat in 288 samples and uncover its mechanism of expansion. We found that the repeat expands in the 3'-5' direction, in groups of repeat units divisible by two. The expansion patterns we observed were consistent with duplication events, and a replication error called template switching. We also observed that the VNTR is expanded in both Denisovan and Neanderthal genomes but is fixed at one copy or fewer in non-human primates. Evaluating the repeat in 1000 Genomes Project samples reveals that some repeat segments are solely present or absent in certain geographic populations. The large size of the repeat unit in this VNTR, along with our multiplexed sequencing strategy, provides an unprecedented opportunity to study mechanisms of repeat expansion, and a framework for evaluating the roles of VNTRs in human evolution and disease.


Subjects
Signal Transducing Adaptor Proteins/genetics , Amyotrophic Lateral Sclerosis/genetics , Molecular Evolution , Tandem Repeat Sequences/genetics , Aged , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Amyotrophic Lateral Sclerosis/pathology , DNA Repeat Expansion/genetics , Female , Gene Expression Regulation/genetics , Humans , Male , Minisatellite Repeats/genetics , Phenotype , Species Specificity
13.
Nat Biotechnol ; 38(9): 1044-1053, 2020 09.
Article in English | MEDLINE | ID: mdl-32686750

ABSTRACT

De novo assembly of a human genome using nanopore long-read sequences has been reported, but it used more than 150,000 CPU hours and weeks of wall-clock time. To enable rapid human genome assembly, we present Shasta, a de novo long-read assembler, and polishing algorithms named MarginPolish and HELEN. Using a single PromethION nanopore sequencer and our toolkit, we assembled 11 highly contiguous human genomes de novo in 9 d. We achieved roughly 63× coverage, 42-kb read N50 values and 6.5× coverage in reads >100 kb using three flow cells per sample. Shasta produced a complete haploid human genome assembly in under 6 h on a single commercial compute node. MarginPolish and HELEN polished haploid assemblies to more than 99.9% identity (Phred quality score QV = 30) with nanopore reads alone. Addition of proximity-ligation sequencing enabled near chromosome-level scaffolds for all 11 genomes. We compare our assembly performance to existing methods for diploid, haploid and trio-binned human samples and report superior accuracy and speed.


Subjects
Human Genome/genetics , High-Throughput Nucleotide Sequencing/methods , Nanopore Sequencing , DNA Sequence Analysis/methods , Algorithms , Benchmarking , Human Chromosomes/genetics , Deep Learning , Genomics , HLA Antigens/genetics , Haploidy , High-Throughput Nucleotide Sequencing/standards , Humans , DNA Sequence Analysis/standards
14.
Genet Med ; 22(11): 1838-1850, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32694869

ABSTRACT

PURPOSE: Nontruncating variants in SMARCA2, encoding a catalytic subunit of SWI/SNF chromatin remodeling complex, cause Nicolaides-Baraitser syndrome (NCBRS), a condition with intellectual disability and multiple congenital anomalies. Other disorders due to SMARCA2 are unknown. METHODS: By next-generation sequencing, we identified candidate variants in SMARCA2 in 20 individuals from 18 families with a syndromic neurodevelopmental disorder not consistent with NCBRS. To stratify variant interpretation, we functionally analyzed SMARCA2 variants in yeasts and performed transcriptomic and genome methylation analyses on blood leukocytes. RESULTS: Of 20 individuals, 14 showed a recognizable phenotype with recurrent features including epicanthal folds, blepharophimosis, and downturned nasal tip along with variable degree of intellectual disability (or blepharophimosis intellectual disability syndrome [BIS]). In contrast to most NCBRS variants, all SMARCA2 variants associated with BIS are localized outside the helicase domains. Yeast phenotype assays differentiated NCBRS from non-NCBRS SMARCA2 variants. Transcriptomic and DNA methylation signatures differentiated NCBRS from BIS and those with nonspecific phenotype. In the remaining six individuals with nonspecific dysmorphic features, clinical and molecular data did not permit variant reclassification. CONCLUSION: We identified a novel recognizable syndrome named BIS associated with clustered de novo SMARCA2 variants outside the helicase domains, phenotypically and molecularly distinct from NCBRS.

15.
Nat Rev Genet ; 21(10): 597-614, 2020 10.
Article in English | MEDLINE | ID: mdl-32504078

ABSTRACT

Over the past decade, long-read, single-molecule DNA sequencing technologies have emerged as powerful players in genomics. With the ability to generate reads tens to thousands of kilobases in length with an accuracy approaching that of short-read sequencing technologies, these platforms have proven their ability to resolve some of the most challenging regions of the human genome, detect previously inaccessible structural variants and generate some of the first telomere-to-telomere assemblies of whole chromosomes. Long-read sequencing technologies will soon permit the routine assembly of diploid genomes, which will revolutionize genomics by revealing the full spectrum of human genetic variation, resolving some of the missing heritability and leading to the discovery of novel mechanisms of disease.

16.
Autism Res ; 13(8): 1300-1310, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32597026

ABSTRACT

Individuals with 16p11.2 copy number variant (CNV) show considerable phenotypic heterogeneity. Although autism spectrum disorder (ASD) is reported in approximately 20-23% of individuals with 16p11.2 CNVs, ASD-associated symptoms are observed in those without a clinical ASD diagnosis. Previous work has shown that genetic variation and prenatal and perinatal birth complications influence ASD risk and symptom severity. This study examined the impact of genetic and environmental risk factors on phenotypic heterogeneity among 16p11.2 CNV carriers. Participants included individuals with a 16p11.2 deletion (N = 96) or duplication (N = 77) with exome sequencing from the Simons VIP study. The presence of prenatal factors, perinatal events, additional genetic events, and gender was studied. Regression analyses examined the contribution of each risk factor on ASD symptomatology, cognitive functioning, and adaptive abilities. For deletion carriers, perinatal and additional genetic events were associated with increased ASD symptomatology and decrements in cognitive and adaptive functioning. For duplication carriers, secondary genetic events were associated with greater cognitive impairments. Female sex was a protective factor for both deletion and duplication carriers. Our findings suggest that ASD-associated risk factors contribute to the variability in symptom presentation in individuals with 16p11.2 CNVs. LAY SUMMARY: There are a wide range of autism spectrum disorder (ASD) symptoms and abilities observed for individuals with genetic changes of the 16p11.2 region. Here, we found perinatal complications contributed to more severe ASD symptoms (deletion carriers) and additional genetic mutations contributed to decreased cognitive abilities (deletion and duplication carriers). A potential protective factor was also observed for females with 16p11.2 variations. Autism Res 2020, 13: 1300-1310. © 2020 International Society for Autism Research, Wiley Periodicals, Inc.

17.
Nat Genet ; 52(8): 849-858, 2020 08.
Article in English | MEDLINE | ID: mdl-32541924

ABSTRACT

Inversions play an important role in disease and evolution but are difficult to characterize because their breakpoints map to large repeats. We increased by sixfold the number (n = 1,069) of previously reported great ape inversions by using single-cell DNA template strand and long-read sequencing. We find that the X chromosome is most enriched (2.5-fold) for inversions, on the basis of its size and duplication content. There is an excess of differentially expressed primate genes near the breakpoints of large (>100 kilobases (kb)) inversions but not smaller events. We show that when great ape lineage-specific duplications emerge, they preferentially (approximately 75%) occur in an inverted orientation compared to that at their ancestral locus. We construct megabase-pair scale haplotypes for individual chromosomes and identify 23 genomic regions that have recurrently toggled between a direct and an inverted state over 15 million years. The direct orientation is most frequently the derived state for human polymorphisms that predispose to recurrent copy number variants associated with neurodevelopmental disease.


Subjects
Chromosome Inversion/genetics , Genome/genetics , Hominidae/genetics , Animals , Chromosomes/genetics , DNA Copy Number Variations/genetics , Molecular Evolution , Female , Haplotypes/genetics , Humans , Male
18.
Am J Hum Genet ; 106(5): 587-595, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32359473

ABSTRACT

Despite evidence that deleterious variants in the same genes are implicated across multiple neurodevelopmental and neuropsychiatric disorders, there has been considerable interest in identifying genes that, when mutated, confer risk that is largely specific for autism spectrum disorder (ASD). Here, we review the findings and limitations of recent efforts to identify relatively "autism-specific" genes, efforts which focus on rare variants of large effect size that are thought to account for the observed phenotypes. We present a divergent interpretation of published evidence; discuss practical and theoretical issues related to studying the relationships between rare, large-effect deleterious variants and neurodevelopmental phenotypes; and describe potential future directions of this research. We argue that there is currently insufficient evidence to establish meaningful ASD specificity of any genes based on large-effect rare-variant data.


Subjects
Autism Spectrum Disorder/diagnosis , Autism Spectrum Disorder/genetics , Uncertainty , Cohort Studies , Genetic Testing , Genotype , Humans , Reproducibility of Results
20.
Genes (Basel) ; 11(2)2020 02 18.
Article in English | MEDLINE | ID: mdl-32085667

ABSTRACT

POTE (prostate, ovary, testis, and placenta expressed) genes belong to a primate-specific gene family expressed in prostate, ovary, and testis as well as in several cancers including breast, prostate, and lung cancers. Due to their tumor-specific expression, POTEs are potential oncogenes, therapeutic targets, and biomarkers for these malignancies. This gene family maps within human and primate segmental duplications with a copy number ranging from two to 14 in different species. Due to the high sequence identity among the gene copies, specific efforts are needed to assemble these loci in order to correctly define the organization and evolution of the gene family. Using single-molecule, real-time (SMRT) sequencing, in silico analyses, and molecular cytogenetics, we characterized the structure, copy number, and chromosomal distribution of the POTE genes, as well as their expression in normal and disease tissues, and provided a comparative analysis of the POTE organization and gene structure in primate genomes. We were able, for the first time, to de novo sequence and assemble a POTE tandem duplication in marmoset that is misassembled and collapsed in the reference genome, thus revealing the presence of a second POTE copy. Taken together, our findings provide comprehensive insights into the evolutionary dynamics of the primate-specific POTE gene family, involving gene duplications, deletions, and long interspersed nuclear element (LINE) transpositions to explain the actual repertoire of these genes in human and primate genomes.
