1.
Article in English | MEDLINE | ID: mdl-38862430

ABSTRACT

Tandem duplication (TD) is a major type of structural variation (SV) that plays an important role in novel gene formation and human disease. However, TDs are often missed or incorrectly classified as insertions by most modern SV detection methods because these methods lack specialized handling of TD-related mutational signals. Herein, we developed a TD detection module for the Pindel tool, referred to as Pindel-TD, based on a TD-specific pattern growth approach. Pindel-TD is capable of detecting TDs across a wide size range at single-nucleotide resolution. Using simulated and real read data from HG002, we demonstrated that Pindel-TD outperforms other leading methods in terms of precision, recall, F1-score, and robustness. Furthermore, by applying Pindel-TD to data generated from the K562 cancer cell line, we identified a TD located at the seventh exon of SAGE1, providing an explanation for its high expression. Pindel-TD is available for non-commercial use at https://github.com/xjtu-omics/pindel.


Subject(s)
Software; Humans; K562 Cells; Gene Duplication; Tandem Repeat Sequences/genetics; Algorithms
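
The abstract above compares callers by precision, recall, and F1-score. As a minimal sketch of how those three numbers relate, the snippet below scores a set of predicted duplication calls against a truth set. The exact-coordinate matching rule, the function name `evaluate_calls`, and the toy coordinates are assumptions made for illustration; the abstract does not state the benchmarking criteria the authors used.

```python
def evaluate_calls(predicted, truth):
    """Return (precision, recall, f1) for duplication calls.

    Each call is a (chromosome, start, end) tuple. A prediction counts as a
    true positive only if it matches a truth call exactly (an assumption made
    for this sketch, not the criterion from the paper).
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: one exact hit, one call off by a single base.
truth = {("chr1", 1000, 1500), ("chr2", 200, 260), ("chr3", 50, 90)}
predicted = {("chr1", 1000, 1500), ("chr2", 200, 261)}
print(evaluate_calls(predicted, truth))
```

With exact matching, a call that is off by one base counts as both a false positive and a false negative, which is why breakpoint resolution directly affects all three scores.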
2.
Gene ; 926: 148644, 2024 Oct 30.
Article in English | MEDLINE | ID: mdl-38851366

ABSTRACT

The non-coding regions of the mitochondrial DNAs (mtDNAs) of hares, rabbits, and pikas (Lagomorpha) contain short (∼20 bp) and long (130-160 bp) tandem repeats, absent in related mammalian orders. In the present study, we provide an in-depth analysis of mountain hare (Lepus timidus) and brown hare (L. europaeus) mtDNA non-coding regions, together with a species- and population-level analysis of tandem repeat variation. The short tandem repeats (SRs) of mountain hares, like those of the other hare species analyzed, consist of two conserved 10 bp motifs, with only brown hares exhibiting a single, more variable motif. Long tandem repeats (LRs) also differ in sequence and copy number between species. Mountain hares have four to seven LRs (median five), while brown hares have five to nine LRs (median six). Interestingly, introgressed mountain hare mtDNA in brown hares acquired an intermediate LR length distribution, with the same median copy number as conspecific brown hare mtDNA. In contrast, transfer of brown hare mtDNA into cultured mtDNA-less mountain hare cells maintained the original LR number, whereas the reciprocal transfer caused copy number instability, suggesting that the cellular environment rather than the nuclear genomic background plays a role in LR maintenance. Due to their dynamic nature and separation from other known conserved sequence elements in the non-coding region of hare mitochondrial genomes, the tandem repeat elements are likely to represent signatures of ancient genetic rearrangements. Clarifying the nature and dynamics of these rearrangements may shed light on the possible role of NCR repeated elements in mitochondria and in species evolution.


Subject(s)
DNA, Mitochondrial; Evolution, Molecular; Genome, Mitochondrial; Hares; Polymorphism, Genetic; Species Specificity; Tandem Repeat Sequences; Animals; Hares/genetics; Tandem Repeat Sequences/genetics; DNA, Mitochondrial/genetics; Phylogeny
6.
PLoS Genet ; 20(5): e1011296, 2024 May.
Article in English | MEDLINE | ID: mdl-38814980

ABSTRACT

Exceptions to Mendelian inheritance often highlight novel chromosomal behaviors. The maize Pl1-Rhoades allele conferring plant pigmentation can display inheritance patterns deviating from Mendelian expectations in a behavior known as paramutation. However, the chromosome features mediating such exceptions remain unknown. Here we show that small RNA production reflecting RNA polymerase IV function within a distal downstream set of five tandem repeats is coincident with meiotically-heritable repression of the Pl1-Rhoades transcription unit. A related pl1 haplotype with three, but not one with two, repeat units also displays the trans-homolog silencing typifying paramutations. 4C interactions, CHD3a-dependent small RNA profiles, nuclease sensitivity, and polyadenylated RNA levels highlight a repeat subregion having regulatory potential. Our comparative and mutant analyses show that transcriptional repression of Pl1-Rhoades correlates with 24-nucleotide RNA production and cytosine methylation at this subregion indicating the action of a specific DNA-dependent RNA polymerase complex. These findings support a working model in which pl1 paramutation depends on trans-chromosomal RNA-directed DNA methylation operating at a discrete cis-linked and copy-number-dependent transcriptional regulatory element.


Subject(s)
Alleles; DNA Methylation; Gene Expression Regulation, Plant; Tandem Repeat Sequences; Zea mays; Zea mays/genetics; DNA Methylation/genetics; Tandem Repeat Sequences/genetics; Mutation; Pigmentation/genetics; Haplotypes; Plant Proteins/genetics; Plant Proteins/metabolism
7.
Virus Res ; 345: 199390, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38710287

ABSTRACT

Cnaphalocrocis medinalis granulovirus (CnmeGV), belonging to Betabaculovirus cnamedinalis, can infect the rice pest, the rice leaf roller. In 1979, a CnmeGV isolate, CnmeGV-EP, was collected from Enping County, China. In 2014, we collected another CnmeGV isolate, CnmeGV-EPDH3, at the same location and obtained the complete virus genome sequence using Illumina and ONT sequencing technologies. By combining these two virus isolates, we updated the genome annotation of CnmeGV and conducted an in-depth analysis of its genome features. The CnmeGV genome contains abundant tandem repeat sequences, and the repeating units in the homologous regions (hrs) exhibit overlapping and nested patterns. The genetic variation within the EPDH3 population shows the high stability of the CnmeGV genome, and tandem repeats are the only region of high genetic variation in CnmeGV genome replication. Some defective viral genomes formed by recombination were found within the population. Comparative analysis of the two virus isolates collected from Enping showed that the proteins encoded by the CnmeGV-specific genes were less conserved than those encoded by the baculovirus core genes. At the genomic level, there are a large number of SNPs and InDels between the two virus isolates, especially in and around the bro genes and hrs. Additionally, we discovered that CnmeGV acquired a segment of non-ORF sequence from its host, which does not provide any new proteins but rather serves as redundant genetic material integrated into the viral genome. Furthermore, we observed that the host's transposon piggyBac has inserted into some virus genes. Together, these results suggest that dsDNA viruses can acquire non-coding genetic material from their hosts to expand the size of their genomes. These findings provide new insights into the evolution of dsDNA viruses.


Subject(s)
Genetic Variation; Genome, Viral; Animals; Phylogeny; China; Granulovirus/genetics; Granulovirus/classification; Granulovirus/isolation & purification; Whole Genome Sequencing; Oryza/virology; Tandem Repeat Sequences/genetics; Plant Diseases/virology; Recombination, Genetic
8.
Cancer Sci ; 115(6): 1851-1865, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38581120

ABSTRACT

Aberrant expression of forkhead box M1 (FOXM1) plays critical roles in a variety of human malignancies and predicts poor prognosis. However, little is known about the crosstalk between FOXM1 and long noncoding RNAs (lncRNAs) in tumorigenesis. The present study identifies a previously uncharacterized lncRNA, XLOC_008672, in gastric cancer (GC), which is regulated by FOXM1 and possesses multiple copies of tandem repetitive sequences. LncRNA microarrays are used to screen for differentially expressed lncRNAs in FOXM1 knockdown GC cells, and the lncRNA with the greatest fold downregulation, XLOC_008672, is selected. Sequence analysis reveals that the new lncRNA contains 62 copies of 37-bp tandem repeats. In vitro and in vivo functional assays show that it is transcriptionally activated by FOXM1 and functions as a downstream effector of FOXM1 in GC cells. Elevated expression of XLOC_008672 is found in GC tissues and indicates worse prognosis. Mechanistically, XLOC_008672 can bind to small nuclear ribonucleoprotein polypeptide A (SNRPA), thereby enhancing mRNA stability of Ras-GTPase-activating protein SH3 domain-binding protein 1 (G3BP1) and, consequently, facilitating GC cell proliferation and migration. Our study identifies a previously uncharacterized lncRNA, XLOC_008672, involved in GC carcinogenesis and progression. Targeting the FOXM1/XLOC_008672/SNRPA/G3BP1 signaling axis might be a promising therapeutic strategy for GC.


Subject(s)
Carcinogenesis; Cell Proliferation; Forkhead Box Protein M1; Gene Expression Regulation, Neoplastic; RNA, Long Noncoding; Stomach Neoplasms; Animals; Female; Humans; Male; Mice; Cell Line, Tumor; Cell Movement/genetics; Cell Proliferation/genetics; DNA Helicases; Forkhead Box Protein M1/genetics; Forkhead Box Protein M1/metabolism; Mice, Nude; Poly-ADP-Ribose Binding Proteins/genetics; Poly-ADP-Ribose Binding Proteins/metabolism; Prognosis; RNA Helicases; RNA Recognition Motif Proteins/genetics; RNA Recognition Motif Proteins/metabolism; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; Stomach Neoplasms/genetics; Stomach Neoplasms/pathology; Stomach Neoplasms/metabolism; Tandem Repeat Sequences/genetics
9.
Cell ; 187(9): 2336-2341.e5, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38582080

ABSTRACT

The Genome Aggregation Database (gnomAD), widely recognized as the gold-standard reference map of human genetic variation, has largely overlooked tandem repeat (TR) expansions, despite the fact that TRs constitute ∼6% of our genome and are linked to over 50 human diseases. Here, we introduce TR-gnomAD (https://wlcb.oit.uci.edu/TRgnomAD), a biobank-scale reference of 0.86 million TRs derived from 338,963 whole-genome sequencing (WGS) samples of diverse ancestries (39.5% non-European samples). TR-gnomAD offers critical insights into ancestry-specific disease prevalence using disparities in TR unit number frequencies among ancestries. Moreover, TR-gnomAD can distinguish common, presumably benign TR expansions, which are prevalent in TR-gnomAD, from potentially pathogenic TR expansions, which are found more frequently in disease groups than within TR-gnomAD. Together, TR-gnomAD is an invaluable resource for researchers and physicians to interpret TR expansions in individuals with genetic diseases.


Subject(s)
Genome, Human; Tandem Repeat Sequences; Humans; Tandem Repeat Sequences/genetics; Whole Genome Sequencing; Databases, Genetic; DNA Repeat Expansion/genetics; Genome-Wide Association Study
10.
EBioMedicine ; 101: 105027, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38418263

ABSTRACT

BACKGROUND: Cardiomyopathy is a clinically and genetically heterogeneous heart condition that can lead to heart failure and sudden cardiac death in childhood. While it has a strong genetic basis, the genetic aetiology for over 50% of cardiomyopathy cases remains unknown. METHODS: In this study, we analyse the characteristics of tandem repeats from genome sequence data of unrelated individuals diagnosed with cardiomyopathy from Canada and the United Kingdom (n = 1216) and compare them to those found in the general population. We perform burden analysis to identify genomic and epigenomic features that are impacted by rare tandem repeat expansions (TREs), and enrichment analysis to identify functional pathways that are involved in the TRE-associated genes in cardiomyopathy. We use Oxford Nanopore targeted long-read sequencing to validate repeat size and methylation status of one of the most recurrent TREs. We also compare the TRE-associated genes to those that are dysregulated in the heart tissues of individuals with cardiomyopathy. FINDINGS: We demonstrate that tandem repeats that are rarely expanded in the general population are predominantly expanded in cardiomyopathy. We find that rare TREs are disproportionately present in constrained genes near transcriptional start sites, have high GC content, and frequently overlap active enhancer H3K27ac marks, where expansion-related DNA methylation may reduce gene expression. We demonstrate the gene silencing effect of expanded CGG tandem repeats in DIP2B through promoter hypermethylation. We show that the enhancer-associated loci are found in genes that are highly expressed in human cardiomyocytes and are differentially expressed in the left ventricle of the heart in individuals with cardiomyopathy. INTERPRETATION: Our findings highlight the underrecognized contribution of rare tandem repeat expansions to the risk of cardiomyopathy and suggest that rare TREs contribute to ∼4% of cardiomyopathy risk. 
FUNDING: Government of Ontario (RKCY), The Canadian Institutes of Health Research PJT 175329 (RKCY), The Azrieli Foundation (RKCY), SickKids Catalyst Scholar in Genetics (RKCY), The University of Toronto McLaughlin Centre (RKCY, SM), Ted Rogers Centre for Heart Research (SM), Data Sciences Institute at the University of Toronto (SM), The Canadian Institutes of Health Research PJT 175034 (SM), The Canadian Institutes of Health Research ENP 161429 under the frame of ERA PerMed (SM, RL), Heart and Stroke Foundation of Ontario & Robert M Freedom Chair in Cardiovascular Science (SM), Bitove Family Professorship of Adult Congenital Heart Disease (EO), Canada Foundation for Innovation (SWS, JR), Canada Research Chair (PS), Genome Canada (PS, JR), The Canadian Institutes of Health Research (PS).


Subject(s)
Cardiomyopathies; Heart Defects, Congenital; Humans; Adult; Heart Defects, Congenital/genetics; Tandem Repeat Sequences/genetics; DNA Methylation; Cardiomyopathies/genetics; Ontario; Nerve Tissue Proteins/genetics
11.
PLoS One ; 19(1): e0295595, 2024.
Article in English | MEDLINE | ID: mdl-38271341

ABSTRACT

Mitochondria are known to play an essential role in the cell. These organelles contain their own DNA, which is divided into a coding and a non-coding region (NCR). While much of the NCR's function is unknown, tandem repeats have been observed in several vertebrates, with extreme intra-individual, intraspecific and interspecific variation. Taking advantage of a new complete reference for the mitochondrial genome of the Afro-European Barn Owl (Tyto alba), as well as 172 resequenced whole genomes, we (i) describe the reference mitochondrial genome with a special focus on the repeats in the NCR, (ii) quantify the variation in number of copies between individuals, and (iii) explore the possible factors associated with the variation in the number of repetitions. The reference mitochondrial genome revealed a long (256 bp) and a short (80 bp) tandem repeat in the NCR. The resequenced genomes showed great variation in the number of copies between individuals, with 4 to 38 copies of the long repeat and 6 to 135 copies of the short repeat. Among the factors associated with this variation between individuals, the tissue used for extraction was the most significant. The exact mechanisms by which these repeats form are still to be discovered, and understanding them will help explain the maintenance of the polymorphism in the number of copies, as well as their interactions with the metabolism, aging and health of individuals.


Subject(s)
Genome, Mitochondrial; Strigiformes; Animals; Humans; DNA Copy Number Variations; Strigiformes/genetics; Base Sequence; Tandem Repeat Sequences/genetics
12.
Biochem Biophys Res Commun ; 692: 149349, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38056160

ABSTRACT

While it is well established that a mere 2% of human DNA nucleotides are involved in protein coding, the remainder of the DNA plays a vital role in the preservation of normal cellular genetic function. A significant proportion of tandem repeats (TRs) are present in non-coding DNA. TRs are specific nucleotide sequences that consist of numerous repetitions of a given fragment. In this study, we employed our novel algorithm grounded in finite automata theory, which we refer to as Dafna, to investigate for the first time the likelihood of these nucleotide sequences forming non-canonical DNA structures (NS). Such structures include G-quadruplexes, i-motifs, hairpins, and triplexes. The tandem repeats under consideration in our research encompassed sequences containing 1 to 6 nucleotides per repeated fragment. For comparison, we employed a set of randomly generated sequences of the same length (60 nucleotides) as a benchmark. The outcomes of our research revealed a disparity between the potential for NS formation in random sequences and in tandem repeats. Our findings affirm that the propensity of DNA and RNA to form NS is closely tied to various genetic disorders, including Huntington's disease, Fragile X syndrome, and Friedreich's ataxia. In the concluding discussion, we present a proposal for a new therapeutic mechanism to address these diseases. This novel approach revolves around the ability of specific nucleic acid fragments to form multiple types of NS.


Subject(s)
Clinical Relevance; Tandem Repeat Sequences; Humans; Tandem Repeat Sequences/genetics; DNA/chemistry; Base Sequence; Nucleotides
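
The abstract above describes comparing tandem repeats with 1 to 6 nucleotides per repeated unit against random sequences of the same length (60 nucleotides). The sketch below shows one way such test sequences could be constructed. It does not reproduce the Dafna algorithm; the function names, the example units, and the uniform random model are assumptions made for illustration.

```python
import random


def tandem_repeat(unit, length=60):
    """Repeat `unit` head to tail and trim the result to `length` nucleotides."""
    copies = -(-length // len(unit))  # ceiling division
    return (unit * copies)[:length]


def random_sequence(length=60, alphabet="ACGT", seed=None):
    """Draw each position uniformly from `alphabet` (an assumed null model)."""
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(length))


# Hypothetical repeat units of 1 to 6 nucleotides.
for unit in ["A", "CA", "CAG", "GGAA", "GGCCT", "GGGGCC"]:
    print(len(unit), tandem_repeat(unit))
print("random", random_sequence(seed=0))
```

Trimming to a fixed length means that a 4- or 5-nucleotide unit may end the sequence with a partial copy, whereas units of 1, 2, 3, or 6 nucleotides divide 60 evenly.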
13.
Transl Psychiatry ; 13(1): 402, 2023 Dec 20.
Article in English | MEDLINE | ID: mdl-38123544

ABSTRACT

Tandem repeats (TRs) are prevalent throughout the genome, constituting at least 3% of it, and are often highly polymorphic. The high mutation rate of TRs, which can be orders of magnitude higher than that of single-nucleotide polymorphisms and indels, indicates that they are likely to make significant contributions to phenotypic variation, yet their contribution to schizophrenia has been largely ignored by recent genome-wide association studies (GWAS). Tandem repeat expansions are already known causative factors for over 50 disorders, while common tandem repeat variation is increasingly being identified as significantly associated with complex disease and gene regulation. The current review summarizes key background concepts of tandem repeat variation as it pertains to disease risk, elucidating its potential for schizophrenia association. An overview of next-generation sequencing-based methods that may be applied for genome-wide TR identification is provided, and some key methodological challenges in TR analyses are delineated.


Subject(s)
Genome-Wide Association Study; Schizophrenia; Humans; Schizophrenia/genetics; Genome, Human; Tandem Repeat Sequences/genetics; Polymorphism, Single Nucleotide
14.
Sci Adv ; 9(47): eadj1261, 2023 Nov 24.
Article in English | MEDLINE | ID: mdl-37992162

ABSTRACT

The biological role of the repetitive DNA sequences in the human genome remains an outstanding question. Recent long-read human genome assemblies have allowed us to identify a function for one of these repetitive regions. We have uncovered a tandem array of conserved primate-specific retrogenes encoding the protein Elongin A3 (ELOA3), a homolog of the RNA polymerase II (RNAPII) elongation factor Elongin A (ELOA). Our genomic analysis shows that the ELOA3 gene cluster is conserved among primates and the number of ELOA3 gene repeats is variable in the human population and across primate species. Moreover, the gene cluster has undergone concerted evolution and homogenization within primates. Our biochemical studies show that ELOA3 functions as a promoter-associated RNAPII pause-release elongation factor with distinct biochemical and functional features from its ancestral homolog, ELOA. We propose that the ELOA3 gene cluster has evolved to fulfil a transcriptional regulatory function unique to the primate lineage that can be targeted to regulate cellular hyperproliferation.


Subject(s)
Peptide Elongation Factors; RNA Polymerase II; Animals; Humans; RNA Polymerase II/genetics; RNA Polymerase II/metabolism; Peptide Elongation Factors/genetics; Primates/genetics; Elongin/genetics; Multigene Family; Tandem Repeat Sequences/genetics
15.
Nat Commun ; 14(1): 6746, 2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37875492

ABSTRACT

De novo protein design methods can create proteins with folds not yet seen in nature. These methods largely focus on optimizing the compatibility between the designed sequence and the intended conformation, without explicit consideration of protein folding pathways. Deeply knotted proteins, whose topologies may introduce substantial barriers to folding, thus represent an interesting test case for protein design. Here we report our attempts to design proteins with trefoil (3₁) and pentafoil (5₁) knotted topologies. We extended previously described algorithms for tandem repeat protein design in order to construct deeply knotted backbones and matching designed repeat sequences (N = 3 repeats for the trefoil and N = 5 for the pentafoil). We confirmed the intended conformation for the trefoil design by X-ray crystallography, and we report here on this protein's structure, stability, and folding behaviour. The pentafoil design misfolded into an asymmetric structure (despite a 5-fold symmetric sequence); two of the four repeat-repeat units matched the designed backbone while the other two diverged to form local contacts, leading to a trefoil rather than pentafoil knotted topology. Our results also provide insights into the folding of knotted proteins.


Subject(s)
Protein Folding; Proteins; Protein Conformation; Proteins/genetics; Proteins/chemistry; Protein Domains; Tandem Repeat Sequences/genetics
16.
Emerg Top Life Sci ; 7(3): 361-381, 2023 Dec 14.
Article in English | MEDLINE | ID: mdl-37905568

ABSTRACT

Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.


Subject(s)
DNA; Tandem Repeat Sequences; Humans; Tandem Repeat Sequences/genetics; Epigenesis, Genetic
17.
PLoS One ; 18(9): e0290890, 2023.
Article in English | MEDLINE | ID: mdl-37729217

ABSTRACT

Protein regions consisting of arrays of tandem repeats are known to bind other molecular partners, including nucleic acid molecules. Although the interactions between repeat proteins and DNA are already widely explored, studies characterising tandem repeat RNA-binding proteins are lacking. We performed a large-scale analysis of human proteins devoted to expanding the knowledge about tandem repeat proteins experimentally reported as RNA-binding molecules. This work is timely because of the release of a full set of accurate structural models for the human proteome amenable to repeat detection using structural methods. The main goal of our analysis was to build a comprehensive set of human RNA-binding proteins that contain repeats at the sequence or structure level. Our results showed that the combination of sequence and structural methods finds significantly more tandem repeat proteins than either method alone. We identified 219 tandem repeat proteins that bind RNA molecules and characterised the overlap between repeat regions and RNA-binding regions as a first step towards assessing their functional relationship. We observed differences in the characteristics of repeat regions predicted by sequence-based or structure-based methods in terms of their sequence composition, their functions and their protein domains.


Subject(s)
Knowledge; RNA-Binding Proteins; Humans; Models, Structural; RNA-Binding Proteins/genetics; Tandem Repeat Sequences/genetics; RNA/genetics
18.
BMC Ecol Evol ; 23(1): 55, 2023 Sep 26.
Article in English | MEDLINE | ID: mdl-37749487

ABSTRACT

BACKGROUND: The sturgeon group has been economically significant worldwide due to caviar production. There are 27 sturgeon species worldwide. Mitogenome data could be used to infer genetic diversity and investigate the evolutionary history of sturgeons. Only a limited number of complete mitogenomes in this family have been sequenced. Here, we annotated the mitochondrial genome of Huso huso, which revealed new aspects of this species. RESULTS: In this species, the mitochondrial genome consisted of 13 protein-coding genes, 22 tRNA and 2 rRNA genes, and two non-coding regions, an organization consistent with other vertebrates. In addition, H. huso had a pseudo-tRNA-Glu between ND6 and Cytb and a 52-nucleotide tandem repeat present in two copies in 12S rRNA. This duplication event is probably related to strand slippage during replication, which could persist in the strand due to mispairing. Furthermore, an 82 bp repeat sequence present in three copies was observed in the D-loop control region, a feature commonly seen in different species. Regulatory elements were also seen in the control region of the mitochondrial genome, which included termination sequences and conserved regulatory blocks. Genomic compounds showed the highest conservation in rRNA and tRNA, while protein-coding genes and non-coding regions had the highest divergence. The mitochondrial genome was phylogenetically assayed using 12 protein-coding genes. CONCLUSIONS: In sequencing H. huso, we identified a genome organization distinct from that of other species that has not previously been reported. In recent years, advances in sequencing have identified more genome rearrangements. However, this remains an essential aspect of research on the evolution of the mitochondrial genome that needs to be recognized.


Subject(s)
Genome, Mitochondrial; Animals; Genome, Mitochondrial/genetics; Fishes/genetics; Tandem Repeat Sequences/genetics; RNA, Transfer/genetics
19.
J Struct Biol ; 215(4): 108023, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37652396

ABSTRACT

Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. And yet many of their salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on TRPs by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have named STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.


Subject(s)
Proteins; Tandem Repeat Sequences; Proteins/genetics; Proteins/chemistry; Tandem Repeat Sequences/genetics; Amino Acid Sequence
20.
Nature ; 621(7978): 344-354, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.


Subject(s)
Chromosomes, Human, Y; Genomics; Sequence Analysis, DNA; Humans; Base Sequence; Chromosomes, Human, Y/genetics; DNA, Satellite/genetics; Genetic Variation/genetics; Genetics, Population; Genomics/methods; Genomics/standards; Heterochromatin/genetics; Multigene Family/genetics; Reference Standards; Segmental Duplications, Genomic/genetics; Sequence Analysis, DNA/standards; Tandem Repeat Sequences/genetics; Telomere/genetics