Results 1 - 12 of 12
1.
Cell ; 185(16): 3025-3040.e6, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35882231

ABSTRACT

Non-allelic recombination between homologous repetitive elements contributes to evolution and human genetic disorders. Here, we combine short- and long-DNA read sequencing of repeat elements with a new bioinformatics pipeline to show that somatic recombination of Alu and L1 elements is widespread in the human genome. Our analysis uncovers tissue-specific non-allelic homologous recombination hallmarks; moreover, we find that centromeres and cancer-associated genes are enriched for retroelements that may act as recombination hotspots. We compare recombination profiles in human induced pluripotent stem cells and differentiated neurons and find that the neuron-specific recombination of repeat elements accompanies chromatin changes during cell-fate determination. Finally, we report that somatic recombination profiles are altered in Parkinson's and Alzheimer's disease, suggesting a link between retroelement recombination and genomic instability in neurodegeneration. This work highlights a significant contribution of the somatic recombination of repeat elements to genomic diversity in health and disease.


Subjects
Genome, Human; Retroelements; Alu Elements/genetics; Homologous Recombination; Humans; Long Interspersed Nucleotide Elements; Repetitive Sequences, Nucleic Acid
2.
Hum Mol Genet ; 30(7): 552-563, 2021 05 12.
Article in English | MEDLINE | ID: mdl-33693705

ABSTRACT

Facioscapulohumeral muscular dystrophy (FSHD) is an inherited muscle disease caused by misexpression of the DUX4 gene in skeletal muscle. DUX4 is a transcription factor, which is normally expressed in the cleavage-stage embryo and regulates gene expression involved in early embryonic development. Recent studies revealed that DUX4 also activates the transcription of repetitive elements such as endogenous retroviruses (ERVs), mammalian apparent long terminal repeat (LTR)-retrotransposons and pericentromeric satellite repeats (Human Satellite II). DUX4-bound ERV sequences also create alternative promoters for genes or long non-coding RNAs, producing fusion transcripts. To further understand transcriptional regulation by DUX4, we performed nanopore long-read direct RNA sequencing (dRNA-seq) of human muscle cells induced by DUX4, because long reads show whole isoforms with greater confidence. We successfully detected differential expression of known DUX4-induced genes and discovered 61 differentially expressed repeat loci, which are near DUX4-ChIP peaks. We also identified 247 gene-ERV fusion transcripts, of which 216 were not reported previously. In addition, long-read dRNA-seq clearly shows that RNA splicing is a common event in DUX4-activated ERV transcripts. Long-read analysis showed non-LTR transposons including Alu elements are also transcribed from LTRs. Our findings revealed further complexity of DUX4-induced ERV transcripts. This catalogue of DUX4-activated repetitive elements may provide useful information to elucidate the pathology of FSHD. Also, our results indicate that nanopore dRNA-seq has complementary strengths to conventional short-read complementary DNA sequencing.


Subjects
Homeodomain Proteins/genetics; Muscle, Skeletal/metabolism; Muscular Dystrophy, Facioscapulohumeral/genetics; Nanopores; Repetitive Sequences, Nucleic Acid/genetics; Sequence Analysis, RNA/methods; Cell Line, Tumor; Gene Expression Profiling; Gene Expression Regulation; Humans; Muscle Cells/metabolism; Muscular Dystrophy, Facioscapulohumeral/pathology; Protein Isoforms/genetics; RNA Isoforms/genetics; Reverse Transcriptase Polymerase Chain Reaction; Sequence Analysis, RNA/statistics & numerical data
3.
Nucleic Acids Res ; 49(6): 3139-3155, 2021 04 06.
Article in English | MEDLINE | ID: mdl-33693858

ABSTRACT

Minimal absent words (MAWs) are minimal-length oligomers absent from a genome or proteome. Although some artificially synthesized MAWs have deleterious effects, there is still a lack of a strategy for the classification of non-occurring sequences as potentially malicious or benign. In this work, by using Markovian models with multiple-testing correction, we reveal significant absent oligomers, which are statistically expected to exist. This suggests that their absence is due to negative selection. We survey genomes and proteomes covering the diversity of life and find thousands of significant absent sequences. Common significant MAWs are often mono- or dinucleotide tracts, or palindromic. Significant viral MAWs are often restriction sites and may indicate unknown restriction motifs. Surprisingly, significant mammal genome MAWs are often present, but rare, in other mammals, suggesting that they are suppressed but not completely forbidden. Significant human MAWs are frequently present in prokaryotes, suggesting immune function, but rarely present in human viruses, indicating viral mimicry of the host. More than one-fourth of human proteins are one substitution away from containing a significant MAW, with the majority of replacements being predicted harmful. We provide a web-based, interactive database of significant MAWs across genomes and proteomes.
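
The definition of a minimal absent word can be made concrete with a small brute-force sketch. It is not the method used in the paper: the function name `minimal_absent_words`, the length cap `max_len`, and the toy sequence are illustrative assumptions, and the Markov-model significance test described in the abstract is not reproduced. A word is reported when it is absent from the sequence while its longest proper prefix and suffix are both present.

```python
from itertools import product


def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Brute-force list of minimal absent words of seq, up to max_len letters."""
    # Every substring of seq with length between 1 and max_len.
    present = {
        seq[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(len(seq) - k + 1)
    }
    maws = []
    for k in range(1, max_len + 1):
        for letters in product(alphabet, repeat=k):
            word = "".join(letters)
            if word in present:
                continue  # the word occurs, so it is not absent
            prefix, suffix = word[:-1], word[1:]
            # Minimality: both maximal proper substrings must occur.
            # The empty string (for single letters) counts as present.
            if (not prefix or prefix in present) and (not suffix or suffix in present):
                maws.append(word)
    return maws


print(minimal_absent_words("AACG", 2))
```

Enumeration grows as 4^k for nucleotides, so this is only practical for short words; genome-scale analyses would need an index-based algorithm instead.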


Subjects
Databases, Genetic; Genomics/methods; Proteomics/methods; Animals; Genome; Humans; Markov Chains; Mutation; Peptides/chemistry; Proteome; Software; Viruses/genetics
4.
Nat Genet ; 51(8): 1215-1221, 2019 08.
Article in English | MEDLINE | ID: mdl-31332381

ABSTRACT

Neuronal intranuclear inclusion disease (NIID) is a progressive neurodegenerative disease that is characterized by eosinophilic hyaline intranuclear inclusions in neuronal and somatic cells. The wide range of clinical manifestations in NIID makes ante-mortem diagnosis difficult [1-8], but skin biopsy enables its ante-mortem diagnosis [9-12]. The average onset age is 59.7 years among approximately 140 NIID cases consisting of mostly sporadic and several familial cases. By linkage mapping of a large NIID family with several affected members (Family 1), we identified a 58.1 Mb linked region at 1p22.1-q21.3 with a maximum logarithm of the odds score of 4.21. By long-read sequencing, we identified a GGC repeat expansion in the 5' region of NOTCH2NLC (Notch 2 N-terminal like C) in all affected family members. Furthermore, we found similar expansions in 8 unrelated families with NIID and 40 sporadic NIID cases. We observed abnormal anti-sense transcripts in fibroblasts specifically from patients but not unaffected individuals. This work shows that repeat expansion in human-specific NOTCH2NLC, a gene that evolved by segmental duplication, causes a human disease.
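
As a rough illustration of what measuring a repeat expansion in a single long read involves, the sketch below finds the longest uninterrupted run of a repeat unit. It is an assumption-laden toy, not the study's pipeline: `longest_repeat_run` is a hypothetical helper, the sequence is invented, and real reads would need tolerance for sequencing errors and interrupted repeats.

```python
import re


def longest_repeat_run(seq, unit="GGC"):
    """Return (number of units, 0-based start) of the longest perfect tandem run."""
    unit = unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    best_units, best_start = 0, -1
    for match in pattern.finditer(seq.upper()):
        units = len(match.group()) // len(unit)
        if units > best_units:
            best_units, best_start = units, match.start()
    return best_units, best_start


# Invented example: three consecutive GGC units starting at index 3.
print(longest_repeat_run("ATGGGCGGCGGCTTGGCA"))
```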


Subjects
Brain/pathology; High-Throughput Nucleotide Sequencing/methods; Linkage Disequilibrium; Neurodegenerative Diseases/genetics; Neurodegenerative Diseases/pathology; Receptors, Notch/genetics; Trinucleotide Repeat Expansion/genetics; Adolescent; Adult; Aged; Brain/metabolism; Case-Control Studies; Female; Genetic Markers/genetics; Humans; Intranuclear Inclusion Bodies/genetics; Intranuclear Inclusion Bodies/pathology; Male; Middle Aged; Pedigree; Receptors, Notch/metabolism; Young Adult
5.
DNA Res ; 26(1): 55-65, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30462165

ABSTRACT

The current RNA-Seq method analyses fragments of mRNAs, from which it is occasionally difficult to reconstruct the entire transcript structure. Here, we performed and evaluated the recent procedure for full-length cDNA sequencing using the Nanopore sequencer MinION. We applied MinION RNA-Seq for various applications, which would not always be easy using the usual RNA-Seq by Illumina. First, we examined and found that even though the sequencing accuracy was still limited to 92.3%, practically useful RNA-Seq analysis is possible. Particularly, taking advantage of the long-read nature of MinION, we demonstrate the identification of splicing patterns and their combinations as a form of full-length cDNAs without losing precise information concerning their expression levels. Transcripts of fusion genes in cancer cells can also be identified and characterized. Furthermore, the full-length cDNA information can be used for phasing of the SNPs detected by WES on the transcripts, providing essential information to identify allele-specific transcriptional events. We constructed a catalogue of full-length cDNAs in seven major organs for two particular individuals and identified allele-specific transcription and splicing. Finally, we demonstrate that single-cell sequencing is also possible. RNA-Seq on the MinION platform should provide a novel approach that is complementary to the current RNA-Seq.


Subjects
Alleles; Gene Expression Profiling/methods; RNA Splicing; Sequence Analysis, RNA/methods; DNA, Complementary; High-Throughput Nucleotide Sequencing/methods; Humans; Polymorphism, Single Nucleotide
6.
DNA Res ; 24(6): 585-596, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-29117310

ABSTRACT

Here, we employed cDNA amplicon sequencing using a long-read portable sequencer, MinION, to characterize various types of mutations in cancer-related genes, namely, EGFR, KRAS, NRAS and NF1. For homozygous SNVs, the precision and recall rates were 87.5% and 91.3%, respectively. For previously reported hotspot mutations, the precision and recall rates reached 100%. The precise junctions of EML4-ALK, CCDC6-RET and five other gene fusions were also detected. Taking advantage of long-read sequencing, we conducted phasing of EGFR mutations and elucidated the mutational allelic backgrounds of anti-tumor drug-sensitive and resistant mutations, which could provide useful information for selecting therapeutic approaches. In the H1975 cells, 72% of the reads harbored both L858R and T790M mutations, and 22% of the reads harbored neither mutation. To ensure that the clinical requirements can be met in potentially low cancer cell populations, we further conducted a serial dilution analysis of the template for EGFR mutations. Several percent of the mutant alleles could be detected depending on the yield and quality of the sequencing data. Finally, we characterized the mutation genotypes in eight clinical samples. This method could be a convenient long-read sequencing-based analytical approach and thus may change the current approaches used for cancer genome sequencing.
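
Read-level phasing of two mutations amounts to tallying, per read, which of the two variants it carries. The sketch below shows that tally under simplifying assumptions: each read is already reduced to the set of mutation names called on it, `phase_fractions` is a hypothetical helper, and the four toy reads are invented rather than taken from the study.

```python
from collections import Counter


def phase_fractions(reads, first, second):
    """Fraction of reads carrying both, one, or neither of two mutations."""
    if not reads:
        return {}
    counts = Counter()
    for mutations in reads:
        has_first, has_second = first in mutations, second in mutations
        if has_first and has_second:
            counts["both"] += 1
        elif has_first:
            counts[f"{first} only"] += 1
        elif has_second:
            counts[f"{second} only"] += 1
        else:
            counts["neither"] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Invented reads: each set lists the mutations observed on one read.
toy_reads = [{"L858R", "T790M"}, {"L858R", "T790M"}, set(), {"L858R"}]
print(phase_fractions(toy_reads, "L858R", "T790M"))
```

A high "both" fraction indicates that the two mutations sit on the same allele (in cis), which is the kind of information short reads spanning only one site cannot provide.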


Subjects
Adenocarcinoma/genetics; ErbB Receptors/genetics; High-Throughput Nucleotide Sequencing/instrumentation; High-Throughput Nucleotide Sequencing/methods; Lung Neoplasms/genetics; Mutation; Sequence Analysis, DNA/methods; Biomarkers, Tumor/genetics; Humans
7.
Genome Res ; 18(1): 1-12, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18032727

ABSTRACT

Genome-wide detection of transcription start sites (TSSs) has revealed that RNA Polymerase II transcription initiates at millions of positions in mammalian genomes. Most core promoters do not have a single TSS, but an array of closely located TSSs with different rates of initiation. As a rule, genes have more than one such core promoter; however, defining the boundaries between core promoters is not trivial. These discoveries prompt a re-evaluation of our models for transcription initiation. We describe a new framework for understanding the organization of transcription initiation. We show that initiation events are clustered on the chromosomes at multiple scales (clusters within clusters), indicating multiple regulatory processes. Within the smallest of such clusters, which can be interpreted as core promoters, the local DNA sequence predicts the relative transcription start usage of each nucleotide with a remarkable 91% accuracy, implying the existence of a DNA code that determines TSS selection. Conversely, the total expression strength of such clusters is only partially determined by the local DNA sequence. Thus, the overall control of transcription can be understood as a combination of large- and small-scale effects; the selection of transcription start sites is largely governed by the local DNA sequence, whereas the transcriptional activity of a locus is regulated at a different level; it is affected by distal features or events such as enhancers and chromatin remodeling.
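
The idea of clusters within clusters can be illustrated with a simple gap-based grouping run at two scales. This is only a sketch under stated assumptions, not the framework described in the abstract: `cluster_positions` is a hypothetical helper, the gap thresholds are arbitrary, and the TSS coordinates are invented.

```python
def cluster_positions(positions, max_gap):
    """Group sorted coordinates; a new cluster starts when the gap exceeds max_gap."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # close enough to the previous site
        else:
            clusters.append([pos])    # gap too large: open a new cluster
    return clusters


# Invented TSS coordinates on one chromosome strand.
tss = [100, 102, 105, 150, 152, 1000, 1003]

print(cluster_positions(tss, max_gap=10))   # fine scale
print(cluster_positions(tss, max_gap=100))  # coarse scale
```

With the small threshold the first five sites split into two tight groups; with the larger threshold they merge into one, so the fine-scale clusters nest inside the coarse-scale ones.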


Subjects
Chromatin Assembly and Disassembly/physiology; Chromosomes, Human/physiology; Genome, Human/physiology; Promoter Regions, Genetic/physiology; RNA Polymerase II/physiology; Transcription, Genetic/physiology; Animals; Cell Line, Tumor; Databases, Genetic; Humans; Markov Chains; Quantitative Trait Loci/physiology; Sequence Analysis, DNA
8.
Pigment Cell Res ; 20(3): 201-9, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17516927

ABSTRACT

As part of the RIKEN mouse encyclopedia project, two cDNA libraries were prepared from melanocyte-derived cell lines, using techniques of full-length clone selection and subtraction/normalization to enrich for rare transcripts. End sequencing showed that these libraries display over 83% complete coding sequence at the 5' end and 96-97% complete coding sequence at the 3' end. Evaluation of the libraries, derived from B16F10Y tumor cells and melan-c cells, revealed that they contain clones for a majority of the genes previously demonstrated to function in melanocyte biology. Analysis of genomic locations for transcripts revealed that the distribution of melanocyte genes is non-random throughout the genome. Three genomic regions identified that showed significant clustering of melanocyte-expressed genes contain one or more genes previously shown to regulate melanocyte development or function. A catalog of genes expressed in these libraries is presented, providing a valuable resource of cDNA clones and sequence information that can be used for identification of new genes important for melanocyte development, function, and disease.


Subjects
Computational Biology/methods; Gene Library; Genetic Techniques; Melanocytes/metabolism; Animals; Chromosome Mapping; Gene Expression Regulation; Genome; Genomics/methods; Melanoma, Experimental/metabolism; Mice; Models, Biological
9.
Brain Res ; 1000(1-2): 156-73, 2004 Mar 12.
Article in English | MEDLINE | ID: mdl-15053963

ABSTRACT

The adenosine A(2A) receptor (A(2A)R) is abundantly expressed in brain and emerging as an important therapeutic target for Parkinson's disease and potentially other neuropsychiatric disorders. To understand the molecular mechanisms of A(2A)R gene expression, we have characterized the genomic organization of the mouse and human A(2A)R genes by molecular and bioinformatic analyses. Three new exons (m1A, m1B and m1C) encoding the 5' untranslated regions (5'-UTRs) of mouse A(2A)R mRNA were identified by rapid amplification of 5' cDNA ends (5' RACE), RT-PCR analysis and genome sequence analyses. Similar bioinformatics analysis also suggested six variants of the non-coding "exon 1" (h1A, h1B, h1C, h1D, h1E and h1F) in the human A(2A)R gene, which were confirmed by RT-PCR analysis, while three of the human exon 1 variants (h1D, h1E and h1F) were likewise verified by 5' oligonucleotide capping analysis, suggesting multiple transcription start sites. Importantly, RT-PCR and quantitative PCR analysis demonstrated that the A(2A)R transcripts with different exon 1 variants displayed tissue-specific expression patterns. For instance, the mouse exon m1A mRNA was detected only in brain (specifically striatum) and the human exon h1D mRNA in the lymphoreticular system. Furthermore, the determination of the three new transcription start sites of the human A(2A)R gene by 5' oligonucleotide capping and bioinformatics analyses led to the identification of three corresponding promoter regions which contain several important cis elements, providing additional targets for further molecular dissection of A(2A)R gene expression. Finally, our analysis indicates that A(2A)R mRNA and a novel transcript partially overlapping with the 3' exon h3, but in opposite orientation to the A(2A)R gene, could conceivably form duplexes to mutually regulate transcript expression.
Thus, combined molecular and bioinformatics analyses revealed a new A(2A)R genomic structure, with conserved coding exons 2 and 3 and divergent, tissue-specific exon 1 variants encoding the 5'-UTR. This raises the possibility of generating multiple tissue-specific A(2A)R mRNA species by alternative promoters with varying regulatory susceptibility.


Subjects
Cloning, Molecular/methods; Exons/genetics; Receptor, Adenosine A2A/chemistry; Receptor, Adenosine A2A/genetics; Sequence Analysis, DNA/methods; Animals; Base Sequence; Computational Biology/methods; Gene Expression Regulation/physiology; Genetic Variation; Genomic Library; Humans; Male; Mice; Molecular Sequence Data; Rats; Receptor, Adenosine A2A/biosynthesis; Receptor, Adenosine A2A/isolation & purification; Reverse Transcriptase Polymerase Chain Reaction
10.
Mol Endocrinol ; 18(8): 1859-75, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15031323

ABSTRACT

Estrogen influences the physiology of many target tissues in both women and men. The long-term effects of estrogen are mediated predominantly by nuclear estrogen receptors (ERs) functioning as DNA-binding transcription factors. Tissue-specific responses to estrogen therefore result from regulation of different sets of genes. However, it remains perplexing as to what regulatory sequence contexts specify distinct genomic responses. First, this review classifies estrogen response sequences in mammalian target genes. Of note, around one third of known human target genes associate only indirectly with ER, through intermediary transcription factor(s). Then, computational approaches are presented both for refining direct ER-binding sites and for formulating hypotheses regarding the overall genomic expression pattern. Surprisingly, limited evolutionary conservation of specific estrogen-responsive sites is observed between human and mouse. Finally, consideration of the cellular functions of regulated human genes suggests links between particular biological roles and specific types of estrogen response elements, although with the important caveat that only a restricted set of target genes is available. These analyses support the view that specific, hormone-driven gene expression programs can result from the interplay of environmental and cellular cues with the distinct types of estrogen-response sequences.


Subjects
Cell Nucleus/genetics; Cell Nucleus/metabolism; Receptors, Estrogen/metabolism; Response Elements/genetics; Animals; Base Sequence; Estrogens/metabolism; Estrogens/pharmacology; Gene Expression Regulation/drug effects; Gene Expression Regulation/genetics; Genomics; Humans; Receptors, Estrogen/chemistry; Receptors, Estrogen/genetics
11.
Nucleic Acids Res ; 31(13): 3510-7, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824356

ABSTRACT

Theatre is a web-based computing system designed for the comparative analysis of genomic sequences, especially with respect to motifs likely to be involved in the regulation of gene expression. Theatre is an interface to commonly used sequence analysis tools and biological sequence databases to determine or predict the positions of coding regions, repetitive sequences and transcription factor binding sites in families of DNA sequences. The information is displayed in a manner that can be easily understood and can reveal patterns that might not otherwise have been noticed. In addition to web-based output, Theatre can produce publication quality colour hardcopies showing predicted features in aligned genomic sequences. A case study using the p53 promoter region of four mammalian species and two fish species is described. Unlike the mammalian sequences, the promoter regions in fish have not previously been predicted or characterized, and we report the differences between the p53 promoter region of four mammals and that predicted for two fish species. Theatre can be accessed at http://www.hgmp.mrc.ac.uk/Registered/Webapp/theatre/.


Subjects
Genomics/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Animals; Base Sequence; Binding Sites; Computer Graphics; Cricetinae; Fishes/genetics; Gene Components; Gene Expression Regulation; Genes, p53; Humans; Internet; Mice; Molecular Sequence Data; Promoter Regions, Genetic; Rats; Repetitive Sequences, Nucleic Acid; Transcription Factors/metabolism; User-Computer Interface
12.
BMC Bioinformatics ; 4: 1, 2003 Jan 04.
Article in English | MEDLINE | ID: mdl-12513700

ABSTRACT

BACKGROUND: Many readers will sympathize with the following story. You are viewing a gene sequence in Entrez, and you want to find whether it contains a particular sequence motif. You reach for the browser's "find in page" button, but those darn spaces every 10 bp get in the way. And what if the motif is on the opposite strand? Subsequently, your favorite sequence analysis software informs you that there is an interesting feature at position 13982-14013. By painstakingly counting the 10 bp blocks, you are able to examine the sequence at this location. But now you want to see what other features have been annotated close by, and this information is buried several screenfuls higher up the web page. RESULTS: SeqVISTA presents a holistic, graphical view of features annotated on nucleotide or protein sequences. This interactive tool highlights the residues in the sequence that correspond to features chosen by the user, and allows easy searching for sequence motifs or extraction of particular subsequences. SeqVISTA is able to display results from diverse sequence analysis tools in an integrated fashion, and aims to provide much-needed unity to the bioinformatics resources scattered around the Internet. Our viewer may be launched on a GenBank record by a single click of a button installed in the web browser. CONCLUSION: SeqVISTA allows insights to be gained by viewing the totality of sequence annotations and predictions, which may be more revealing than the sum of their parts. SeqVISTA runs on any operating system with a Java 1.4 virtual machine. It is freely available to academic users at http://zlab.bu.edu/SeqVISTA.
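
The motif-search problem described in the BACKGROUND (spaces every 10 bp, and motifs on the opposite strand) can be sketched in a few lines. This is not SeqVISTA's code: `find_motif` and `reverse_complement` are hypothetical helpers, the input is assumed to be a flat-file style sequence block with position numbers and spaces, and the example sequence is invented.

```python
import re

# Translation table for complementing DNA bases (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(formatted_seq, motif):
    """Return 1-based (start, end, strand) hits, ignoring whitespace and digits."""
    seq = re.sub(r"[\s\d]", "", formatted_seq).upper()
    motif = motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start + 1, start + len(pattern), strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return hits


# Invented record line: a position number followed by 10 bp blocks.
print(find_motif("1 acgtacgtaa ccggttaacc", "ttaacc"))
```

Coordinates on the minus strand are reported in forward-strand positions, and a palindromic motif is reported once per strand at the same location.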


Subjects
Computer Graphics; Software; Amino Acid Sequence; Base Sequence; Computational Biology/methods; Databases, Nucleic Acid/trends; Genome, Viral; Humans; Molecular Sequence Data; Sequence Alignment/methods; Simian virus 40/genetics; Software/classification; Software/standards; User-Computer Interface