Results 1 - 18 of 18
1.
PeerJ ; 12: e17025, 2024.
Article in English | MEDLINE | ID: mdl-38464746

ABSTRACT

Insects are a phylogenetically highly diverse group and possess a wide variety of traits, including the presence or absence of wings and metamorphosis. These diverse traits are of great interest for studying genome evolution, and numerous comparative genomic studies have examined a wide phylogenetic range of insects. Here, we analyzed 22 insects belonging to a wide phylogenetic range (Endopterygota, Paraneoptera, Polyneoptera, Palaeoptera, and other insects) by applying a batch-learning self-organizing map (BLSOM) to the oligonucleotide compositions of their genomic fragments (100-kb or 1-Mb sequences). BLSOM is an unsupervised machine learning algorithm that can extract species-specific characteristics of the oligonucleotide compositions (genome signatures). The genome signature is of particular interest in terms of the mechanisms and biological significance that have caused the species-specific difference; it can be used as a powerful probe to explore the various roles of genome sequences other than protein coding and to unveil features hidden in the genome sequence. Since BLSOM is an unsupervised clustering method, the clustering of sequences was performed based on the oligonucleotide composition alone, without providing information about the species from which each fragment sequence was derived. Therefore, not only interspecies separation but also intraspecies separation can be achieved. Here, we have revealed specific genomic regions with oligonucleotide compositions distinct from the usual sequences of each insect genome, e.g., Mb-level structures found for the grasshopper Schistocerca americana. One aim of this study was to compare the genome characteristics of insects with those of vertebrates, especially humans, which are phylogenetically distant from insects. Recently, humans have come to serve as a "model organism" for which a large amount of information has been accumulated using a variety of cutting-edge and high-throughput technologies. Therefore, it is reasonable to use the abundant information from humans to study insect lineages. Specific regions of Mb length with distinct oligonucleotide compositions have also been previously observed in the human genome. These regions were enriched in transcription factor binding sequences (TFBSs) and hypothesized to be involved in the three-dimensional arrangement of chromosomal DNA in interphase nuclei. The present study characterized the species-specific oligonucleotide compositions (i.e., genome signatures) in insect genomes and identified specific genomic regions with distinct oligonucleotide compositions.


Subject(s)
Human Genome , Insect Genome , Animals , Humans , Phylogeny , Insect Genome/genetics , Oligonucleotides/genetics , Artificial Intelligence
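
The abstract above describes the BLSOM input as the oligonucleotide composition of fixed-length genomic fragments. As a minimal sketch of how such input vectors might be prepared (the function names, the default fragment length, and the handling of ambiguous bases are illustrative assumptions, not the authors' published pipeline):

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k=4):
    """Relative frequency of every k-mer over A/C/G/T in one sequence."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are not in `kmers`
    # and are therefore left out of the total (an assumption of this sketch).
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def fragment_compositions(genome, fragment_len=100_000, k=4):
    """Cut a genome into non-overlapping fragments and profile each one.

    A trailing fragment shorter than `fragment_len` is discarded.
    """
    return [
        kmer_composition(genome[start:start + fragment_len], k)
        for start in range(0, len(genome) - fragment_len + 1, fragment_len)
    ]
```

Each dictionary is one 4^k-dimensional vector; an unsupervised method then clusters these vectors without being told which species each fragment came from.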
2.
PLoS One ; 17(8): e0273860, 2022.
Article in English | MEDLINE | ID: mdl-36044525

ABSTRACT

Among mutations that occur in SARS-CoV-2, efficient identification of mutations advantageous for viral replication and transmission is important to characterize and defeat this rampant virus. Mutations whose frequency expands rapidly in a viral population are candidates for advantageous mutations, but neutral mutations hitchhiking with advantageous mutations are also likely to be included. To distinguish these, we focus on mutations that appear to occur independently in different lineages and expand in frequency in a convergent evolutionary manner. Batch-learning SOM (BLSOM) can separate SARS-CoV-2 genome sequences by lineage when given only the oligonucleotide composition. Focusing on remarkably expanding 20-mers, each of which is represented by only one copy in the viral genome, allows us to correlate the expanding 20-mers with mutations. Using visualization functions in BLSOM, we can efficiently identify mutations that have expanded remarkably both in the Omicron lineage, which is phylogenetically distinct from other lineages, and in other lineages. Most of these mutations involved changes in amino acids, but there were a few that did not, such as an intergenic mutation.


Subject(s)
COVID-19 , Mutation , Oligonucleotides , SARS-CoV-2 , Artificial Intelligence , COVID-19/genetics , Viral Genome , Humans , Machine Learning , Oligonucleotides/genetics , Phylogeny , SARS-CoV-2/genetics , Coronavirus Spike Glycoprotein/genetics
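
The link between single-copy 20-mers and mutations in the abstract above can be illustrated with a small sketch. It assumes a reference genome and one variant genome given as plain strings; the function names are hypothetical and this is not the authors' BLSOM-based procedure, only the underlying k-mer idea:

```python
from collections import Counter


def single_copy_kmers(genome, k=20):
    """Set of k-mers that occur exactly once in the genome."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer for kmer, n in counts.items() if n == 1}


def novel_kmers(reference, variant, k=20):
    """Single-copy k-mers of the variant that are absent from the reference.

    A point substitution typically creates up to k such k-mers,
    one for each window that overlaps the changed position.
    """
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return single_copy_kmers(variant, k) - ref_kmers
```

Because each novel k-mer overlaps the changed position, tracking how often such k-mers appear across sampled genomes over time points back to the mutation that produced them.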
3.
BMC Genomics ; 23(1): 497, 2022 Jul 08.
Article in English | MEDLINE | ID: mdl-35804296

ABSTRACT

BACKGROUND: Emerging infectious disease-causing RNA viruses, such as the SARS-CoV-2 and Ebola viruses, are thought to rely on bats as natural reservoir hosts. Since these zoonotic viruses pose a great threat to humans, it is important to characterize the bat genome from multiple perspectives. Unsupervised machine learning methods for extracting novel information from big sequence data without prior knowledge or particular models are highly desirable for obtaining unexpected insights. We previously established a batch-learning self-organizing map (BLSOM) of the oligonucleotide composition that reveals novel genome characteristics from big sequence data. RESULTS: In this study, using the oligonucleotide BLSOM, we conducted a comparative genomic study of humans and six bat species. BLSOM is an explainable-type machine learning algorithm that reveals the diagnostic oligonucleotides contributing to sequence clustering (self-organization). When unsupervised machine learning reveals unexpected and/or characteristic features, these features can be studied in more detail via the much simpler and more direct standard distribution map method. Based on this combined strategy, we identified the Mb-level enrichment of CG dinucleotide (Mb-level CpG islands) around the termini of bat long-scaffold sequences. In addition, a class of CG-containing oligonucleotides were enriched in the centromeric and pericentromeric regions of human chromosomes. Oligonucleotides longer than tetranucleotides often represent binding motifs for a wide variety of proteins (e.g., transcription factor binding sequences (TFBSs)). By analyzing the penta- and hexanucleotide composition, we observed the evident enrichment of a wide range of hexanucleotide TFBSs in centromeric and pericentromeric heterochromatin regions on all human chromosomes. 
CONCLUSION: Function of transcription factors (TFs) beyond their known regulation of gene expression (e.g., TF-mediated looping interactions between two different genomic regions) has received wide attention. The Mb-level TFBS and CpG islands are thought to be involved in the large-scale nuclear organization, such as centromere and telomere clustering. TFBSs, which are enriched in centromeric and pericentromeric heterochromatin regions, are thought to play an important role in the formation of nuclear 3D structures. Our machine learning-based analysis will help us to understand the differential features of nuclear 3D structures in the human and bat genomes.


Subject(s)
COVID-19 , Chiroptera/genetics , Human Genome/genetics , SARS-CoV-2/physiology , Animals , COVID-19/transmission , Chiroptera/virology , CpG Islands , Genomics/methods , Heterochromatin/chemistry , Heterochromatin/genetics , Humans , Molecular Conformation , Oligonucleotides/chemistry , Unsupervised Machine Learning
4.
BMC Microbiol ; 22(1): 73, 2022 03 10.
Article in English | MEDLINE | ID: mdl-35272618

ABSTRACT

BACKGROUND: Unsupervised AI (artificial intelligence) can obtain novel knowledge from big data without particular models or prior knowledge and is highly desirable for unveiling hidden features in big data. SARS-CoV-2 poses a serious threat to public health, and one important issue in characterizing this fast-evolving virus is to elucidate various aspects of its genome sequence changes. We previously established an unsupervised AI, a BLSOM (batch-learning SOM), which can analyze five million genomic sequences simultaneously. The present study applied the BLSOM to the oligonucleotide compositions of forty thousand SARS-CoV-2 genomes. RESULTS: Although only the oligonucleotide composition was given, the obtained clusters of genomes corresponded primarily to known main clades and internal divisions in the main clades. Since the BLSOM is explainable AI, it reveals which features of the oligonucleotide composition are responsible for clade clustering. Additionally, the BLSOM provided information concerning a special genomic region possibly undergoing RNA modifications. CONCLUSIONS: The BLSOM has powerful image display capabilities and enables efficient knowledge discovery about viral evolutionary processes, and it can complement phylogenetic methods based on sequence alignment.


Subject(s)
COVID-19 , SARS-CoV-2 , Artificial Intelligence , Molecular Evolution , Humans , Phylogeny , SARS-CoV-2/genetics
5.
Genes Genet Syst ; 96(4): 165-176, 2021 Dec 16.
Article in English | MEDLINE | ID: mdl-34565757

ABSTRACT

In genetics and related fields, huge amounts of data, such as genome sequences, are accumulating, and the use of artificial intelligence (AI) suitable for big data analysis has become increasingly important. Unsupervised AI that can reveal novel knowledge from big data without prior knowledge or particular models is highly desirable for analyses of genome sequences, particularly for obtaining unexpected insights. We have developed a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions that can reveal various novel genome characteristics. Here, we explain the data mining by the BLSOM: an unsupervised AI. As a specific target, we first selected SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) because a large number of viral genome sequences have been accumulated via worldwide efforts. We analyzed more than 0.6 million sequences collected primarily in the first year of the pandemic. BLSOMs for short oligonucleotides (e.g., 4-6-mers) allowed separation into known clades, but longer oligonucleotides further increased the separation ability and revealed subgrouping within known clades. In the case of 15-mers, there is mostly one copy in the genome; thus, 15-mers that appeared after the epidemic started could be connected to mutations, and the BLSOM for 15-mers revealed the mutations that contributed to separation into known clades and their subgroups. After introducing the detailed methodological strategies, we explain BLSOMs for various topics, such as the tetranucleotide BLSOM for over 5 million 5-kb fragment sequences derived from almost all microorganisms currently available and its use in metagenome studies. We also explain BLSOMs for various eukaryotes, including fishes, frogs and Drosophila species, and found a high separation ability among closely related species. When analyzing the human genome, we found enrichments in transcription factor-binding sequences in centromeric and pericentromeric heterochromatin regions. 
The tDNAs (tRNA genes) could be separated according to their corresponding amino acid.


Subject(s)
Artificial Intelligence , Computational Biology/methods , Human Genome , Viral Genome , SARS-CoV-2/genetics , Cluster Analysis , Codon Usage , Humans , Metagenomics/methods , Mutation , Transfer RNA , Time Factors
6.
Life Sci Alliance ; 4(5), 2021 05.
Article in English | MEDLINE | ID: mdl-33712508

ABSTRACT

The Japanese wrinkled frog (Glandirana rugosa) is unique in having both XX-XY and ZZ-ZW types of sex chromosomes within the species. Genome sequencing and comparative genomics with other frogs should be important for understanding the mechanisms of sex chromosome turnover within one species or during a short period. In this study, we analyzed the newly sequenced genome of G. rugosa using a batch-learning self-organizing map, an unsupervised artificial intelligence method for oligonucleotide compositions. To clarify genome characteristics of G. rugosa, we compared its short oligonucleotide compositions in all 1-Mb genomic fragments with those of six other frog species (Pyxicephalus adspersus, Rhinella marina, Spea multiplicata, Leptobrachium leishanense, Xenopus laevis, and Xenopus tropicalis). In G. rugosa, we found Mb-level repeat sequences having a high identity with the W chromosome of the African bullfrog (P. adspersus). Our study concluded that G. rugosa has unique genome characteristics with a high CG frequency, and its genome is assumed to heterochromatinize a large portion of the genome via methylation of CG.


Subject(s)
Base Composition/genetics , Ranidae/genetics , Sex Chromosomes/genetics , Animals , Base Sequence/genetics , Female , Genomics/methods , Male , Phylogeny , Unsupervised Machine Learning
7.
Gene X ; 5: 100038, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32835214

ABSTRACT

We first conducted time-series analysis of mono- and dinucleotide composition for over 10,000 SARS-CoV-2 genomes, as well as over 1500 Zaire ebolavirus genomes, and found clear time-series changes in the compositions on a monthly basis, which should reflect viral adaptations for efficient growth in human cells. We next developed a sequence alignment-free method that extensively searches for advantageous mutations and ranks them by the level of increase in their intrapopulation frequency. Time-series analysis of occurrences of oligonucleotides of diverse lengths for SARS-CoV-2 genomes revealed seven distinctive mutations that rapidly expanded their intrapopulation frequency and are thought to be candidates for advantageous mutations for efficient growth in human cells.
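
The monthly time-series analysis of dinucleotide composition described above can be sketched as follows. The input format (pairs of an ISO collection date and a genome string) and the function names are assumptions made for illustration; the abstract does not specify the authors' implementation:

```python
from collections import defaultdict


def dinucleotide_frequency(seq, dinuc):
    """Fraction of overlapping 2-base windows in seq equal to dinuc."""
    seq = seq.upper()
    windows = len(seq) - 1
    if windows <= 0:
        return 0.0
    hits = sum(1 for i in range(windows) if seq[i:i + 2] == dinuc)
    return hits / windows


def monthly_mean_frequency(records, dinuc):
    """Average a dinucleotide frequency over genomes grouped by month.

    `records` is an iterable of (date, sequence) pairs with dates in
    'YYYY-MM-DD' form (a hypothetical input format).
    """
    by_month = defaultdict(list)
    for date, seq in records:
        by_month[date[:7]].append(dinucleotide_frequency(seq, dinuc))
    return {month: sum(v) / len(v) for month, v in sorted(by_month.items())}
```

A steady rise or fall of the monthly means for a given dinucleotide is the kind of directional change the abstract refers to.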

8.
Genes Genet Syst ; 95(1): 29-41, 2020 Apr 22.
Article in English | MEDLINE | ID: mdl-32161227

ABSTRACT

Unsupervised machine learning that can discover novel knowledge from big sequence data without prior knowledge or particular models is highly desirable for current genome study. We previously established a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions, which can reveal various novel genome characteristics from big sequence data, and found that transcription factor binding sequences (TFBSs) and CpG-containing oligonucleotides are enriched in human centromeric and pericentromeric regions, which support centromere clustering and form the condensed heterochromatin "chromocenter" in interphase nuclei. The number and size of chromocenters, as well as the type of centromeres gathered in individual chromocenters, vary depending on cell type. To study molecular mechanisms of cell type-dependent chromocenter formation, we analyzed distribution patterns of occurrence per Mb of hexa- and heptanucleotide TFBSs, which have been compiled by the SwissRegulon Portal, and of CpG-containing oligonucleotides. We found Mb-level islands enriched for TFBSs and CpG-containing oligonucleotides in centromeric and pericentromeric regions on all human chromosomes except chrY. Considering molecular mechanisms for cell type-dependent centromere clustering, the chromosome-dependent enrichment of a set of TFBSs and CpG-containing oligonucleotides is of particular interest, since the cellular content of TFs and methyl-CpG-binding proteins exhibits cell type-dependent regulation. A newly introduced BLSOM, which analyzed occurrences of a total of 3,946 octanucleotide TFBSs compiled by the SwissRegulon Portal, has self-organized (separated) the sequences that are characteristically enriched in TFBSs and shown that these sequences are derived primarily from centromeric and pericentromeric constitutive heterochromatin regions. Furthermore, the BLSOM identified and visualized characteristic TFBSs that are enriched in these regions. 
By analyzing Hi-C data for interchromosomal interactions, the present study showed that the chromatin segments supporting the interchromosomal interactions locate primarily in Mb-level TFBS and CpG islands and are thus enriched for a wide variety of TFBSs and CG-containing oligonucleotides.


Subject(s)
Artificial Intelligence , Human Chromosomes/genetics , CpG Islands/genetics , Human Genome/genetics , Binding Sites , Centromere/genetics , Heterochromatin/genetics , Humans , Oligonucleotides/genetics , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism
9.
Gene ; 763S: 100038, 2020 Dec.
Article in English | MEDLINE | ID: mdl-34493367

ABSTRACT

We first conducted time-series analysis of mono- and dinucleotide composition for over 10,000 SARS-CoV-2 genomes, as well as over 1500 Zaire ebolavirus genomes, and found clear time-series changes in the compositions on a monthly basis, which should reflect viral adaptations for efficient growth in human cells. We next developed a sequence alignment-free method that extensively searches for advantageous mutations and ranks them by the level of increase in their intrapopulation frequency. Time-series analysis of occurrences of oligonucleotides of diverse lengths for SARS-CoV-2 genomes revealed seven distinctive mutations that rapidly expanded their intrapopulation frequency and are thought to be candidates for advantageous mutations for efficient growth in human cells.


Subject(s)
COVID-19/genetics , Viral Genome/genetics , Viral RNA/genetics , SARS-CoV-2/genetics , COVID-19/pathology , Humans , Mutation/genetics , Oligonucleotides/genetics , SARS-CoV-2/pathogenicity , Sequence Alignment
10.
Genes Genet Syst ; 92(1): 43-54, 2017 Sep 12.
Article in English | MEDLINE | ID: mdl-28344190

ABSTRACT

Unsupervised data mining capable of extracting a wide range of knowledge from big data without prior knowledge or particular models is a timely application in the era of big sequence data accumulation in genome research. By handling oligonucleotide compositions as high-dimensional data, we have previously modified the conventional self-organizing map (SOM) for genome informatics and established BLSOM, which can analyze more than ten million sequences simultaneously. Here, we develop BLSOM specialized for tRNA genes (tDNAs) that can cluster (self-organize) more than one million microbial tDNAs according to their cognate amino acid solely depending on tetra- and pentanucleotide compositions. This unsupervised clustering can reveal combinatorial oligonucleotide motifs that are responsible for the amino acid-dependent clustering, as well as other functionally and structurally important consensus motifs, which have been evolutionarily conserved. BLSOM is also useful for identifying tDNAs as phylogenetic markers for special phylotypes. When we constructed BLSOM with 'species-unknown' tDNAs from metagenomic sequences plus 'species-known' microbial tDNAs, a large portion of metagenomic tDNAs self-organized with species-known tDNAs, yielding information on microbial communities in environmental samples. BLSOM can also enhance accuracy in the tDNA database obtained from big sequence data. This unsupervised data mining should become important for studying numerous functionally unclear RNAs obtained from a wide range of organisms.


Subject(s)
Artificial Intelligence , Genomics/methods , Transfer RNA/genetics , DNA Sequence Analysis/methods , Software , Animals , Humans
11.
Sci Rep ; 6: 36197, 2016 11 03.
Article in English | MEDLINE | ID: mdl-27808119

ABSTRACT

Ebolavirus, MERS coronavirus and influenza virus are zoonotic RNA viruses, which mutate very rapidly. Viral growth depends on many host factors, but human cells may not provide the ideal growth conditions for viruses invading from nonhuman hosts. The present time-series analyses of short and long oligonucleotide compositions in these genomes showed directional changes in their composition after invasion from a nonhuman host, which are thought to recur after future invasions. In the recent West Africa Ebola outbreak, directional time-series changes in a wide range of oligonucleotides were observed in common for three geographic areas, and the directional changes were observed also for the recent MERS coronavirus epidemics starting in the Middle East. In addition, common directional changes in human influenza A viruses were observed for three subtypes, whose epidemics started independently. Long oligonucleotides that showed an evident directional change observed in common for the three subtypes corresponded to some of influenza A siRNAs, whose activities have been experimentally proven. Predicting directional and reoccurring changes in oligonucleotide composition should become important for designing diagnostic RT-PCR primers and therapeutic oligonucleotides with long effectiveness.


Subject(s)
Viral Genome , Zoonoses/virology , Animals , Base Sequence , Ebolavirus/genetics , Humans , Middle East Respiratory Syndrome Coronavirus/genetics , Oligonucleotides/genetics , Orthomyxoviridae/genetics , Time Factors
12.
Genes Genet Syst ; 90(1): 43-53, 2015.
Article in English | MEDLINE | ID: mdl-26119665

ABSTRACT

Unsupervised data mining capable of extracting a wide range of information from big sequence data without prior knowledge or particular models is highly desirable in an era of big data accumulation for research on genes, genomes and genetic systems. By handling oligonucleotide compositions in genomic sequences as high-dimensional data, we have previously modified the conventional SOM (self-organizing map) for genome informatics and established BLSOM for oligonucleotide composition, which can analyze more than ten million sequences simultaneously and is thus suitable for big data analyses. Oligonucleotides often represent motif sequences responsible for sequence-specific binding of proteins such as transcription factors. The distribution of such functionally important oligonucleotides is probably biased in genomic sequences, and may differ among genomic regions. When constructing BLSOMs to analyze pentanucleotide composition in 50-kb sequences derived from the human genome in this study, we found that BLSOMs did not classify human sequences according to chromosome but revealed several specific zones, which are enriched for a class of CG-containing pentanucleotides; these zones are composed primarily of sequences derived from pericentric regions. The biological significance of enrichment of these pentanucleotides in pericentric regions is discussed in connection with cell type- and stage-dependent formation of the condensed heterochromatin in the chromocenter, which is formed through association of pericentric regions of multiple chromosomes.


Subject(s)
Base Composition , Binding Sites , Human Chromosomes , Human Genome , Genomics , Nucleotide Motifs , Oligonucleotides , Transcription Factors/metabolism , Genomics/methods , Humans
13.
DNA Res ; 21(5): 459-67, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24800745

ABSTRACT

With a remarkable increase in genomic sequence data of a wide range of species, novel tools are needed for comprehensive analyses of the big sequence data. The self-organizing map (SOM) is a powerful tool for clustering high-dimensional data on one plane. For oligonucleotide compositions handled as high-dimensional data, we have previously modified the conventional SOM for genome informatics: BLSOM. In the present study, we constructed BLSOMs for oligonucleotide compositions in fragment sequences (e.g. 100 kb) from a wide range of vertebrates, including coelacanth, and found that the sequences were clustered primarily according to species without species information. As one of the nearest living relatives of tetrapod ancestors, coelacanth is believed to provide access to the phenotypic and genomic transitions leading to the emergence of tetrapods. The characteristic oligonucleotide composition found for coelacanth was connected with the lowest dinucleotide CG occurrence (i.e. the highest CG suppression) among fishes, which was roughly equivalent to that of tetrapods. This evident CG suppression in coelacanth should reflect molecular evolutionary processes of epigenetic systems including DNA methylation during vertebrate evolution. The sequence of a de novo DNA methylase (Dnmt3a) of coelacanth was found to be more closely related to that of tetrapods than to that of other fishes.


Subject(s)
Molecular Evolution , Genome , Vertebrates/genetics , Animals , Computational Biology , DNA Modification Methylases/genetics , Fishes/genetics , Phylogeny
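
CG suppression, as mentioned in the abstract above, is commonly quantified as the ratio of observed to expected CG dinucleotides, where the expectation comes from the C and G base counts. A minimal sketch using that widely used observed/expected ratio (the abstract does not state which measure the authors used, so this choice and the function name are assumptions):

```python
def cpg_observed_expected(seq):
    """Observed/expected CG ratio: (count of CG * length) / (count of C * count of G).

    Values well below 1 indicate CG suppression.
    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact number.
    cg_count = seq.count("CG")
    return cg_count * len(seq) / (c_count * g_count)
```

Computed per genome or per fragment, this ratio allows species to be compared on a single scale regardless of their overall G + C content.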
14.
BMC Infect Dis ; 13: 386, 2013 Aug 21.
Article in English | MEDLINE | ID: mdl-23964903

ABSTRACT

BACKGROUND: With the remarkable increase of microbial and viral sequence data obtained from high-throughput DNA sequencers, novel tools are needed for comprehensive analysis of the big sequence data. We have developed the "Batch-Learning Self-Organizing Map (BLSOM)", which can characterize very many, even millions of, genomic sequences on one plane. Influenza virus is a zoonotic virus and shows clear host tropism. Important issues for bioinformatics studies of influenza viruses are prediction of genomic sequence changes in the near future and surveillance of potentially hazardous strains. METHODS: To characterize sequence changes in influenza virus genomes after invasion into humans from other animal hosts, we applied BLSOMs to analyses of mono-, di-, tri-, and tetranucleotide compositions in all genome sequences of influenza A and B viruses and found clear host-dependent clustering (self-organization) of the sequences. RESULTS: Viruses isolated from humans and birds differed in mononucleotide composition from each other. In addition, host-dependent oligonucleotide compositions that could not be explained by the host-dependent mononucleotide composition were revealed by oligonucleotide BLSOMs. Retrospective time-dependent directional changes of mono- and oligonucleotide compositions, which were visualized for human strains on BLSOMs, could provide predictive information about sequence changes in newly invaded viruses from other animal hosts (e.g. the swine-derived pandemic H1N1/09). CONCLUSIONS: Based on the host-dependent oligonucleotide composition, we proposed a strategy for prediction of directional changes of virus sequences and for surveillance of potentially hazardous strains when introduced into human populations from non-human sources. Millions of genomic sequences from infectious microbes and viruses have become available because of their medical and social importance, and BLSOM can characterize the big data and support efficient knowledge discovery.


Subject(s)
Viral Genome , Genomics/methods , Influenza A Virus/genetics , Influenza B Virus/genetics , Human Influenza/virology , Genetic Databases , Humans , Influenza A Virus H1N1 Subtype/genetics , Genetic Models , Viral RNA , Retrospective Studies , RNA Sequence Analysis , Viral Tropism
15.
Chromosome Res ; 21(5): 461-74, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23896648

ABSTRACT

Since oligonucleotide composition in the genome sequence varies significantly among species, even among those possessing the same genome G + C%, the composition has been used to distinguish a wide range of genomes and has been called the "genome signature". Oligonucleotides often represent motif sequences responsible for sequence-specific protein binding (e.g., transcription-factor binding). Occurrences of such motif oligonucleotides in the genome should be biased compared to those observed in random sequences and may differ among genomes and genomic portions. The Self-Organizing Map (SOM) is a powerful tool for clustering high-dimensional data such as oligonucleotide composition on one plane. We previously modified the conventional SOM for genome informatics to batch-learning SOM or "BLSOM". When we constructed BLSOMs to analyze pentanucleotide composition in 20-, 50-, and 100-kb sequences derived from the human genome, BLSOMs did not classify human sequences according to chromosome but revealed several specific zones composed primarily of sequences derived from pericentric regions. Interestingly, various transcription-factor-binding motifs were characteristically overrepresented in pericentric regions but underrepresented in most genomic sequences. When we focused on much shorter sequences (e.g., 1 kb), the clustering of transcription-factor-binding motifs was evident in pericentric, subtelomeric and sex chromosome pseudoautosomal regions. The biological significance of the clustering in these regions was discussed in connection with cell-type and -stage-dependent chromocenter formation and nuclear organization.


Subject(s)
Binding Sites , Computational Biology/methods , Human Genome , Genomics/methods , Nucleotide Motifs , Transcription Factors/metabolism , Base Sequence , Chromosome Mapping , Cluster Analysis , Consensus Sequence , Genetic Databases , Humans
16.
Microorganisms ; 1(1): 137-157, 2013 Nov 20.
Article in English | MEDLINE | ID: mdl-27694768

ABSTRACT

With the remarkable increase of genomic sequence data of microorganisms, novel tools are needed for comprehensive analyses of the big sequence data available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition on one map. By modifying the conventional SOM, we developed batch-learning SOM (BLSOM), which allowed classification of sequence fragments (e.g., 1 kb) according to phylotypes, solely depending on oligonucleotide composition. Metagenomics studies of uncultivable microorganisms in clinical and environmental samples should allow extensive surveys of genes important in life sciences. BLSOM is most suitable for phylogenetic assignment of metagenomic sequences, because fragmental sequences can be clustered according to phylotypes, solely depending on oligonucleotide composition. We first constructed oligonucleotide BLSOMs for all available sequences from genomes of known species, and by mapping metagenomic sequences on these large-scale BLSOMs, we can predict phylotypes of individual metagenomic sequences, revealing a microbial community structure of uncultured microorganisms, including viruses. BLSOM has shown that influenza viruses isolated from humans and birds clearly differ in oligonucleotide composition. Based on this host-dependent oligonucleotide composition, we have proposed strategies for predicting directional changes of virus sequences and for surveilling potentially hazardous strains when introduced into humans from non-human sources.

17.
DNA Res ; 18(2): 125-36, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21444341

ABSTRACT

Influenza virus poses a significant threat to public health, as exemplified by the recent introduction of the new pandemic strain H1N1/09 into human populations. Pandemics have been initiated by the occurrence of novel changes in viruses from animal sources that eventually adapt to humans. One important issue in studies of viral genomes, particularly those of influenza virus, is to predict possible changes in genomic sequence that will become hazardous. We previously established a clustering method termed 'BLSOM' (batch-learning self-organizing map) that does not depend on sequence alignment and can characterize and compare even 1 million genomic sequences in one run. Strategies for comparing a vast number of genomic sequences simultaneously are becoming increasingly important in genome studies because of remarkable progress in nucleotide sequencing. In this study, we have constructed BLSOMs based on the oligonucleotide and codon composition of all influenza A viral strains available. Without prior information with regard to their hosts, sequences derived from strains isolated from avian or human sources were successfully clustered according to the hosts. Notably, the pandemic H1N1/09 strains have oligonucleotide and codon compositions that are clearly different from those of human seasonal influenza A strains. This enables us to infer future directional changes in the influenza A viral genome.


Subject(s)
Viral Genome/genetics , Influenza A Virus H1N1 Subtype/genetics , Genetic Models , Pandemics , Animals , Base Sequence , Birds/virology , Chromosome Mapping , Cluster Analysis , Codon , Host Specificity , Humans , Influenza in Birds/virology , Human Influenza/epidemiology , Human Influenza/virology , Oligonucleotides/genetics , Reproducibility of Results , Seasons , Nucleic Acid Sequence Homology , Time Factors
18.
Biomaterials ; 31(14): 4179-85, 2010 May.
Article in English | MEDLINE | ID: mdl-20181392

ABSTRACT

Curcumin, which can exist in an equilibrium between keto and enol tautomers, binds to beta-amyloid (Abeta) fibrils/aggregates. The aim of this study was to assess the relationship between the tautomeric structures of curcumin derivatives and their Abeta-binding activities. Curcumin derivatives with keto-enol tautomerism showed high levels of binding to Abeta aggregates but not to Abeta monomers. The binding activity of the keto form analogue of curcumin to Abeta aggregates was found to be much weaker than that of curcumin derivatives with keto-enol tautomerism. The color of a curcumin derivative with keto-enol tautomerism, which was substituted at the C-4 position, changed from yellow to orange within 30 min of being combined with Abeta aggregates in physiological buffer. This resulted from a remarkable increase in the enol form with extended conjugation of double bonds upon binding. These findings suggest that curcumin derivatives exist predominantly in the enol form during binding to Abeta aggregates, and that the enolization of curcumin derivatives is crucial for binding to Abeta aggregates. The keto-enol tautomerism of curcumin derivatives may be a novel target for the design of amyloid-binding agents that can be used both for therapy and for amyloid detection in Alzheimer's disease.


Subject(s)
Alzheimer Disease/drug therapy , Amyloid beta-Peptides/metabolism , Curcumin/analogs & derivatives , Curcumin/therapeutic use , Amyloid beta-Peptides/chemistry , Curcumin/chemistry , Curcumin/metabolism , Magnetic Resonance Spectroscopy , Methanol/chemistry , Protein Binding/drug effects , Quaternary Protein Structure , Solutions , Ultraviolet Spectrophotometry , Stereoisomerism