ABSTRACT
BACKGROUND: To a greater or lesser extent, eukaryotic nuclear genomes contain fragments of their mitochondrial genome counterpart, deriving from the random insertion of damaged mtDNA fragments. NumtS (Nuclear mt Sequences) are not equally abundant in all species, and are redundant and polymorphic in terms of copy number. In population and clinical genetics, it is important to have a complete overview of NumtS quantity and location. Searching PubMed for NumtS or mitochondrial pseudogenes yields hundreds of papers reporting human NumtS compilations produced by in silico or wet-lab approaches. A comparison of published compilations clearly shows significant discrepancies among the data, due both to inappropriate application of bioinformatics methods and to a nuclear genome that is not yet correctly assembled. To optimize quantification and location of NumtS, we produced a consensus compilation of human NumtS by applying various bioinformatics approaches. RESULTS: Location and quantification of NumtS may be achieved by applying database similarity searching methods. We applied several such methods (Blastn, MegaBlast and BLAT), varying both parameters and database; the results were compared, further analysed and checked against the already published compilations, thus producing the Reference Human Numt Sequences (RHNumtS) compilation. The resulting NumtS total 190. CONCLUSION: The RHNumtS compilation represents a highly reliable reference basis, which may allow a lab protocol to be designed to test the actual existence of each NumtS. Here we report preliminary results based on PCR amplification and sequencing of 41 NumtS selected from RHNumtS among those with lower scores. In parallel, we are currently designing the RHNumtS database structure for implementation in the HmtDB resource. In the future, the same database will host NumtS compilations from other organisms, but these will be generated only when the nuclear genome of a specific organism has reached a high-quality level of assembly.
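As an illustration of how hits from several similarity-search tools might be combined into consensus regions, the following is a minimal sketch. It is not the procedure used to build RHNumtS, which the abstract does not detail. It assumes that each hit is reduced to a hypothetical `(tool, chromosome, start, end)` tuple with closed coordinates, and that overlapping hits on the same chromosome are merged into one candidate region; the function name `merge_hits` and the example coordinates are invented for illustration.

```python
from collections import defaultdict


def merge_hits(hits):
    """Merge overlapping hits per chromosome.

    hits: iterable of (tool, chrom, start, end) tuples (hypothetical format).
    Returns a list of (chrom, start, end, tools), where tools is the sorted
    list of tool names that contributed at least one hit to the region.
    """
    by_chrom = defaultdict(list)
    for tool, chrom, start, end in hits:
        by_chrom[chrom].append((start, end, tool))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        cur_tools = set()
        for start, end, tool in sorted(by_chrom[chrom]):
            if cur_start is not None and start <= cur_end:
                # Overlaps the current region: extend it and record the tool.
                cur_end = max(cur_end, end)
                cur_tools.add(tool)
            else:
                # Close the previous region (if any) and start a new one.
                if cur_start is not None:
                    merged.append((chrom, cur_start, cur_end, sorted(cur_tools)))
                cur_start, cur_end, cur_tools = start, end, {tool}
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end, sorted(cur_tools)))
    return merged


# Invented example hits, not real NumtS coordinates.
example_hits = [
    ("blastn", "chr1", 100, 250),
    ("megablast", "chr1", 200, 300),
    ("blat", "chr1", 500, 600),
    ("blat", "chr2", 10, 90),
    ("blastn", "chr2", 50, 120),
]
consensus = merge_hits(example_hits)
for region in consensus:
    print(region)
```

Regions supported by only one tool could then be flagged as lower-confidence candidates for wet-lab validation, although the scoring actually used for RHNumtS is not described here.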
Subjects
Computational Biology/methods , Mitochondrial DNA/genetics , Human Genome , Genomics/methods , Cell Nucleus/genetics , Chromosome Mapping , Nucleic Acid Databases , Mitochondrial Genome , Humans , Isochores/genetics , Polymerase Chain Reaction , Reference Values , Sequence Alignment , DNA Sequence Analysis
ABSTRACT
Sequencing of entire human mtDNA genomes has become rapid and efficient, leading to the production of a great number of complete mtDNA sequences from a wide range of human populations. We introduce here a new statistical approach for classifying mtDNA nucleotide sites, simply by comparing the mean simple deviation (MSD) of their site-specific variability values, estimated on continent-specific sequence datasets, without the need for any reference sequence. Excellent correspondence was observed between sites with the highest MSD values and those marking known mtDNA haplogroups. This in turn supports the classification of 81 sites (23 in Africa, eight in Asia, eight in Europe, 34 in Oceania, and eight in America) as novel markers of 47 mtDNA haplogroups not yet identified by phylogeographic studies. Not only does this approach allow refinement of the mtDNA phylogeny, which is also an essential requirement for mitochondrial disease studies, but it may also greatly facilitate the discrimination of candidate disease-causing mutations from haplogroup-specific polymorphisms in mtDNA sequences of patients affected by mitochondrial disorders.
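The abstract does not give the formula for the MSD, so the sketch below rests on an assumption: that a site's MSD is the mean absolute deviation of its continent-specific variability values from their mean. Under that assumption, a site that varies in only one continental dataset scores high, while a site that is uniformly variable or uniformly invariant scores zero. The site names and variability values are invented for illustration.

```python
def mean_simple_deviation(values):
    """Mean absolute deviation from the mean (assumed definition of MSD)."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


# Hypothetical per-site variability values, one per continental dataset
# (Africa, Asia, Europe, Oceania, America); not real data.
site_variability = {
    "site_A": [0.0, 0.0, 0.0, 0.0, 0.0],  # invariant everywhere
    "site_B": [0.5, 0.0, 0.0, 0.0, 0.0],  # variable in one continent only
    "site_C": [0.2, 0.2, 0.2, 0.2, 0.2],  # equally variable everywhere
}

# Rank sites from highest to lowest MSD; high-MSD sites are the candidates
# for continent-specific haplogroup markers.
ranked = sorted(
    site_variability,
    key=lambda site: mean_simple_deviation(site_variability[site]),
    reverse=True,
)
for site in ranked:
    print(site, round(mean_simple_deviation(site_variability[site]), 4))
```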
Subjects
DNA Mutational Analysis/methods , Mitochondrial DNA/chemistry , Haplotypes , Genetic Polymorphism , Mitochondrial DNA/classification , Genetic Markers , Humans , Phylogeny , Racial Groups/genetics
ABSTRACT
BACKGROUND: Population genetics studies based on the analysis of mtDNA, together with mitochondrial disease studies, have produced a huge quantity of sequence data and related information. These data are at present distributed worldwide across differently organised databases and web sites that are poorly integrated with one another. Moreover, it is not generally possible for users to submit their own data and simultaneously analyse them against the content of a given database, for either population genetics or mitochondrial disease data. RESULTS: HmtDB is a well-integrated web-based human mitochondrial bioinformatic resource aimed at supporting population genetics and mitochondrial disease studies, thanks to a new approach based on site-specific nucleotide and amino acid variability estimation. HmtDB consists of a database of human mitochondrial genomes, annotated with population data, and a set of bioinformatic tools able to produce site-specific variability data and to automatically characterize newly sequenced human mitochondrial genomes. A query system for the retrieval of genomes and a web submission tool for the annotation of new genomes have been designed and will soon be implemented. The first release contains 1255 fully annotated human mitochondrial genomes. Nucleotide site-specific variability data and multialigned genomes can be downloaded. Intra-human and inter-species amino acid variability data, estimated on the 13 protein-coding genes of the 1255 human genomes and of 60 mammalian species, are also available. HmtDB is freely available, upon registration, at http://www.hmdb.uniba.it. CONCLUSION: The HmtDB project will contribute towards completing and/or refining haplogroup classification and revealing the real pathogenic potential of mitochondrial mutations, on the basis of variability estimation.