1.
BMC Bioinformatics ; 17: 133, 2016 Mar 18.
Article in English | MEDLINE | ID: mdl-26992851

ABSTRACT

BACKGROUND: Reconstruction of multiple sequence alignments (MSAs) is a crucial step in most homology-based sequence analyses, which constitute an integral part of computational biology. To improve the accuracy of this crucial step, it is essential to better characterize errors that state-of-the-art aligners typically make. For this purpose, we here introduce two tools: the complete-likelihood score and the position-shift map. RESULTS: The logarithm of the total probability of a MSA under a stochastic model of sequence evolution along a time axis via substitutions, insertions and deletions (called the "complete-likelihood score" here) can serve as an ideal score of the MSA. A position-shift map, which maps the difference in each residue's position between two MSAs onto one of them, can clearly visualize where and how MSA errors occurred and help disentangle composite errors. To characterize MSA errors using these tools, we constructed three sets of simulated MSAs of selectively neutral mammalian DNA sequences, with small, moderate and large divergences, under a stochastic evolutionary model with an empirically common power-law insertion/deletion length distribution. Then, we reconstructed MSAs using MAFFT and Prank as representative state-of-the-art single-optimum-search aligners. About 40-99% of the hundreds of thousands of gapped segments were involved in alignment errors. In a substantial fraction, from about 1/4 to over 3/4, of erroneously reconstructed segments, reconstructed MSAs by each aligner showed complete-likelihood scores not lower than those of the true MSAs. Out of the remaining errors, a majority by an iterative option of MAFFT showed discrepancies between the aligner-specific score and the complete-likelihood score, and a majority by Prank seemed due to inadequate exploration of the MSA space. 
Analyses by position-shift maps indicated that true MSAs are in considerable neighborhoods of reconstructed MSAs in about 80-99% of the erroneous segments for small and moderate divergences, but in only a minority for large divergences. CONCLUSIONS: The results of this study suggest that measures to further improve the accuracy of reconstructed MSAs would substantially differ depending on the types of aligners. They also re-emphasize the importance of obtaining a probability distribution of fairly likely MSAs, instead of just searching for a single optimum MSA.
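
The position-shift map described in this abstract can be illustrated with a minimal Python sketch. It assumes that a residue's "position" is its column index, that the shift is the reconstructed column minus the true column, and that the map is laid onto the reconstructed MSA; the abstract fixes none of these conventions, and the function names and the toy alignments are hypothetical.

```python
def residue_columns(row, gap="-"):
    """Return the column index of every residue (non-gap character) in one aligned row."""
    return [col for col, ch in enumerate(row) if ch != gap]


def position_shift_map(reference, reconstructed, gap="-"):
    """Lay per-residue column shifts (reconstructed minus reference) onto the reconstructed MSA.

    Assumed convention, not taken from the paper: gap cells are reported as None.
    """
    if len(reference) != len(reconstructed):
        raise ValueError("both MSAs must contain the same number of sequences")
    shift_map = []
    for ref_row, rec_row in zip(reference, reconstructed):
        ref_cols = residue_columns(ref_row, gap)
        rec_cols = residue_columns(rec_row, gap)
        if len(ref_cols) != len(rec_cols):
            raise ValueError("each pair of rows must contain the same residues")
        # The k-th residue of a sequence is the same residue in both alignments.
        shifts = iter(rec - ref for rec, ref in zip(rec_cols, ref_cols))
        shift_map.append([None if ch == gap else next(shifts) for ch in rec_row])
    return shift_map


# Hypothetical toy alignments: the second sequence has its gap misplaced by one column.
true_msa = ["ACG-T", "A-GCT"]
reconstructed_msa = ["ACG-T", "AG-CT"]
for row in position_shift_map(true_msa, reconstructed_msa):
    print(row)
```

Nonzero entries mark residues that the reconstruction placed in a different column, so a run of equal nonzero values points to a single shifted gap rather than to several independent errors.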


Subjects
Computational Biology/methods, DNA/genetics, Sequence Alignment/statistics & numerical data, Nucleic Acid Sequence Homology, Animals, Computer Simulation, Molecular Evolution, INDEL Mutation, Likelihood Functions, Mammals/genetics, Phylogeny, Statistical Distributions
2.
BMC Bioinformatics ; 17: 304, 2016 Aug 11.
Article in English | MEDLINE | ID: mdl-27638547

ABSTRACT

BACKGROUND: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of the sequence evolution through indel processes. Recently, indel probabilistic models are mostly based on either hidden Markov models (HMMs) or transducer theories, both of which give the indel component of the probability of a given sequence alignment as a product of either probabilities of column-to-column transitions or block-wise contributions along the alignment. However, it is not a priori clear how these models are related with any genuine stochastic evolutionary model, which describes the stochastic evolution of an entire sequence along the time-axis. Moreover, currently none of these models can fully accommodate biologically realistic features, such as overlapping indels, power-law indel-length distributions, and indel rate variation across regions. RESULTS: Here, we theoretically dissect the ab initio calculation of the probability of a given sequence alignment under a genuine stochastic evolutionary model, more specifically, a general continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. Our model is a simple extension of the general "substitution/insertion/deletion (SID) model". Using the operator representation of indels and the technique of time-dependent perturbation theory, we express the ab initio probability as a summation over all alignment-consistent indel histories. Exploiting the equivalence relations between different indel histories, we find a "sufficient and nearly necessary" set of conditions under which the probability can be factorized into the product of an overall factor and the contributions from regions separated by gapless columns of the alignment, thus providing a sort of generalized HMM. 
The conditions distinguish evolutionary models with factorable alignment probabilities from those without ones. The former category includes the "long indel" model (a space-homogeneous SID model) and the model used by Dawg, a genuine sequence evolution simulator. CONCLUSIONS: With intuitive clarity and mathematical preciseness, our theoretical formulation will help further advance the ab initio calculation of alignment probabilities under biologically realistic models of sequence evolution via indels.
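
The factorization in this abstract works over regions separated by gapless columns. As a rough Python sketch of that partitioning step only, the function below finds maximal runs of columns that contain at least one gap. The function name, the half-open range convention, and the toy alignment are assumptions; the sketch does not compute any probability or check the paper's factorability conditions.

```python
def gapped_regions(alignment, gap="-"):
    """Return half-open (start, end) column ranges of maximal runs of gapped columns.

    Columns outside these ranges are gapless and act as region delimiters.
    """
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    width = len(alignment[0])
    regions = []
    start = None
    for col in range(width):
        has_gap = any(row[col] == gap for row in alignment)
        if has_gap and start is None:
            start = col                    # a gapped region opens here
        elif not has_gap and start is not None:
            regions.append((start, col))   # a gapless column closes the region
            start = None
    if start is not None:
        regions.append((start, width))     # region runs to the end of the alignment
    return regions


# Hypothetical pairwise alignment with three gapped regions.
pwa = ["AC--GTTA-C", "ACTTG--AGC"]
print(gapped_regions(pwa))
```

Under a factorable model, each returned region would contribute its own multiplication factor to the alignment probability, alongside the overall factor mentioned in the abstract.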


Subjects
Molecular Evolution, INDEL Mutation, Genetic Models, Humans, Markov Chains, Probability, Sequence Alignment
3.
BMC Bioinformatics ; 17(1): 397, 2016 Sep 27.
Article in English | MEDLINE | ID: mdl-27677569

ABSTRACT

BACKGROUND: Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a method to reliably calculate the occurrence probabilities of sequence alignments via evolutionary processes on an entire sequence. Previously, we presented a perturbative formulation that facilitates the ab initio calculation of alignment probabilities under a continuous-time Markov model, which describes the stochastic evolution of an entire sequence via indels with quite general rate parameters. And we demonstrated that, under some conditions, the ab initio probability of an alignment can be factorized into the product of an overall factor and contributions from regions (or local alignments) delimited by gapless columns. RESULTS: Here, using our formulation, we attempt to approximately calculate the probabilities of local alignments under space-homogeneous cases. First, for each of all types of local pairwise alignments (PWAs) and some typical types of local multiple sequence alignments (MSAs), we numerically computed the total contribution from all parsimonious indel histories and that from all next-parsimonious histories, and compared them. Second, for some common types of local PWAs, we derived two integral equation systems that can be numerically solved to give practically exact solutions. We compared the total parsimonious contribution with the practically exact solution for each such local PWA. Third, we developed an algorithm that calculates the first-approximate MSA probability by multiplying total parsimonious contributions from all local MSAs. Then we compared the first-approximate probability of each local MSA with its absolute frequency in the MSAs created via a genuine sequence evolution simulator, Dawg. 
In all these analyses, the total parsimonious contributions approximated the multiplication factors fairly well, as long as gap sizes and branch lengths are at most moderate. Examination of the accuracy of another indel probabilistic model in the light of our formulation indicated some modifications necessary for the model's accuracy improvement. CONCLUSIONS: At least under moderate conditions, the approximate methods can quite accurately calculate ab initio alignment probabilities under biologically more realistic models than before. Thus, our formulation will provide other indel probabilistic models with a sound reference point.

4.
BMC Genet ; 14: 37, 2013 May 07.
Article in English | MEDLINE | ID: mdl-23651527

ABSTRACT

BACKGROUND: Whether or not a mutant allele in a population is under selection is an important issue in population genetics, and various neutrality tests have been invented so far to detect selection. However, detection of negative selection has been notoriously difficult, partly because negatively selected alleles are usually rare in the population and have little impact on either population dynamics or the shape of the gene genealogy. Recently, through studies of genetic disorders and genome-wide analyses, many structural variations were shown to occur recurrently in the population. Such "recurrent mutations" might be revealed as deleterious by exploiting the signal of negative selection in the gene genealogy enhanced by their recurrence. RESULTS: Motivated by the above idea, we devised two new test statistics. One is the total number of mutants at a recurrently mutating locus among sampled sequences, which is tested conditionally on the number of forward mutations mapped on the sequence genealogy. The other is the size of the most common class of identical-by-descent mutants in the sample, again tested conditionally on the number of forward mutations mapped on the sequence genealogy. To examine the performance of these two tests, we simulated recurrently mutated loci each flanked by sites with neutral single nucleotide polymorphisms (SNPs), with no recombination. Using neutral recurrent mutations as null models, we attempted to detect deleterious recurrent mutations. Our analyses demonstrated high powers of our new tests under constant population size, as well as their moderate power to detect selection in expanding populations. We also devised a new maximum parsimony algorithm that, given the states of the sampled sequences at a recurrently mutating locus and an incompletely resolved genealogy, enumerates mutation histories with a minimum number of mutations while partially resolving genealogical relationships when necessary. 
CONCLUSIONS: With their considerably high powers to detect negative selection, our new neutrality tests may open new venues for dealing with the population genetics of recurrent mutations as well as help identifying some types of genetic disorders that may have escaped identification by currently existing methods.


Subjects
Mutation, Genetic Selection, Humans, Single Nucleotide Polymorphism
6.
Mol Biol Evol ; 27(9): 2152-71, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20430861

ABSTRACT

Homogenization of duplicated genes is an important factor for gene family evolution. In the previous study, we developed a method, named 4-2-4 here, to detect partial homogenization with high sensitivity and high specificity using quartets. A quartet is a set of four genes generated by a duplication event and the subsequent speciation of two closely related species. We searched the human and macaque genomes and found 430 nonredundant quartets, which correspond to primate-specific paralogs. The prevalence of homogenization in these quartets was 10.0% (43/430), which was ca. one-third of that (29.8% = 206/691) in the rodent-specific nonredundant quartets obtained through comparison of mouse and rat genomes. Part of this difference comes from the fact that primate paralogs tend to be more remotely located to each other than rodent paralogs, and the remainder may be explained by the inherent difference in the neutral evolutionary rate between the primate and rodent lineages. A statistical analysis taking account of the effects of false negatives uncovered negative correlations between sequence divergence and homogenization prevalence both in primates and rodents. Further statistical analyses controlling for false-negative rates and sequence divergences revealed two characteristics shared by the primate and rodent paralogs; 1) significant negative correlations of the homogenization prevalence with physical distances, and 2) no significant correlation between the prevalence and relative transcriptional orientations. Patterns of the homogenization in the genomic alignments of human-macaque quartets indicate that gene conversion, rather than unequal crossing-over, is the major cause of the homogenization.
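
The prevalence figures quoted in this abstract can be rechecked with a few lines of Python. The counts come from the abstract; the variable names are illustrative only.

```python
# Quartet counts quoted in the abstract.
primate_homogenized, primate_total = 43, 430
rodent_homogenized, rodent_total = 206, 691

primate_prevalence = primate_homogenized / primate_total
rodent_prevalence = rodent_homogenized / rodent_total
ratio = primate_prevalence / rodent_prevalence

print(f"primate prevalence: {primate_prevalence:.1%}")
print(f"rodent prevalence:  {rodent_prevalence:.1%}")
print(f"primate/rodent ratio: {ratio:.2f}")
```

The ratio comes out close to one-third, in line with the abstract's "ca. one-third".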


Subjects
Molecular Evolution, Macaca/genetics, Primates/genetics, Animals, Gene Conversion/genetics, Duplicate Genes/genetics, Humans
7.
Genetics ; 194(3): 709-19, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23666936

ABSTRACT

The population genetic behavior of mutations in sperm genes is theoretically investigated. We modeled the processes at two levels. One is the standard population genetic process, in which the population allele frequencies change generation by generation, depending on the difference in selective advantages. The other is the sperm competition during each genetic transmission from one generation to the next generation. For the sperm competition process, we formulate the situation where a huge number of sperm with alleles A and B, produced by a single heterozygous male, compete to fertilize a single egg. This "minimal model" demonstrates that a very slight difference in sperm performance amounts to quite a large difference between the alleles' winning probabilities. By incorporating this effect of paternity-sharing sperm competition into the standard population genetic process, we show that fierce sperm competition can enhance the fixation probability of a mutation with a very small phenotypic effect at the single-sperm level, suggesting a contribution of sperm competition to rapid amino acid substitutions in haploid-expressed sperm genes. Considering recent genome-wide demonstrations that a substantial fraction of the mammalian sperm genes are haploid expressed, our model could provide a potential explanation of rapid evolution of sperm genes with a wide variety of functions (as long as they are expressed in the haploid phase). Another advantage of our model is that it is applicable to a wide range of species, irrespective of whether the species is externally fertilizing, polygamous, or monogamous. The theoretical result was applied to mammalian data to estimate the selection intensity on nonsynonymous mutations in sperm genes.


Subjects
Molecular Evolution, Haploidy, Genetic Selection, Spermatozoa, Alleles, Amino Acid Substitution, Animals, Developmental Gene Expression Regulation, Population Genetics, Heterozygote, Humans, Male, Mutation Rate
8.
Genome Biol Evol ; 3: 1119-35, 2011.
Article in English | MEDLINE | ID: mdl-21859807

ABSTRACT

Duplogs, or intraspecies paralogs, constitute the important portion of eukaryote genomes and serve as a major source of functional innovation. We conducted detailed analyses of recently emerged animal duplogs. Genome data of three vertebrate species (Homo sapiens, Mus musculus, and Danio rerio), Caenorhabditis elegans, and two Drosophila species (Drosophila melanogaster and D. pseudoobscura) were used. Duplication events were divided into six age-groups according to the synonymous distance (dS) up to 0.6. Duplogs were classified into four equal-sized classes on physical distances and into three classes on relative orientations. We observed the following shared characteristics among intrachromosomal multiexon duplogs: 1) inverted duplogs account for 20-50%, and about a half of the physically most distant 25%; 2) except for C. elegans, the composition of physical distances, that of relative orientations, and the proportion of inverted duplogs in each physical distance category are more or less uniform; 3) except for C. elegans, the characteristics of the youngest (dS < 0.01) duplogs are similar to the overall characteristics of the entire set. These results suggest that intrachromosomal duplogs with fairly long physical distances were generated at once, rather than resulting from tandem duplications and subsequent genomic rearrangements. This is different from the three well-known modes of gene duplication: tandem duplication, retrotransposition, and genome duplication. We termed this new mode as "drift" duplication. The drift duplication has been producing duplicate copies at paces comparable with tandem duplications since the common ancestor of vertebrates, and it may have already operated in the common ancestor of bilateral animals.


Subjects
Caenorhabditis elegans/genetics, Drosophila/genetics, Molecular Evolution, Gene Duplication, Vertebrates/genetics, Animals, Caenorhabditis elegans/classification, Drosophila/classification, Genome, Humans, Phylogeny, Vertebrates/classification
9.
Genomics ; 89(5): 618-29, 2007 May.
Article in English | MEDLINE | ID: mdl-17350798

ABSTRACT

Gasdermin (Gsdm) was originally identified as a candidate causative gene for several mouse skin mutants. Several Gsdm-related genes sharing a protein domain with DFNA5, the causative gene of human nonsyndromic hearing loss, have been found in the mouse and human genomes, and this group is referred to as the DFNA5-Gasdermin domain family. However, our current comparative genomic analysis identified several novel motifs distinct from the previously reported domain in the Gsdm-related genes. We also identified three new Gsdm genes clustered on mouse chromosome 15. We named these genes collectively the Gsdm family. Extensive expression analysis revealed exclusive expression of Gsdm family genes in the epithelium of the skin and gastrointestinal tract in a highly tissue-specific manner. Further database searching revealed the presence of other related genes with a similar N-terminal motif. These results suggest that the Gsdm family and related genes have evolved divergent epithelial expression profiles.


Subjects
Epithelium/metabolism, Gastrointestinal Tract/cytology, Neoplasm Proteins/chemistry, Skin/cytology, Amino Acid Sequence, Animals, Gene Expression, Humans, Mice, Inbred C57BL Mice, Molecular Sequence Data, Neoplasm Proteins/classification, Neoplasm Proteins/genetics, Neoplasm Proteins/physiology, Nerve Tissue Proteins/chemistry, Organ Specificity, Phylogeny, Estrogen Receptors/chemistry, Amino Acid Sequence Homology
10.
Mol Biol Evol ; 23(5): 927-40, 2006 May.
Article in English | MEDLINE | ID: mdl-16407460

ABSTRACT

Gene conversion is considered to play important roles in the formation of genomic makeup such as homogenization of multigene families and diversification of alleles. We devised two statistical tests on quartets for detecting gene conversion events. Each "quartet" consists of two pairs of orthologous sequences supposed to have been generated by a duplication event and a subsequent speciation of two closely related species. As example data, EnsEMBL mouse and rat cDNA sequences were used to obtain a genome-wide picture of gene conversion events. We extensively sampled 2,641 quartets that appear to have resulted from duplications after the divergence of primates and rodents and before mouse-rat speciation. Combination of our new tests with Sawyer's and Takahata's tests enhanced the detection sensitivity while keeping false positives as few as possible. About 18% (488 quartets) were shown to be highly positive for gene conversion using this combined test. Out of them, 340 (13% of the total) showed signs of gene conversion in mouse sequence pairs. Those gene conversion-positive gene pairs are mostly linked in the same chromosomes, with the proportion of positive pairs in the linked and unlinked categories being 15% and 1%, respectively. Statistical analyses showed that (1) the susceptibility to gene conversion correlates negatively with the physical distance, especially the frequency of 29% was observed for gene pairs whose distances are smaller than 55 kb; (2) the occurrence of gene conversions does not depend on the transcriptional direction; (3) small gene families consisting of between three and six contiguous genes are highly prone to gene conversion; and (4) frequency of gene conversions greatly varies depending on functional categories, and cadherins favor gene conversion, while vomeronasal receptors type 1 and immunoglobulin V-type proteins disfavor it. These findings will be useful to deepen the understanding of the roles of gene conversion.
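
As a quick consistency check on the counts quoted in this abstract, the following Python lines recompute the two percentages from the raw numbers. The counts are from the abstract; the variable names are illustrative only.

```python
# Quartet counts quoted in the abstract.
total_quartets = 2641
positive_quartets = 488        # highly positive in the combined test
positive_in_mouse_pairs = 340  # signs of gene conversion in mouse sequence pairs

positive_fraction = positive_quartets / total_quartets
mouse_fraction = positive_in_mouse_pairs / total_quartets

print(f"positive quartets: {positive_fraction:.1%}")
print(f"positive in mouse pairs: {mouse_fraction:.1%}")
```

Both values round to the whole-number percentages reported in the abstract (about 18% and 13%).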


Subjects
Gene Duplication, Genome, Alleles, Animals, Codon, Complementary DNA/metabolism, Gene Conversion, Genetic Linkage, Mice, Peptides/chemistry, Rats, Species Specificity
11.
Genome Res ; 14(12): 2439-47, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15574823

ABSTRACT

MSM/Ms is an inbred strain derived from the Japanese wild mouse, Mus musculus molossinus. It is believed that subspecies molossinus has contributed substantially to the genome constitution of common laboratory strains of mice, although the majority of their genome is derived from the west European M. m. domesticus. Information on the molossinus genome is thus essential not only for genetic studies involving molossinus but also for characterization of common laboratory strains. Here, we report the construction of an arrayed bacterial artificial chromosome (BAC) library from male MSM/Ms genomic DNA, covering approximately 1x genome equivalent. Both ends of 176,256 BAC clone inserts were sequenced, and 62,988 BAC-end sequence (BES) pairs were mapped onto the C57BL/6J genome (NCBI mouse Build 30), covering 2,228,164 kbp or 89% of the total genome. Taking advantage of the BES map data, we established a computer-based clone screening system. Comparison of the MSM/Ms and C57BL/6J sequences revealed 489,200 candidate single nucleotide polymorphisms (SNPs) in 51,137,941 bp sequenced. The overall nucleotide substitution rate was as high as 0.0096. The distribution of SNPs along the C57BL/6J genome was not uniform: The majority of the genome showed a high SNP rate, and only 5.2% of the genome showed an extremely low SNP rate (percentage identity = 0.9997); these sequences are likely derived from the molossinus genome.
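
The headline numbers in this abstract follow from simple ratios, which the Python lines below recompute. The inputs are the figures quoted in the abstract; the implied genome size is a back-calculation from the rounded 89% coverage, so it is only approximate, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
candidate_snps = 489_200
bases_compared = 51_137_941
covered_kbp = 2_228_164
covered_fraction = 0.89   # rounded value from the abstract

substitution_rate = candidate_snps / bases_compared
bases_per_snp = bases_compared / candidate_snps
implied_genome_kbp = covered_kbp / covered_fraction

print(f"substitution rate: {substitution_rate:.4f}")
print(f"about one candidate SNP per {bases_per_snp:.0f} bp")
print(f"implied genome size: {implied_genome_kbp / 1e6:.2f} Gbp")
```

The recomputed rate rounds to the reported 0.0096, and the implied genome size is roughly 2.5 Gbp.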


Subjects
Bacterial Artificial Chromosomes/genetics, Genome, Mice/genetics, Animals, Base Sequence, Chromosome Mapping, DNA Primers, Male, Inbred C57BL Mice, Molecular Sequence Data, Single Nucleotide Polymorphism/genetics, DNA Sequence Analysis, Species Specificity