Results 1 - 20 of 96
1.
Genome Res ; 31(5): 799-810, 2021 05.
Article in English | MEDLINE | ID: mdl-33863805

ABSTRACT

The members of the tribe Brassiceae share a whole-genome triplication (WGT), and one proposed model for its formation is a two-step pair of hybridizations producing hexaploid descendants. However, evidence for this model is incomplete, and the evolutionary and functional constraints that drove evolution after the hexaploidy are even less understood. Here, we report a new genome sequence of Crambe hispanica, a species sister to most sequenced Brassiceae. Using this new genome and three others that share the hexaploidy, we traced the history of gene loss after the WGT using the Polyploidy Orthology Inference Tool (POInT). We confirm the two-step formation model and infer that there was a significant temporal gap between those two allopolyploidizations, with about a third of the gene losses from the first two subgenomes occurring before the arrival of the third. We also, for the 90,000 individual genes in our study, make parental subgenome assignments, inferring, with measured uncertainty, from which of the progenitor genomes of the allohexaploidy each gene derives. We further show that each subgenome has a statistically distinguishable rate of homoeolog losses. There is little indication of functional distinction between the three subgenomes: the individual subgenomes show no patterns of functional enrichment, no excess of shared protein-protein or metabolic interactions between their members, and no biases in their likelihood of having experienced a recent selective sweep. We propose a "mix and match" model of allopolyploidy, in which subgenome origin drives homoeolog loss propensities but where genes from different subgenomes function together without difficulty.


Subjects
Genome , Polyploidy , Molecular Evolution , Plant Genome , Humans , Genetic Hybridization , Phylogeny
2.
Entropy (Basel) ; 25(12)2023 Dec 17.
Article in English | MEDLINE | ID: mdl-38136548

ABSTRACT

Capacity restrictions in stores, maintained by mechanisms like spacing customer intake, became familiar features of retailing in the time of the pandemic. Shopping rates in a crowded store under a social distancing regime are prone to considerable slowdown. Inspired by the random particle collision concepts of statistical mechanics, we introduce a dynamical model of the evolution of the shopping rate as a function of a given customer intake rate. The slowdown of each individual customer is incorporated as an additive term to the baseline value of the shopping time, proportionally to the number of other customers in the store. We determine analytically and via simulation the trajectory of the model as it approaches a Little's law equilibrium and identify the point beyond which equilibrium cannot be achieved. By relating the customer shopping rate to the slowdown compared with the baseline, we can calculate the optimal intake rate leading to maximum equilibrium spending. This turns out to be the maximum rate compatible with equilibrium. The slowdown due to the largest possible number of shoppers is more than compensated for by the increased volume of shopping. This macroscopic model is validated by simulation experiments in which avoidance interactions between pairs of shoppers are responsible for shopping delays.
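
The equilibrium described in this abstract can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) a linear slowdown T(N) = t0 + a*N, where N is the store occupancy, t0 the baseline shopping time and a the delay added per customer present, together with a simple fluid approximation dN/dt = lam - N/T(N) for intake rate lam. Under these assumptions Little's law N = lam*T(N) gives N = lam*t0 / (1 - lam*a), which exists only when lam*a < 1. The names `lam`, `t0` and `a` are hypothetical, and spending is not modeled here.

```python
def equilibrium_occupancy(lam, t0, a):
    """Solve N = lam * (t0 + a * N) for N (Little's law with an assumed
    linear slowdown). Returns None when no equilibrium exists."""
    if lam * a >= 1.0:
        # Exit rate N / (t0 + a * N) can never reach the intake rate.
        return None
    return lam * t0 / (1.0 - lam * a)


def simulate_occupancy(lam, t0, a, n0=0.0, dt=0.01, steps=100_000):
    """Euler integration of the assumed fluid model
    dN/dt = lam - N / (t0 + a * N), starting from occupancy n0."""
    n = n0
    for _ in range(steps):
        n += dt * (lam - n / (t0 + a * n))
    return n


if __name__ == "__main__":
    print(equilibrium_occupancy(2.0, 1.0, 0.25))
    print(simulate_occupancy(2.0, 1.0, 0.25))
    print(equilibrium_occupancy(4.0, 1.0, 0.25))
```

In this sketch the exit rate saturates at 1/a as occupancy grows, so any intake rate at or above 1/a makes the occupancy grow without bound, which is one way to read "the point beyond which equilibrium cannot be achieved".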

3.
J Theor Biol ; 532: 110924, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34627861

ABSTRACT

Many angiosperms have undergone some series of polyploidization events over the course of their evolutionary history. In these genomes, especially those resulting from multiple autopolyploidization, it may be relatively easy to recognize all the ξ sets of n homeologous chromosomes, but it is much harder, if not impossible, to partition these chromosomes into n subgenomes, each representing one distinct genomic component of ξ chromosomes making up the original polyploid. Thus, if we wish to infer the polyploidization history of the genome, we could make use of all the gene trees inferred from the genes in one set of homeologous chromosomes to construct a consensus tree, but there is no evident way of combining the trees from the ξ different sets, because we have no labelling of the chromosomes that is known to be consistent across these sets. We suggest here that lacking a consistent leaf-labelling, the topological structure of the trees may display sufficient resemblance so that a higher level consensus could be revealing of evolutionary history. This would be especially true of the peripheral structures of the tree, likely representing events that occurred more recently and have thus been less obscured by subsequent evolutionary processes. Here, we present a statistical test to assess whether the subgenomes in a polyploid genome could have been added one at a time. The null hypothesis is that the accumulation of chromosomes follows a stochastic process in which transition from one generation to the next is through randomly choosing an edge, and then subdividing this edge in order to link the new internal vertex to a new external vertex. We analyze the probability distributions of a number of peripheral tree substructures, namely leaf- or terminal-pairs, triples and quadruples, arising from this stochastic process, in terms of some exact recurrences. We propose some conjectures regarding the asymptotic behaviours of these distributions. Applying our analysis to a sugarcane genome, we demonstrate that it is unlikely that the accumulation of subgenomes has occurred one at a time.


Subjects
Magnoliopsida , Polyploidy , Humans , Phylogeny
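
The null process in the abstract of entry 3 (pick an edge at random, subdivide it, attach a new leaf to the new internal vertex) can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: the tree is treated as unrooted, it starts from a single edge joining two leaves, and edges are chosen uniformly. None of these details is given in the abstract, and the function names are hypothetical. It counts terminal pairs, that is, internal vertices with exactly two leaf neighbours.

```python
import random
from collections import defaultdict


def grow_tree(n_leaves, seed=0):
    """Grow an unrooted binary tree by repeated random edge subdivision.
    Assumed start: one edge joining leaves 0 and 1."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    leaves = {0, 1}
    next_id = 2
    while len(leaves) < n_leaves:
        # Choose an edge uniformly at random and remove it.
        u, v = edges.pop(rng.randrange(len(edges)))
        w, leaf = next_id, next_id + 1
        next_id += 2
        # Subdivide with new internal vertex w and hang a new leaf on it.
        edges += [(u, w), (w, v), (w, leaf)]
        leaves.add(leaf)
    return edges, leaves


def count_leaf_pairs(edges, leaves):
    """Count internal vertices adjacent to exactly two leaves."""
    attached = defaultdict(int)
    for u, v in edges:
        if u in leaves and v not in leaves:
            attached[v] += 1
        elif v in leaves and u not in leaves:
            attached[u] += 1
    return sum(1 for k in attached.values() if k == 2)


if __name__ == "__main__":
    edges, leaves = grow_tree(50, seed=1)
    print(len(edges), count_leaf_pairs(edges, leaves))
```

Repeating the simulation over many seeds would give an empirical distribution of terminal-pair counts to set beside an observed tree; the exact recurrences mentioned in the abstract are not reproduced here.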
4.
Proc Natl Acad Sci U S A ; 116(34): 17081-17089, 2019 08 20.
Article in English | MEDLINE | ID: mdl-31387975

ABSTRACT

The avocado, Persea americana, is a fruit crop of immense importance to Mexican agriculture with an increasing demand worldwide. Avocado lies in the anciently diverged magnoliid clade of angiosperms, which has a controversial phylogenetic position relative to eudicots and monocots. We sequenced the nuclear genomes of the Mexican avocado race, P. americana var. drymifolia, and the most commercially popular hybrid cultivar, Hass, and anchored the latter to chromosomes using a genetic map. Resequencing of Guatemalan and West Indian varieties revealed that ∼39% of the Hass genome represents Guatemalan source regions introgressed into a Mexican race background. Some introgressed blocks are extremely large, consistent with the recent origin of the cultivar. The avocado lineage experienced 2 lineage-specific polyploidy events during its evolutionary history. Although gene-tree/species-tree phylogenomic results are inconclusive, syntenic ortholog distances to other species place avocado as sister to the enormous monocot and eudicot lineages combined. Duplicate genes descending from polyploidy augmented the transcription factor diversity of avocado, while tandem duplicates enhanced the secondary metabolism of the species. Phenylpropanoid biosynthesis, known to be elicited by Colletotrichum (anthracnose) pathogen infection in avocado, is one enriched function among tandems. Furthermore, transcriptome data show that tandem duplicates are significantly up- and down-regulated in response to anthracnose infection, whereas polyploid duplicates are not, supporting the general view that collections of tandem duplicates contribute evolutionarily recent "tuning knobs" in the genome adaptive landscapes of given species.


Subjects
Colletotrichum/physiology , Intergenic DNA , Genetic Introgression , Plant Genome , Host-Pathogen Interactions/genetics , Magnoliopsida , Persea , Phylogeny , Plant Diseases , Gene Duplication , Magnoliopsida/genetics , Magnoliopsida/microbiology , Persea/genetics , Persea/microbiology , Plant Diseases/genetics , Plant Diseases/microbiology
5.
Proc Natl Acad Sci U S A ; 114(22): E4435-E4441, 2017 05 30.
Article in English | MEDLINE | ID: mdl-28507139

ABSTRACT

Utricularia gibba, the humped bladderwort, is a carnivorous plant that retains a tiny nuclear genome despite at least two rounds of whole genome duplication (WGD) since common ancestry with grapevine and other species. We used a third-generation genome assembly with several complete chromosomes to reconstruct the two most recent lineage-specific ancestral genomes that led to the modern U. gibba genome structure. Patterns of subgenome dominance in the most recent WGD, both architectural and transcriptional, are suggestive of allopolyploidization, which may have generated genomic novelty and led to instantaneous speciation. Syntenic duplicates retained in polyploid blocks are enriched for transcription factor functions, whereas gene copies derived from ongoing tandem duplication events are enriched in metabolic functions potentially important for a carnivorous plant. Among these are tandem arrays of cysteine protease genes with trap-specific expression that evolved within a protein family known to be useful in the digestion of animal prey. Further enriched functions among tandem duplicates (also with trap-enhanced expression) include peptide transport (intercellular movement of broken-down prey proteins), ATPase activities (bladder-trap acidification and transmembrane nutrient transport), hydrolase and chitinase activities (breakdown of prey polysaccharides), and cell-wall dynamic components possibly associated with active bladder movements. Whereas independently polyploid Arabidopsis syntenic gene duplicates are similarly enriched for transcriptional regulatory activities, Arabidopsis tandems are distinct from those of U. gibba, while still metabolic and likely reflecting unique adaptations of that species. Taken together, these findings highlight the special importance of tandem duplications in the adaptive landscapes of a carnivorous plant genome.


Subjects
Carnivory/physiology , Plant Genome , Lamiales/genetics , Lamiales/physiology , Physiological Adaptation/genetics , Cysteine Proteases/chemistry , Cysteine Proteases/genetics , Molecular Evolution , Gene Duplication , Molecular Models , Phylogeny , Plant Proteins/chemistry , Plant Proteins/genetics , Polyploidy , DNA Sequence Analysis , Synteny
6.
BMC Bioinformatics ; 20(Suppl 20): 635, 2019 Dec 17.
Article in English | MEDLINE | ID: mdl-31842736

ABSTRACT

BACKGROUND: A basic tool for studying the polyploidization history of a genome, especially in plants, is the distribution of duplicate gene similarities in syntenically aligned regions of a genome. This distribution can usually be decomposed into two or more components identifiable by peaks, or local maxima, each representing a different polyploidization event. The distributions may be generated by means of a discrete time branching process, followed by a sequence divergence model. The branching process, as well as the inference of fractionation rates based on it, requires knowledge of the ploidy level of each event, which cannot be directly inferred from the pair similarity distribution. RESULTS: For a sequence of two events of unknown ploidy, either tetraploid, giving rise to whole genome doubling (WGD), or hexaploid, giving rise to whole genome tripling (WGT), we base our analysis on triples of similar genes. We calculate the probability of the four triplet types with origins in one or the other event, or both, and impose a mutational model so that the distribution resembles the original data. Using a ML transition point in the similarities between the two events as a discriminator for the hypothesized origin of each similarity, we calculate the predicted number of triplets of each type for each model combining WGT and/or WGD. This yields a predicted profile of triplet types for each model. We compare the observed and predicted triplet profiles for each model to confirm the polyploidization history of durian, poplar and cabbage. CONCLUSIONS: We have developed a way of inferring the ploidy of up to three successive WGD and/or WGT events by estimating the time of origin of each of the similarities in triples of genes. This may be generalized to a larger number of events and to higher ploidies.


Subjects
Plant Genome , Polyploidy , Synteny/genetics , Bombacaceae/genetics , Brassicaceae/genetics , Plant Genes , Genetic Models , Mutation/genetics , Populus/genetics
7.
Bioinformatics ; 34(13): i366-i375, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29950018

ABSTRACT

Motivation: When gene duplication occurs, one of the copies may become free of selective pressure and evolve at an accelerated pace. This has important consequences on the prediction of orthology relationships, since two orthologous genes separated by divergence after duplication may differ in both sequence and function. In this work, we make the distinction between the primary orthologs, which have not been affected by accelerated mutation rates on their evolutionary path, and the secondary orthologs, which have. Similarity-based prediction methods will tend to miss secondary orthologs, whereas phylogeny-based methods cannot separate primary and secondary orthologs. However, both types of orthology have applications in important areas such as gene function prediction and phylogenetic reconstruction, motivating the need for methods that can distinguish the two types. Results: We formalize the notion of divergence after duplication and provide a theoretical basis for the inference of primary and secondary orthologs. We then put these ideas to practice with the Hybrid Prediction of Paralogs and Orthologs (HyPPO) framework, which combines ideas from both similarity and phylogeny approaches. We apply our method to simulated and empirical datasets and show that we achieve superior accuracy in predicting primary orthologs, secondary orthologs and paralogs. Availability and implementation: HyPPO is a modular framework with a core developed in Python and is provided with a variety of C++ modules. The source code is available at https://github.com/manuellafond/HyPPO. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Molecular Evolution , Gene Duplication , Software , Eukaryota/genetics , Mutation Rate , Phylogeny , DNA Sequence Analysis/methods
8.
BMC Genomics ; 19(Suppl 5): 287, 2018 May 08.
Article in English | MEDLINE | ID: mdl-29745846

ABSTRACT

BACKGROUND: Fractionation is the genome-wide process of losing one gene per duplicate pair following whole genome multiplication (doubling, tripling, …). This is important in the evolution of plants over tens of millions of years, because of their repeated cycles of genome multiplication and fractionation. One type of evidence in the study of these processes is the frequency distribution of similarities between the two genes, over all the duplicate pairs in the genome. RESULTS: We study modeling and inference problems around the processes of fractionation and whole genome multiplication focusing first on the frequency distribution of similarities of duplicate pairs in the genome. Our birth-and-death model accounts for repeated duplication, triplication or other multiplication events, as well as fractionation rates among multiple progeny of a single gene specific to each event. It also has a biologically and combinatorially well-motivated way of handling the tendency for at least one sibling to survive fractionation. The method settles previously unexplored questions about the expected number of gene pairs tracing their ancestry back to each multiplication event. We exemplify the algebraic concepts inherent in our models and on Brassica rapa, whose evolutionary history is well-known. We demonstrate the quantitative analysis of high-similarity gene pairs and triples to confirm the known ploidies of events in the lineage of B. rapa. CONCLUSIONS: Our birth-and-death model accounts for the similarity distribution of paralogs in terms of multiple rounds of whole genome multiplication and fractionation. An analysis of high-similarity gene triples confirms the recent Brassica triplication.


Subjects
Brassica rapa/genetics , Gene Duplication , Plant Genome , Ploidies , Plant Chromosomes , Molecular Evolution , Phylogeny , Synteny
9.
BMC Genomics ; 19(Suppl 2): 100, 2018 May 09.
Article in English | MEDLINE | ID: mdl-29764371

ABSTRACT

BACKGROUND: The reconstruction of ancestral genomes must deal with the problem of resolution, necessarily involving a trade-off between trying to identify genomic details and being overwhelmed by noise at higher resolutions. RESULTS: We use the median reconstruction at the synteny block level, of the ancestral genome of the order Gentianales, based on coffee, Rhazya stricta and grape, to exemplify the effects of resolution (granularity) on comparative genomic analyses. CONCLUSIONS: We show how decreased resolution blurs the differences between evolving genomes, with respect to rate, mutational process and other characteristics.


Subjects
Apocynaceae/genetics , Coffea/genetics , Plant Genome , Vitis/genetics , Algorithms , Animals , Molecular Evolution , Gene Order , Genetic Models , Mutation , Phylogeny , Synteny
10.
BMC Genomics ; 18(Suppl 4): 366, 2017 05 24.
Article in English | MEDLINE | ID: mdl-28589858

ABSTRACT

BACKGROUND: The current literature establishes the importance of gene functional category and expression in promoting or suppressing duplicate gene loss after whole genome doubling in plants, a process known as fractionation. Inspired by studies that have reported gene expression to be the dominating factor in preventing duplicate gene loss, we analyzed the relative effect of functional category and expression. METHODS: We use multivariate methods to study data sets on gene retention, function and expression in rosids and asterids to estimate effects and assess their interaction. RESULTS: Our results suggest that the effects of functional category and expression on duplicate gene retention are independent and have no statistical interaction. CONCLUSION: In plants, functional category is the more dominant factor in explaining duplicate gene loss.


Subjects
Gene Expression Profiling , Statistics as Topic/methods , Gene Dosage , Molecular Sequence Annotation
11.
BMC Bioinformatics ; 17(Suppl 14): 412, 2016 Nov 11.
Article in English | MEDLINE | ID: mdl-28185566

ABSTRACT

BACKGROUND: We propose a new, continuous model of the fractionation process (duplicate gene deletion after polyploidization) on the real line. The aim is to infer how much DNA is deleted at a time, based on segment lengths for alternating deleted (invisible) and undeleted (visible) regions. RESULTS: After deriving a number of analytical results for "one-sided" fractionation, we undertake a series of simulations that help us identify the distribution of segment lengths as a gamma with shape and rate parameters evolving over time. This leads to an inference procedure based on observed length distributions for visible and invisible segments. CONCLUSIONS: We suggest extensions of this mathematical and simulation work to biologically realistic discrete models, including two-sided fractionation.


Subjects
Theoretical Models , Gene Deletion , Gene Duplication , Genomics
12.
BMC Bioinformatics ; 17(Suppl 18): 473, 2016 Dec 15.
Article in English | MEDLINE | ID: mdl-28105920

ABSTRACT

BACKGROUND: The median of k≥3 genomes was originally defined to find a compromise genome indicative of a common ancestor. However, in gene order comparisons, the usual definitions based on minimizing the sum of distances to the input genomes lead to degenerate medians reflecting only one of the input genomes. "Near-medians", consisting of equal samples of gene adjacencies from all the input genomes, were designed to restore the idea of compromise to the median problem. RESULT: We explore adjacency sampling constructions in full generality in the case k=3, with given overlapping sets of adjacencies in the three genomes, where all adjacencies in two-way or three-way overlaps are included in the sample. We require the construction to be maximal, in the sense that no additional proportion of adjacencies from any of the genomes may be added without violating the local linearity of the genome. We discover that in incorporating as many adjacencies as possible, evenly from all the input genomes, we are actually maximizing, rather than minimizing, the sum of distances over all other maximal sampling schemes. CONCLUSIONS: We propose to explore compromise instead of parsimony as the organizing principle for the small phylogeny problem.


Subjects
Genomics/standards , Algorithms , Molecular Evolution , Gene Order , Genome , Genomics/methods , Genetic Models
13.
BMC Genomics ; 17 Suppl 1: 1, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26818753

ABSTRACT

BACKGROUND: The inference of genome rearrangement operations requires complete genome assemblies as input data, since a rearrangement can involve an arbitrarily large proportion of one or more chromosomes. Most genome sequence projects, especially those on non-model organisms for which no physical map exists, produce very fragmented assemblies, so that a rearranged fragment may be impossible to identify because its two endpoints are on different scaffolds. However, breakpoints are easily identified, as long as they do not coincide with scaffold ends. For the phylogenetic context, in comparing a fragmented assembly with a number of complete assemblies, certain combinatorial constraints on breakpoints can be derived. We ask to what extent we can use breakpoint data between a fragmented genome and a number of complete genomes to recover all the arrangements in a phylogeny. RESULTS: We simulate genomic evolution via chromosomal inversion, fragmenting one of the genomes into a large number of scaffolds to represent the incompleteness of assembly. We identify all the breakpoints between this genome and the remainder. We devise an algorithm which takes these breakpoints into account in trying to determine on which branch of the phylogeny a rearrangement event occurred. We present an analysis of the dependence of recovery rates on scaffold size and rearrangement rate, and show that the true tree, the one on which the rearrangement simulation was performed, tends to be most parsimonious in estimating the number of true events inferred. CONCLUSIONS: It is somewhat surprising that the breakpoints identified just between the fragmented genome and each of the others suffice to recover most of the rearrangements produced by the simulations. This holds even in parts of the phylogeny disjoint from the lineage of the fragmented genome.


Subjects
Algorithms , Classification/methods , Gene Rearrangement/physiology , Genome , Phylogeny , Plants/classification , Plants/genetics
14.
BMC Genomics ; 17(Suppl 10): 782, 2016 11 11.
Article in English | MEDLINE | ID: mdl-28185558

ABSTRACT

BACKGROUND: Of the approximately two hundred sequenced plant genomes, how many and which ones were sequenced motivated by strictly or largely scientific considerations, and how many by chiefly economic, in a wide sense, incentives? And how large a role does publication opportunity play? RESULTS: In an integration of multiple disparate databases and other sources of information, we collect and analyze data on the size (number of species) in the plant orders and families containing sequenced genomes, on the trade value of these species, and of all the same-family or same-order species, and on the publication priority within the family and order. These data are subjected to multiple regression and other statistical analyses. We find that despite the initial importance of model organisms, it is clearly economic considerations that outweigh others in the choice of genome to be sequenced. CONCLUSIONS: This has important implications for generalizations about plant genomes, since human choices of plants to harvest (and cultivate) will have incurred many biases with respect to phenotypic characteristics and hence of genomic properties, and recent genomic evolution will also have been affected by human agricultural practices.


Subjects
Plant Genome , Genomics/economics , Plants/genetics , Agriculture/economics , Genetic Databases , High-Throughput Nucleotide Sequencing , Plants/classification , Regression Analysis , DNA Sequence Analysis
15.
BMC Bioinformatics ; 16 Suppl 17: S9, 2015.
Article in English | MEDLINE | ID: mdl-26680009

ABSTRACT

BACKGROUND: The loss of duplicate genes - fractionation - after whole genome doubling (WGD) is the subject of a debate as to whether it proceeds gene by gene or through deletion of multi-gene chromosomal segments. RESULTS: WGD produces two copies of every chromosome, namely two identical copies of a sequence of genes. We assume deletion events excise a geometrically distributed number of consecutive genes with mean µ ≥ 1, and these events can combine to produce single-copy runs of length l. If µ = 1, the process is gene-by-gene. If µ > 1, the process at least occasionally excises more than one gene at a time. In the latter case if deletions overlap, the later one simply extends the existing run of single-copy genes. We explore aspects of the predicted distribution of the lengths of single-copy regions analytically, but resort to simulations to show how observing run lengths l allows us to discriminate between the two hypotheses. CONCLUSIONS: Deletion run length distributions can discriminate between gene-by-gene fractionation and deletion of segments of geometrically distributed length, even if µ is only slightly larger than 1, as long as the genome is large enough and fractionation has not proceeded too far towards completion.


Subjects
Gene Duplication , Duplicate Genes , Genome , Computer Simulation , Gene Deletion , Gene Dosage
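
The deletion model in the abstract of entry 15 lends itself to a small simulation. The following is a rough sketch under simplifying assumptions that are not taken from the paper: the genome is a single linear chromosome, deletion start points are uniform, the sketch does not track which of the two copies loses a gene, and an overlapping deletion is simply merged into the existing single-copy run. All function and parameter names are hypothetical.

```python
import random


def geometric_length(mu, rng):
    """Draw a deletion length >= 1 from a geometric distribution
    with mean mu >= 1 (success probability 1 / mu)."""
    p = 1.0 / mu
    length = 1
    while rng.random() > p:
        length += 1
    return length


def simulate_fractionation(n_genes, n_events, mu, seed=0):
    """Mark genes reduced to single copy after n_events deletions.
    Overlapping deletions merge into one run (an assumption)."""
    rng = random.Random(seed)
    single_copy = [False] * n_genes
    for _ in range(n_events):
        start = rng.randrange(n_genes)
        length = geometric_length(mu, rng)
        for i in range(start, min(start + length, n_genes)):
            single_copy[i] = True
    return single_copy


def run_lengths(single_copy):
    """Lengths l of maximal runs of consecutive single-copy genes."""
    runs, current = [], 0
    for flag in single_copy:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    for mu in (1.0, 1.5):
        runs = run_lengths(simulate_fractionation(10_000, 2_000, mu, seed=42))
        print(mu, sum(runs) / len(runs))
```

Comparing the run-length histograms for µ = 1 and µ slightly above 1 gives a feel for the discrimination problem the abstract describes, although the sketch is not the authors' simulation.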
16.
BMC Genomics ; 16 Suppl 10: S8, 2015.
Article in English | MEDLINE | ID: mdl-26449933

ABSTRACT

BACKGROUND: Following whole genome duplication (WGD), there is a compact distribution of gene similarities within the genome reflecting duplicate pairs of all the genes in the genome. With time, the distribution broadens and loses volume due to variable decay of duplicate gene similarity and to the process of duplicate gene loss. If there are two WGD, the older one becomes so reduced and broad that it merges with the tail of the distributions resulting from more recent events, and it becomes difficult to distinguish them. The goal of this paper is to advance statistical methods of identifying, or at least counting, the WGD events in the lineage of a given genome. METHODS: For a set of 15 angiosperm genomes, we analyze all 15 × 14 = 210 ordered pairs of target genome versus reference genome, using SynMap to find syntenic blocks. We consider all sets of B ≥ 2 syntenic blocks in the target genome that overlap in the reference genome as evidence of WGD activity in the target, whether it be one event or several. We hypothesize that in fitting an exponential function to the tail of the empirical distribution f(B) of block multiplicities, the size of the exponent will reflect the amount of WGD in the history of the target genome. RESULTS: By amalgamating the results from all reference genomes, a range of values of SynMap parameters, and alternative cutoff points for the tail, we find a clear pattern whereby multiple-WGD core eudicots have the smallest (negative) exponents, followed by core eudicots with only the single "γ" triplication in their history, followed by a non-core eudicot with a single WGD, followed by the monocots, with a basal angiosperm, the WGD-free Amborella having the largest exponent. CONCLUSION: The hypothesis that the exponent of the fit to the tail of the multiplicity distribution is a signature of the amount of WGD is verified, but there is also a clear complicating factor in the monocot clade, where a history of multiple WGD is not reflected in a small exponent.


Subjects
Molecular Evolution , Plant Genome , Phylogeny , Polyploidy , Gene Duplication , Magnoliopsida/genetics
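
The tail fit described in the abstract of entry 16 can be sketched as a log-linear regression. The abstract does not say how the exponential was fitted, so ordinary least squares on log f(B) for B at or above a cutoff is an assumption here, as are the function name and the `cutoff` parameter. The input is a mapping from block multiplicity B to its observed frequency.

```python
import math


def fit_tail_exponent(counts, cutoff=2):
    """Fit log f(B) = c + k * B by least squares over B >= cutoff
    and return the exponent k (assumed fitting method)."""
    points = [(b, math.log(f)) for b, f in counts.items()
              if b >= cutoff and f > 0]
    if len(points) < 2:
        raise ValueError("need at least two tail points with f(B) > 0")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic, made-up frequencies that decay exactly exponentially.
    synthetic = {b: 1000.0 * math.exp(-0.7 * b) for b in range(2, 8)}
    print(fit_tail_exponent(synthetic))
```

Varying `cutoff` mimics the "alternative cutoff points for the tail" mentioned in the abstract; real multiplicity counts would come from a synteny tool's output, which this sketch does not parse.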
17.
BMC Genomics ; 15 Suppl 6: S3, 2014.
Article in English | MEDLINE | ID: mdl-25571965

ABSTRACT

BACKGROUND: The breakpoint median in the set Sn of permutations on n terms is known to have some unusual behavior, especially if the input genomes are maximally different to each other. The mathematical study of the set of medians is complicated by the facts that breakpoint distance is not a metric but a pseudo-metric, and that it does not define a geodesic space. RESULTS: We introduce the notion of partial geodesic, or geodesic patch between two permutations, and show that if two permutations are medians, then every permutation on a geodesic patch between them is also a median. We also prove the conjecture that the input permutations themselves are medians.


Subjects
Chromosome Breakpoints , Genome , Genomics , Genetic Models , Algorithms , Mutation
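
For readers unfamiliar with the distance used in entry 17, here is a minimal sketch of one common definition of breakpoint distance, assuming unsigned linear permutations with no end markers (the abstract does not spell out the variant used). Under this definition a permutation and its reversal are at distance 0 even though they differ, which illustrates why the distance is only a pseudo-metric. The helper `median_score` is a hypothetical name for the sum of distances that a median minimizes.

```python
def adjacencies(perm):
    """Set of unordered adjacent pairs in a linear permutation."""
    return {frozenset(pair) for pair in zip(perm, perm[1:])}


def breakpoint_distance(p, q):
    """Number of adjacencies of p that are absent from q
    (unsigned, linear, no end caps: an assumed variant)."""
    return len(adjacencies(p) - adjacencies(q))


def median_score(candidate, genomes):
    """Sum of breakpoint distances from candidate to each input genome."""
    return sum(breakpoint_distance(candidate, g) for g in genomes)


if __name__ == "__main__":
    a = (1, 2, 3, 4)
    b = (1, 3, 2, 4)
    c = (4, 3, 2, 1)
    print(breakpoint_distance(a, c))   # reversal of a
    print(breakpoint_distance(a, b))
    print(median_score(a, [a, b, c]))
```

Because both permutations have the same number of adjacencies, the count is symmetric in `p` and `q` for inputs on the same element set.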
18.
BMC Genomics ; 15 Suppl 6: S1, 2014.
Article in English | MEDLINE | ID: mdl-25572274

ABSTRACT

BACKGROUND: The breakpoint median for a set of k ≥ 3 random genomes tends to approach (any) one of these genomes ("corners") as genome length increases, although there is a diminishing proportion of medians equidistant from all k ("medians in the middle"). Algorithms are likely to miss the latter, and this has consequences for the general case where input genomes share some or many gene adjacencies, where the tendency for the median to be closer to one input genome may be an artifact of the corner tendency. RESULTS: We present a simple sampling procedure for constructing a "near median" that represents a compromise among k random genomes and that has only a slightly greater breakpoint distance to all of them than the median does. We generalize to the realistic case where genomes share varying proportions of gene adjacencies. We present a supplementary sampling scheme that brings the constructed genome even closer to median status. CONCLUSIONS: Our approach is of particular use in the phylogenetic context where medians are repeatedly calculated at ancestral nodes, and where the corner effect prevents different parts of the phylogeny from communicating with each other.


Subjects
Genome , Genomics/methods , Genetic Models , Algorithms
19.
BMC Genomics ; 15 Suppl 6: S19, 2014.
Article in English | MEDLINE | ID: mdl-25573431

ABSTRACT

BACKGROUND: Previous work on whole genome doubling in plants established the importance of gene functional category in provoking or suppressing duplicate gene loss, or fractionation. Other studies, particularly in Paramecium, have correlated levels of gene expression with vulnerability or resistance to duplicate loss. RESULTS: Here we analyze the simultaneous effect of function category and expression in two plant data sets, rosids and asterids. CONCLUSION: We demonstrate that function category and expression level have independent effects, though expression does not play the dominant role it does in Paramecium.


Subjects
Computational Biology , Plant Gene Expression Regulation , Genomics , Plants/genetics , Computational Biology/methods , Gene Ontology , Plant Genes , Plant Genome , Genomics/methods , Phylogeny , Plants/classification
20.
BMC Genomics ; 15 Suppl 6: S8, 2014.
Article in English | MEDLINE | ID: mdl-25572777

ABSTRACT

BACKGROUND: Chaining is a major problem in constructing gene families. RESULTS: We define a new kind of cluster on graphs with strong and weak edges: soft cliques with backbones (SCWiB). This differs from other definitions in how it controls the "chaining effect", by ensuring clusters satisfy a tolerant edge density criterion that takes into account cluster size. We implement algorithms for decomposing a graph of similarities into SCWiBs. We compare examples of output from SCWiB and the Markov Cluster Algorithm (MCL), and also compare some curated Arabidopsis thaliana gene families with the results of automatic clustering. We apply our method to 44 published angiosperm genomes with annotation, and discover that Amborella trichopoda is distinct from all the others in having substantially and systematically smaller proportions of moderate- and large-size gene families. CONCLUSIONS: We offer several possible evolutionary explanations for this result.


Subjects
Flowers/genetics , Plant Genes , Genetic Models , Multigene Family , Plants/genetics , Algorithms , Magnoliopsida/genetics