ABSTRACT
An equidistant X-cactus is a type of rooted, arc-weighted, directed acyclic graph with leaf set X, that is used in biology to represent the evolutionary history of a set X of species. In this paper, we introduce and investigate the space of equidistant X-cactuses. This space contains, as a subset, the space of ultrametric trees on X that was introduced by Gavryushkin and Drummond. We show that equidistant-cactus space is a CAT(0)-metric space which implies, for example, that there are unique geodesic paths between points. As a key step to proving this, we present a combinatorial result concerning ranked rooted X-cactuses. In particular, we show that such graphs can be encoded in terms of a pairwise compatibility condition arising from a poset of collections of pairs of subsets of X that satisfy certain set-theoretic properties. As a corollary, we also obtain an encoding of ranked, rooted X-trees in terms of partitions of X, which provides an alternative proof that the space of ultrametric trees on X is CAT(0). We expect that our results will provide the basis for novel ways to perform statistical analyses on collections of equidistant X-cactuses, as well as new directions for defining and understanding spaces of more general, arc-weighted phylogenetic networks.
ABSTRACT
Convergent evolution is an important process in which independent species evolve similar features usually over a long period of time. It occurs with many different species across the tree of life, and is often caused by the fact that species have to adapt to similar environmental niches. In this paper, we introduce and study properties of a distance-based model for convergent evolution in which we assume that two ancestral species converge for a certain period of time within a collection of species that have otherwise evolved according to an evolutionary clock. Under these assumptions it follows that we obtain a distance on the collection that is a modification of an ultrametric distance arising from an equidistant phylogenetic tree. As well as characterising when this modified distance is a tree metric, we give conditions in terms of the model's parameters for when it is still possible to recover the underlying tree and also its height, even in case the modified distance is not a tree metric.
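The two notions of distance mentioned in this abstract can be illustrated with a minimal sketch, assuming the distance is given as a symmetric matrix that is already a metric. The function names, the example tree and the perturbation are hypothetical and are not taken from the paper; in particular, the perturbation is an arbitrary change to one entry, not the paper's convergence model.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple the two largest distances agree."""
    for i, j, k in combinations(range(len(d)), 3):
        _, mid, top = sorted((d[i][j], d[i][k], d[j][k]))
        if top - mid > tol:
            return False
    return True


def satisfies_four_point(d, tol=1e-9):
    """Four-point condition: in every quadruple the two largest pair sums agree."""
    for i, j, k, l in combinations(range(len(d)), 4):
        _, mid, top = sorted(
            (d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k])
        )
        if top - mid > tol:
            return False
    return True


# Hypothetical equidistant tree ((a,b),(c,d)): cherries at height 1, root at height 2.
clock_like = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

# Arbitrary perturbation (not the paper's model): shorten the b-c distance.
perturbed = [row[:] for row in clock_like]
perturbed[1][2] = perturbed[2][1] = 3

print(is_ultrametric(clock_like), satisfies_four_point(clock_like))
print(is_ultrametric(perturbed), satisfies_four_point(perturbed))
```

For a metric, passing the four-point check on all quadruples of distinct points is the usual test for being a tree metric; the perturbed matrix fails both checks, which shows how a single shortened distance can break the tree structure.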
Subjects
Molecular Evolution, Genetic Models, Phylogeny
ABSTRACT
The generalized circumradius of a set of points A ⊂ R^d with respect to a convex body K equals the minimum value of λ ≥ 0 such that a translate of λK contains A. Each choice of K gives a different function on the set of bounded subsets of R^d; we characterize which functions can arise in this way. Our characterization draws on the theory of diversities, a recently introduced generalization of metrics from functions on pairs to functions on finite subsets. We additionally investigate functions which arise by restricting the generalized circumradius to a finite subset of R^d. We obtain elegant characterizations in the case that K is a simplex or parallelotope.
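To make the definition concrete, here is a minimal sketch for one special case only, assuming K is the axis-aligned cube [-1, 1]^d (a particular parallelotope). In that case a translate of λK contains A exactly when every coordinate range of A is at most 2λ, so the generalized circumradius is half the largest coordinate range. The function name is hypothetical and the sketch does not cover general convex bodies.

```python
def cube_circumradius(points):
    """Generalized circumradius of a finite point set w.r.t. K = [-1, 1]^d.

    Assumes all points have the same dimension d >= 1.
    """
    if not points:
        raise ValueError("need at least one point")
    dims = range(len(points[0]))
    # Half the widest coordinate range is the smallest cube half-width that fits.
    return max(
        (max(p[i] for p in points) - min(p[i] for p in points)) / 2
        for i in dims
    )


print(cube_circumradius([(0, 0), (4, 1), (1, 3)]))
```

Adding a point to the set can only keep or increase the value, which matches the monotonicity one expects of a function on finite subsets.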
ABSTRACT
This article presents metagenome-assembled genomes (MAGs) for both eukaryotic and prokaryotic organisms originating from the Arctic and Atlantic oceans, along with gene prediction and functional annotation for MAGs from both domains. Eleven samples from the chlorophyll-a maximum layer of the surface ocean were collected during two cruises in 2012: six from the Arctic in June-July on ARK-XXVII/1 (PS80), and five from the Atlantic in November on ANT-XXIX/1 (PS81). Sequencing and assembly were carried out by the Joint Genome Institute (JGI), which provides annotation of the assembled sequences and 122 MAGs for prokaryotic organisms. A subsequent binning process identified 21 MAGs for eukaryotic organisms, mostly identified as Mamiellophyceae or Bacillariophyceae. The data for each MAG include sequences in FASTA format and tables of functional annotation of genes. For eukaryotic MAGs, transcript and protein sequences for predicted genes are available. A spreadsheet is provided summarising quality measures and taxonomic classifications for each MAG. These data provide draft genomes for uncultured marine microbes, including some of the first MAGs for polar eukaryotes, and can provide reference genetic data for these environments or be used in genomics-based comparisons between environments.
ABSTRACT
Ice-binding proteins (IBPs) are a group of ecologically and biotechnologically relevant enzymes produced by psychrophilic organisms. Although putative IBPs containing the domain of unknown function (DUF) 3494 have been identified in many taxa of polar microbes, our knowledge of their genetic and structural diversity in natural microbial communities is limited. Here, we used samples from sea ice and sea water collected in the central Arctic Ocean as part of the MOSAiC expedition for metagenome sequencing and the subsequent analyses of metagenome-assembled genomes (MAGs). By linking structurally diverse IBPs to particular environments and potential functions, we reveal that IBP sequences are enriched in interior ice, have diverse genomic contexts and cluster taxonomically. Their diverse protein structures may be a consequence of domain shuffling, leading to variable combinations of protein domains in IBPs and probably reflecting the functional versatility required to thrive in the extreme and variable environment of the central Arctic Ocean.
Subjects
Carrier Proteins, Prokaryotic Cells, Protein Domains, Seawater, Oceans and Seas
ABSTRACT
A rooted phylogenetic network is a directed acyclic graph with a single root, whose sinks correspond to a set of species. As such networks are useful for representing the evolution of species that have undergone reticulate evolution, there has been great interest in developing the theory behind and algorithms for constructing them. However, unlike evolutionary trees, these networks can be highly non-planar, which can make them difficult to visualise and interpret. Here we investigate properties of planar rooted phylogenetic networks and algorithms for deciding whether or not rooted networks have certain special planarity properties. In particular, we introduce three natural subclasses of planar rooted phylogenetic networks and show that they form a hierarchy. In addition, for the well-known level-k networks, we show that level-1, -2, and -3 networks are always outer, terminal, and upward planar, respectively, and that level-4 networks are not necessarily planar. Finally, we show that a regular network is terminal planar if and only if it is pyramidal. Our results make use of the highly developed field of planar digraphs, and we believe that the link between phylogenetic networks and planar graphs should prove useful in future for developing new approaches to both construct and visualise phylogenetic networks.
Subjects
Algorithms, Phylogeny
ABSTRACT
BACKGROUND: Phytoplankton communities significantly contribute to global biogeochemical cycles of elements and underpin marine food webs. Although their uncultured genomic diversity has been estimated by planetary-scale metagenome sequencing and subsequent reconstruction of metagenome-assembled genomes (MAGs), this approach has yet to be applied for complex phytoplankton microbiomes from polar and non-polar oceans consisting of microbial eukaryotes and their associated prokaryotes. RESULTS: Here, we have assembled MAGs from chlorophyll a maximum layers in the surface of the Arctic and Atlantic Oceans enriched for species associations (microbiomes) with a focus on pico- and nanophytoplankton and their associated heterotrophic prokaryotes. From 679 Gbp and an estimated 50 million genes in total, we recovered 143 MAGs of medium to high quality. Although there was a strict demarcation between Arctic and Atlantic MAGs, adjacent sampling stations in each ocean had 51-88% of MAGs in common, with most species associations between Prasinophytes and Proteobacteria. Phylogenetic placement revealed eukaryotic MAGs to be more diverse in the Arctic whereas prokaryotic MAGs were more diverse in the Atlantic Ocean. Approximately 70% of protein families were shared between Arctic and Atlantic MAGs for both prokaryotes and eukaryotes. However, eukaryotic MAGs had more protein families unique to the Arctic whereas prokaryotic MAGs had more families unique to the Atlantic. CONCLUSION: Our study provides a genomic context to complex phytoplankton microbiomes to reveal that their community structure was likely driven by significant differences in environmental conditions between the polar Arctic and warm surface waters of the tropical and subtropical Atlantic Ocean.
Subjects
Metagenome, Microbiota, Atlantic Ocean, Chlorophyll A, Eukaryota/genetics, Metagenome/genetics, Microbiota/genetics, Phylogeny, Phytoplankton/genetics
ABSTRACT
Current microRNA (miRNA) prediction methods are generally based on annotation criteria that tend to miss potential functional miRNAs. Recently, new miRNA annotation criteria have been proposed that could lead to improvements in miRNA prediction methods in plants. Here, we investigate the effect of the new criteria on miRNA prediction in Arabidopsis thaliana and present a new degradome-assisted functional miRNA prediction approach. We investigated the effect by applying the new criteria, and a more permissive set of criteria, to miRNA prediction using existing miRNA prediction tools. We also developed an approach to miRNA prediction that is assisted by the functional information extracted from the analysis of degradome sequencing. We demonstrate the improved performance of degradome-assisted miRNA prediction compared to unassisted prediction and evaluate the approach using miRNA differential expression analysis. We observe how the miRNA predictions fit under the different criteria and show a potential novel miRNA that has been missed within Arabidopsis thaliana. Additionally, we introduce a freely available software tool, 'PAREfirst', that employs the degradome-assisted approach. The study shows that some miRNAs could be missed due to the stringency of the former annotation criteria, and that combining a degradome-assisted approach with more permissive miRNA criteria can expand confident miRNA predictions.
Subjects
Arabidopsis, MicroRNAs, MicroRNAs/genetics, MicroRNAs/metabolism, Arabidopsis/genetics, Arabidopsis/metabolism, Plants/genetics, Software, RNA Sequence Analysis, Plant Gene Expression Regulation, High-Throughput Nucleotide Sequencing
ABSTRACT
Eukaryotic phytoplankton are responsible for at least 20% of annual global carbon fixation. Their diversity and activity are shaped by interactions with prokaryotes as part of complex microbiomes. Although differences in their local species diversity have been estimated, we still have a limited understanding of environmental conditions responsible for compositional differences between local species communities on a large scale from pole to pole. Here, we show, based on pole-to-pole phytoplankton metatranscriptomes and microbial rDNA sequencing, that environmental differences between polar and non-polar upper oceans most strongly impact the large-scale spatial pattern of biodiversity and gene activity in algal microbiomes. The geographic differentiation of co-occurring microbes in algal microbiomes can be well explained by the latitudinal temperature gradient and associated break points in their beta diversity, with an average breakpoint at 14 °C ± 4.3, separating cold and warm upper oceans. As global warming impacts upper ocean temperatures, we project that break points of beta diversity move markedly pole-wards. Hence, abrupt regime shifts in algal microbiomes could be caused by anthropogenic climate change.
Subjects
Genetic Variation, Microalgae/genetics, Microbiota/genetics, Phytoplankton/genetics, Transcriptome/genetics, Antarctic Regions, Arctic Regions, Biodiversity, Carbon Cycle, Climate Change, Gene Ontology, Geography, Global Warming, Microalgae/classification, Microalgae/growth & development, Oceans and Seas, Phytoplankton/classification, Phytoplankton/growth & development, 16S Ribosomal RNA/genetics, 18S Ribosomal RNA/genetics, DNA Sequence Analysis/methods, Species Specificity, Temperature
ABSTRACT
Recently there has been considerable interest in the problem of finding a phylogenetic network with a minimum number of reticulation vertices which displays a given set of phylogenetic trees, that is, a network with minimum hybrid number. Such networks are useful for representing the evolution of species whose genomes have undergone processes such as lateral gene transfer and recombination that cannot be represented appropriately by a phylogenetic tree. Even so, as was recently pointed out in the literature, insisting that a network displays the set of trees can be an overly restrictive assumption when modeling certain evolutionary phenomena such as incomplete lineage sorting. In this paper, we thus consider the less restrictive notion of rigidly displaying which we introduce and study here. More specifically, we characterize when two trees can be rigidly displayed by a certain type of phylogenetic network called a temporal tree-child network in terms of fork-picking sequences. These are sequences of special subconfigurations of the two trees related to the well-studied cherry-picking sequences. We also show that, in case it exists, the rigid hybrid number for two phylogenetic trees is given by a minimum weight fork-picking sequence for the trees. Finally, we consider the relationship between the rigid hybrid number and three closely related numbers; the weak, beaded, and temporal hybrid numbers. In particular, we show that these numbers can all be different even for a fixed pair of trees, and also present an infinite family of pairs of trees which demonstrates that the difference between the rigid hybrid number and the temporal-hybrid number for two phylogenetic trees on the same set of n leaves can grow at least linearly with n.
Subjects
Genetic Models, Phylogeny, Algorithms, Humans, Genetic Hybridization
ABSTRACT
The highly heterogeneous clinical course of human prostate cancer has prompted the development of multiple RNA biomarkers and diagnostic tools to predict outcome for individual patients. Biomarker discovery is often unstable with, for example, small changes in discovery dataset configuration resulting in large alterations in biomarker composition. Our hypothesis, which forms the basis of this current study, is that highly significant overlaps occurring between gene signatures obtained using entirely different approaches indicate genes fundamental for controlling cancer progression. For prostate cancer, we found two sets of signatures that had significant overlaps suggesting important genes (p < 10^-34 for paired overlaps, hypergeometric test). These overlapping signatures defined a core set of genes linking hormone signalling (HES6-AR), cell cycle progression (Prolaris) and a molecular subgroup of patients (PCS1) derived by Non-Negative Matrix Factorization (NNMF) of control pathways, together designated as SIG-HES6. The second set (designated SIG-DESNT) consisted of the DESNT diagnostic signature and a second NNMF signature, PCS3. Stratifications using SIG-HES6 (HES6, PCS1, Prolaris) and SIG-DESNT (DESNT) classifiers frequently detected the same individual high-risk cancers, indicating that the underlying mechanisms associated with SIG-HES6 and SIG-DESNT may act together to promote aggressive cancer development. We show that the use of combinations of a SIG-HES6 signature together with DESNT substantially increases the ability to predict poor outcome, and we propose a model for prostate cancer development involving co-operation between the SIG-HES6 and SIG-DESNT pathways that has implications for therapeutic design.
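The overlap test named in this abstract can be sketched as a one-sided hypergeometric tail probability: the chance that two gene signatures drawn from a common gene universe share at least the observed number of genes. The sketch below is only an illustration; the function name and all numbers are hypothetical, and the abstract does not state the gene universe or signature sizes used in the study.

```python
from math import comb


def overlap_p_value(universe, size_a, size_b, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(universe, size_a, size_b).

    universe: number of genes both signatures could be drawn from (assumed).
    size_a, size_b: sizes of the two signatures.
    overlap: number of genes observed in both signatures.
    """
    total = comb(universe, size_b)
    largest_possible = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, largest_possible + 1)
    )
    # Exact integer counts; the final division yields a float.
    return tail / total


# Hypothetical example: 20,000 genes, signatures of 100 and 50 genes, 10 shared.
print(overlap_p_value(20000, 100, 50, 10))
```

Using exact integer binomials avoids the underflow problems that floating-point probability mass functions can have for very small tails such as the one reported above.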
Subjects
Tumor Biomarkers/genetics, Prostatic Neoplasms, Transcriptome, Tumor Biomarkers/analysis, Cohort Studies, Datasets as Topic/statistics & numerical data, Disease Progression, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Humans, Male, Microarray Analysis, Neoplasm Invasiveness, Prognosis, Prostatic Neoplasms/diagnosis, Prostatic Neoplasms/genetics, Prostatic Neoplasms/pathology
ABSTRACT
The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists of inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem of a single gene tree to a set of trees, accounting for segmental duplications and losses. Existence of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the former exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.
ABSTRACT
Natural antisense transcript-derived small interfering RNAs (nat-siRNAs) are a class of functional small RNA (sRNA) that have been found in both the plant and animal kingdoms. In plants, these sRNAs have been shown to suppress the translation of messenger RNAs (mRNAs) by directing the RNA-induced silencing complex (RISC) to their sequence-specific mRNA target(s). Current computational tools for classification of nat-siRNAs are limited in number and can be computationally infeasible to use. In addition, current methods do not provide any indication of the function of the predicted nat-siRNAs. Here, we present a new software pipeline, called NATpare, for prediction and functional analysis of nat-siRNAs using sRNA and degradome sequencing data. Based on our benchmarking in multiple plant species, NATpare substantially reduces the time required to perform prediction with minimal resource requirements, allowing for comprehensive analysis of nat-siRNAs in larger and more complex organisms for the first time. We then exemplify the use of NATpare by identifying tissue- and stress-specific nat-siRNAs in multiple Arabidopsis thaliana datasets.
Subjects
Plant RNA/genetics, Small Interfering RNA/chemistry, RNA Sequence Analysis/methods, Software, Arabidopsis, RNA Interference, Plant RNA/metabolism, Small Interfering RNA/genetics, Small Interfering RNA/metabolism
ABSTRACT
BACKGROUND: Unsupervised learning methods, such as Hierarchical Cluster Analysis, are commonly used for the analysis of genomic platform data. Unfortunately, such approaches ignore the well-documented heterogeneous composition of prostate cancer samples. Our aim is to use more sophisticated analytical approaches to deconvolute the structure of prostate cancer transcriptome data, providing novel clinically actionable information for this disease. METHODS: We apply an unsupervised model called Latent Process Decomposition (LPD), which can handle heterogeneity within individual cancer samples, to genome-wide expression data from eight prostate cancer clinical series, including 1,785 malignant samples with the clinical endpoints of PSA failure and metastasis. RESULTS: We show that PSA failure is correlated with the level of an expression signature called DESNT (HR = 1.52, 95% CI = [1.36, 1.7], P = 9.0 × 10^-14, Cox model), and that patients with a majority DESNT signature have an increased metastatic risk (χ² test, P = 0.0017, and P = 0.0019). In addition, we develop a stratification framework that incorporates DESNT and identifies three novel molecular subtypes of prostate cancer. CONCLUSIONS: These results highlight the importance of using more complex approaches for the analysis of genomic data, may assist drug targeting, and have allowed the construction of a nomogram combining DESNT with other clinical factors for use in clinical management.
Subjects
Tumor Biomarkers/blood, Gene Expression Profiling/statistics & numerical data, Prostatic Neoplasms/genetics, Transcriptome/genetics, Neoplastic Gene Expression Regulation/genetics, Genomics/statistics & numerical data, Humans, Kaplan-Meier Estimate, Male, Middle Aged, Prognosis, Progression-Free Survival, Proportional Hazards Models, Prostate-Specific Antigen/blood, Prostatic Neoplasms/blood, Prostatic Neoplasms/pathology, Risk Assessment, Risk Factors
ABSTRACT
MicroRNAs (miRNAs) are short, non-coding RNAs that modulate the translation-rate of messenger RNAs (mRNAs) by directing the RNA-induced silencing complex to sequence-specific targets. In plants, this typically results in cleavage and subsequent degradation of the mRNA. Degradome sequencing is a high-throughput technique developed to capture cleaved mRNA fragments and thus can be used to support miRNA target prediction. The current criteria used for miRNA target prediction were inferred on a limited number of experimentally validated A. thaliana interactions and were adapted to fit these specific interactions; thus, these fixed criteria may not be optimal across all datasets (organisms, tissues or treatments). We present a new tool, PAREameters, for inferring targeting criteria from small RNA and degradome sequencing datasets. We evaluate its performance using a more extensive set of experimentally validated interactions in multiple A. thaliana datasets. We also perform comprehensive analyses to highlight and quantify the differences between subsets of miRNA-mRNA interactions in model and non-model organisms. Our results show increased sensitivity in A. thaliana when using the PAREameters inferred criteria and that using data-driven criteria enables the identification of additional interactions that further our understanding of the RNA silencing pathway in both model and non-model organisms.
Subjects
Arabidopsis/genetics, Computational Biology/methods, Plant Gene Expression Regulation, MicroRNAs/genetics, Messenger RNA/genetics, Plant RNA/genetics, Software, Arabidopsis/metabolism, Base Sequence, Datasets as Topic, Flowers/genetics, Flowers/metabolism, High-Throughput Nucleotide Sequencing, MicroRNAs/metabolism, Plant Leaves/genetics, Plant Leaves/metabolism, RNA Cleavage, Messenger RNA/metabolism, Plant RNA/metabolism, Sensitivity and Specificity, RNA Sequence Analysis, Transcriptome
ABSTRACT
Denitrification is one of the key processes of the global nitrogen (N) cycle driven by bacteria. It has been widely known for more than 100 years as a process by which the biogeochemical N-cycle is balanced. To study this process, we develop an individual-based model called INDISIM-Denitrification. The model embeds a thermodynamic model for bacterial yield prediction inside the individual-based model INDISIM and is designed to simulate in aerobic and anaerobic conditions the cell growth kinetics of denitrifying bacteria. INDISIM-Denitrification simulates a bioreactor that contains a culture medium with succinate as a carbon source, ammonium as nitrogen source and various electron acceptors. To implement INDISIM-Denitrification, the individual-based model INDISIM was used to give sub-models for nutrient uptake, stirring and reproduction cycle. Using a thermodynamic approach, the denitrification pathway, cellular maintenance and individual mass degradation were modeled using microbial metabolic reactions. These equations are the basis of the sub-models for metabolic maintenance, individual mass synthesis and reducing internal cytotoxic products. The model was implemented in the open-access platform NetLogo. INDISIM-Denitrification is validated using a set of experimental data of two denitrifying bacteria in two different experimental conditions. This provides an interactive tool to study the denitrification process carried out by any denitrifying bacterium since INDISIM-Denitrification allows changes in the microbial empirical formula and in the energy-transfer-efficiency used to represent the metabolic pathways involved in the denitrification process. The simulator can be obtained from the authors on request.
Subjects
Denitrification, Ammonium Compounds/metabolism, Bacteria/metabolism, Bioreactors/microbiology, Carbon/metabolism, Nitrogen/metabolism, Thermodynamics
ABSTRACT
Phylogenomics commonly aims to construct evolutionary trees from genomic sequence information. One way to approach this problem is to first estimate event-labeled gene trees (i.e., rooted trees whose non-leaf vertices are labeled by speciation or gene duplication events), and to then look for a species tree which can be reconciled with this tree through a reconciliation map between the trees. In practice, however, it can happen that there is no such map from a given event-labeled tree to any species tree. An important situation where this might arise is where the species evolution is better represented by a network instead of a tree. In this paper, we therefore consider the problem of reconciling event-labeled trees with species networks. In particular, we prove that any event-labeled gene tree can be reconciled with some network and that, under certain mild assumptions on the gene tree, the network can even be assumed to be multi-arc free. To prove this result, we show that we can always reconcile the gene tree with some multi-labeled (MUL-)tree, which can then be "folded up" to produce the desired reconciliation and network. In addition, we study the interplay between reconciliation maps from event-labeled gene trees to MUL-trees and networks. Our results could be useful for understanding how genomes have evolved after undergoing complex evolutionary events such as polyploidy.
Subjects
Molecular Evolution, Gene Regulatory Networks, Genetic Models, Phylogeny, Algorithms, Gene Duplication, Genetic Speciation, Mathematical Concepts
ABSTRACT
Network reconstruction lies at the heart of phylogenetic research. Two well-studied classes of phylogenetic networks include tree-child networks and level-k networks. In a tree-child network, every non-leaf node has a child that is a tree node or a leaf. In a level-k network, the maximum number of reticulations contained in a biconnected component is k. Here, we show that level-k tree-child networks are encoded by their reticulate-edge-deleted subnetworks, which are subnetworks obtained by deleting a single reticulation edge, if [Formula: see text]. Following this, we provide a polynomial-time algorithm for uniquely reconstructing such networks from their reticulate-edge-deleted subnetworks. Moreover, we show that this can even be done when considering subnetworks obtained by deleting one reticulation edge from each biconnected component with k reticulations.
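The tree-child condition stated in this abstract (every non-leaf node has a child that is a tree node or a leaf) can be checked directly from an arc list. The following is a minimal sketch under the assumption that a reticulation is any vertex with two or more parents; the function name and the two toy networks are hypothetical, and the sketch does not compute the level of a network or reconstruct anything from subnetworks.

```python
from collections import defaultdict


def is_tree_child(arcs):
    """Return True if every non-leaf vertex has at least one non-reticulation child.

    arcs: iterable of (parent, child) pairs of a rooted directed acyclic graph.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    for parent, child in arcs:
        children[parent].append(child)
        indegree[child] += 1
    for kids in children.values():
        # A child with two or more parents is a reticulation.
        if all(indegree[kid] >= 2 for kid in kids):
            return False
    return True


# One reticulation h; each of its parents also has a leaf child.
good = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
        ("a", "x"), ("b", "y"), ("h", "z")]

# Both children of a (and of b) are reticulations, so the condition fails.
bad = [("r", "a"), ("r", "b"), ("a", "h1"), ("a", "h2"),
       ("b", "h1"), ("b", "h2"), ("h1", "x"), ("h2", "y")]

print(is_tree_child(good), is_tree_child(bad))
```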
Subjects
Algorithms, Phylogeny, Computational Biology, Molecular Evolution, Mathematical Concepts, Genetic Models
ABSTRACT
Introgression is an evolutionary process which provides an important source of innovation for evolution. Although various methods have been used to detect introgression, very few methods are currently available for constructing evolutionary histories involving introgression. In this article, we propose a new method for constructing such evolutionary histories whose starting point is a species forest (consisting of a collection of lineage trees, usually arising as a collection of clades or monophyletic groups in a species tree), and a gene tree for a specific allele of interest, or allele tree for short. Our method is based on representing introgression in terms of a certain "overlay" of the allele tree over the lineage trees, called an overlaid species forest (OSF). OSFs are similar to phylogenetic networks although a key difference is that they typically have multiple roots because each monophyletic group in the species tree has a different point of origin. Employing a new model for introgression, we derive an efficient algorithm for building OSFs called OSF-Builder that is guaranteed to return an optimal OSF in the sense that the number of potential introgression events is minimized. As well as using simulations to assess the performance of OSF-Builder, we illustrate its use on a butterfly data set in which introgression has been previously inferred. The OSF-Builder software is available for download from https://www.uea.ac.uk/computing/software/OSF-Builder.
Subjects
Biological Evolution, Classification/methods, Software
ABSTRACT
Given a collection [Formula: see text] of subsets of a finite set X, we say that [Formula: see text] is phylogenetically flexible if, for any collection R of rooted phylogenetic trees whose leaf sets comprise the collection [Formula: see text], R is compatible (i.e. there is a rooted phylogenetic X-tree that displays each tree in R). We show that [Formula: see text] is phylogenetically flexible if and only if it satisfies a Hall-type inequality condition of being 'slim'. Using submodularity arguments, we show that there is a polynomial-time algorithm for determining whether or not [Formula: see text] is slim. This 'slim' condition reduces to a simpler inequality in the case where all of the sets in [Formula: see text] have size 3, a property we call 'thin'. Thin sets were recently shown to be equivalent to the existence of an (unrooted) tree for which the median function provides an injective mapping to its vertex set; we show here that the unrooted tree in this representation can always be chosen to be a caterpillar tree. We also characterise when a collection [Formula: see text] of subsets of size 2 is thin (in terms of the flexibility of total orders rather than phylogenies) and show that this holds if and only if an associated bipartite graph is a forest. The significance of our results for phylogenetics is in providing precise and efficiently verifiable conditions under which supertree methods that require consistent inputs of trees can be applied to any input trees on given subsets of species.