Results 1 - 19 of 19
1.
Bull Math Biol ; 80(6): 1563-1577, 2018 06.
Article in English | MEDLINE | ID: mdl-29524097

ABSTRACT

Böcker and Dress (Adv Math 138:105-125, 1998) presented a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex coloring of its interior vertices, we can map every triple of three different leaves to the color of its median vertex. We characterize all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its coloring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterizes whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.


Subject(s)
Models, Theoretical , Phylogeny , Algorithms , Computational Biology , Mathematical Concepts
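The ternary map described in this abstract can be illustrated with a minimal sketch. It assumes a toy tree stored as an adjacency dictionary and a hypothetical colouring of its two interior vertices; the names `path`, `median`, and `ternary_map` are illustrative and are not taken from the paper. The median of three leaves is found as the single vertex shared by all three pairwise paths.

```python
from collections import deque


def path(adj, src, dst):
    """Return the unique path from src to dst in a tree as a list of vertices."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            break
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    # Walk back from dst to src through the recorded parents.
    out = []
    v = dst
    while v is not None:
        out.append(v)
        v = parent[v]
    return out[::-1]


def median(adj, a, b, c):
    """Return the one vertex that lies on all three pairwise paths."""
    common = set(path(adj, a, b)) & set(path(adj, a, c)) & set(path(adj, b, c))
    (m,) = common  # in a tree, three distinct leaves share exactly one such vertex
    return m


def ternary_map(adj, color, a, b, c):
    """Map a triple of distinct leaves to the colour of its median vertex."""
    return color[median(adj, a, b, c)]


# Toy tree (an assumption for illustration): leaves a, b, c, d; interior u, v.
adj = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"], "v": ["c", "d", "u"],
}
# Proper colouring: the adjacent interior vertices u and v get different colours.
color = {"u": "red", "v": "blue"}

print(ternary_map(adj, color, "a", "b", "c"))
print(ternary_map(adj, color, "a", "c", "d"))
```

The sketch only evaluates the map on a given tree; the reconstruction of a tree from a ternary map, which is the subject of the paper, is not attempted here.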
2.
Mol Biol Evol ; 30(5): 1206-17, 2013 May.
Article in English | MEDLINE | ID: mdl-23493256

ABSTRACT

Phylogenetic networks can model reticulate evolutionary events such as hybridization, recombination, and horizontal gene transfer. However, reconstructing such networks is not trivial. Popular character-based methods are computationally inefficient, whereas distance-based methods cannot guarantee reconstruction accuracy because pairwise genetic distances only reflect partial information about a reticulate phylogeny. To balance accuracy and computational efficiency, here we introduce a quartet-based method to construct a phylogenetic network from a multiple sequence alignment. Unlike distances that only reflect the relationship between a pair of taxa, quartets contain information on the relationships among four taxa; these quartets provide adequate capacity to infer a more accurate phylogenetic network. In applications to simulated and biological data sets, we demonstrate that this novel method is robust and effective in reconstructing reticulate evolutionary events and it has the potential to infer more accurate phylogenetic distances than other conventional phylogenetic network construction methods such as Neighbor-Joining, Neighbor-Net, and Split Decomposition. This method can be used in constructing phylogenetic networks from simple evolutionary events involving a few reticulate events to complex evolutionary histories involving a large number of reticulate events. A software package called "Quartet-Net" is implemented and available at http://sysbio.cvm.msstate.edu/QuartetNet/.


Subject(s)
Evolution, Molecular , Models, Genetic , Software , Models, Theoretical , Phylogeny
3.
Nucleic Acids Res ; 40(Web Server issue): W123-6, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22570412

ABSTRACT

An important task in biomedical research is identifying biomarkers that correlate with patient clinical data, and these biomarkers then provide a critical foundation for the diagnosis and treatment of disease. Conventionally, such an analysis is based on individual genes, but the results are often noisy and difficult to interpret. Using a biological network as the searching platform, network-based biomarkers are expected to be more robust and provide deep insights into the molecular mechanisms of disease. We have developed a novel bioinformatics web server for identifying network-based biomarkers that most correlate with patient survival data, SurvNet. The web server takes three input files: one biological network file, representing a gene regulatory or protein interaction network; one molecular profiling file, containing any type of gene- or protein-centred high-throughput biological data (e.g. microarray expression data or DNA methylation data); and one patient survival data file (e.g. patients' progression-free survival data). Given user-defined parameters, SurvNet will automatically search for subnetworks that most correlate with the observed patient survival data. As the output, SurvNet will generate a list of network biomarkers and display them through a user-friendly interface. SurvNet can be accessed at http://bioinformatics.mdanderson.org/main/SurvNet.


Subject(s)
Biomarkers/analysis , Software , Survival Analysis , Gene Regulatory Networks , Humans , Internet , Protein Interaction Mapping , Transcriptome
4.
BMC Bioinformatics ; 14 Suppl 14: S8, 2013.
Article in English | MEDLINE | ID: mdl-24266981

ABSTRACT

The advances in high-throughput omics technologies have made it possible to characterize molecular interactions within and across various species. Alignment and comparison of molecular networks across species will help detect orthologs and conserved functional modules and provide insights into the evolutionary relationships of the compared species. However, such analyses are not trivial due to the complexity of the networks and the high computational cost. Here we develop BinAligner, an algorithm for network alignment that mixes global and local approaches. Based on the hypotheses that the similarity between two vertices across networks would be context dependent and that the information from the edges and the structures of subnetworks can be more informative than vertices alone, two scoring schemes, 1-neighborhood subnetwork and graphlet, were introduced to derive the scoring matrices between networks, besides the commonly used scoring scheme from vertices. Then the alignment problem is formulated as an assignment problem, which is solved by a combinatorial optimization algorithm such as the Hungarian method. The proposed algorithm was applied and validated in aligning the protein-protein interaction network of Kaposi's sarcoma associated herpesvirus (KSHV) and that of varicella zoster virus (VZV). Interestingly, we identified several putative functional orthologous proteins with similar functions but very low sequence similarity between the two viruses. For example, KSHV open reading frame 56 (ORF56) and VZV ORF55 are helicase-primase subunits with sequence identity 14.6%, and KSHV ORF75 and VZV ORF44 are tegument proteins with sequence identity 15.3%. These functional pairs cannot be identified if one restricts the alignment to orthologous protein pairs. In addition, BinAligner identified a conserved pathway between the two viruses, which consists of 7 orthologous protein pairs connected by conserved links. This pathway might be crucial for virus packing and infection.


Subject(s)
Algorithms , Herpesvirus 3, Human/chemistry , Herpesvirus 8, Human/chemistry , Viral Proteins/chemistry , Herpesvirus 3, Human/genetics , Herpesvirus 8, Human/genetics , Open Reading Frames , Viral Proteins/genetics
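The assignment-problem formulation mentioned in this abstract can be sketched as follows. This is not the BinAligner implementation: the score matrix is made up, the function name `best_assignment` is hypothetical, and an exhaustive search over permutations stands in for the Hungarian method, which finds the same optimum in polynomial time. Exhaustive search is only practical for very small matrices.

```python
from itertools import permutations


def best_assignment(score):
    """Return (perm, total) maximising sum(score[i][perm[i]]) over all permutations.

    score[i][j] is the similarity between vertex i of one network and
    vertex j of the other; perm[i] is the partner chosen for vertex i.
    """
    n = len(score)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total


# Hypothetical 3 x 3 similarity scores (integers keep the arithmetic exact).
score = [
    [9, 1, 3],
    [2, 8, 4],
    [3, 5, 7],
]

perm, total = best_assignment(score)
print(perm, total)
```

For this toy matrix the diagonal pairing gives the highest total. In practice a scoring matrix would combine vertex, 1-neighborhood, and graphlet scores as the abstract describes; how these are combined is not specified here and is left out of the sketch.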
5.
BMC Bioinformatics ; 12: 18, 2011 Jan 13.
Article in English | MEDLINE | ID: mdl-21226965

ABSTRACT

BACKGROUND: As one of the most widely used parsimony methods for ancestral reconstruction, the Fitch method minimizes the total number of hypothetical substitutions along all branches of a tree to explain the evolution of a character. Due to the extensive usage of this method, it has become a scientific endeavor in recent years to study the reconstruction accuracies of the Fitch method. However, most studies are restricted to 2-state evolutionary models and a study for higher-state models is needed since DNA sequences take the format of 4-state series and protein sequences even have 20 states. RESULTS: In this paper, the ambiguous and unambiguous reconstruction accuracy of the Fitch method are studied for N-state evolutionary models. Given an arbitrary phylogenetic tree, a recurrence system is first presented to calculate iteratively the two accuracies. As the complete binary tree and the comb-shaped tree are the two extremal evolutionary tree topologies according to balance, we focus on the reconstruction accuracies on these two topologies and analyze their asymptotic properties. Then, 1000 Yule trees with 1024 leaves are generated and analyzed to simulate real evolutionary scenarios. It is known that more taxa do not necessarily increase the reconstruction accuracies under 2-state models. The result under N-state models is also tested. CONCLUSIONS: In a large tree with many leaves, the reconstruction accuracies of using all taxa are sometimes less than those of using a leaf subset under N-state models. For complete binary trees, there always exists an equilibrium interval [a, b] of conservation probability, in which the limiting ambiguous reconstruction accuracy equals the probability of randomly picking a state. The value b decreases with the increase of the number of states, and it seems to converge. When the conservation probability is greater than b, the reconstruction accuracies of the Fitch method increase rapidly. The reconstruction accuracies on 1000 simulated Yule trees also exhibit similar behaviors. For comb-shaped trees, the limiting reconstruction accuracies of using all taxa are always less than or equal to those of using the nearest root-to-leaf path when the conservation probability is not less than 1/N. As a result, more taxa are suggested for ancestral reconstruction when the tree topology is balanced and the sequences are highly similar, and a few taxa close to the root are recommended otherwise.


Subject(s)
Algorithms , Evolution, Molecular , Phylogeny , Sequence Analysis, DNA , Sequence Analysis, Protein
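The bottom-up pass of the Fitch method that this abstract analyses can be sketched in a few lines. The sketch assumes a rooted binary tree written as nested tuples whose leaves are single-character state strings; this representation and the function name `fitch` are choices made for illustration and do not come from the paper.

```python
def fitch(node):
    """Bottom-up Fitch pass on a rooted binary tree.

    A leaf is a state string such as "A"; an internal node is a pair
    (left, right). Returns (state_set, substitutions) for the subtree.
    """
    if isinstance(node, str):
        return {node}, 0
    left, right = node
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    common = left_states & right_states
    if common:
        # The children agree on at least one state: no extra substitution.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one substitution.
    return left_states | right_states, left_cost + right_cost + 1


# Toy example with DNA states (N = 4).
tree = (("A", "A"), ("A", ("C", "G")))
root_states, cost = fitch(tree)
print(root_states, cost)

# A root set with several states means the method cannot single one out.
print(fitch(("A", "C")))
```

When the root set contains exactly one state the reconstruction is unambiguous; when it contains several, a state has to be chosen from the set. The recurrence system for the two accuracies studied in the paper is not reproduced here.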
6.
J Genet Genomics ; 48(3): 198-207, 2021 03 20.
Article in English | MEDLINE | ID: mdl-33593615

ABSTRACT

The human face is a heritable surface with many complex sensory organs. In recent years, many genetic loci associated with facial features have been reported in different populations, yet there is a lack of studies on the Han Chinese population. Here, we report a genome-wide association study of 3D normal human faces of 2,659 Han Chinese with autosegment phenotypes of facial morphology. We identify single-nucleotide polymorphisms (SNPs) encompassing four genomic regions showing significant associations with different facial regions, including SNPs in DENND1B associated with the chin, SNPs among PISRT1 associated with eyes, SNPs between DCHS2 and SFRP2 associated with the nose, and SNPs in VPS13B associated with the nose. We replicate 24 SNPs from previously reported genetic loci in different populations, whose candidate genes are DCHS2, SUPT3H, HOXD1, SOX9, PAX3, and EDAR. These results provide a more comprehensive understanding of the genetic basis of variation in human facial morphology.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genetic Loci , Genetic Predisposition to Disease , Phenotype
7.
Med Image Anal ; 47: 15-30, 2018 07.
Article in English | MEDLINE | ID: mdl-29656107

ABSTRACT

The identification of connexel-wise associations, which involves examining functional connectivities between pairwise voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression, multiple-comparison correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. Therefore, we herein report the development of a rigorous statistical framework for connexel-wise significance testing based on the Gaussian random field theory. It includes controlling the family-wise error rate (FWER) of multiple hypothesis tests using topological inference methods, and calculating power and sample size for a connexel-wise study. Our theoretical framework can control the false-positive rate accurately, as validated empirically using two resting-state fMRI datasets. Compared with Bonferroni correction and false discovery rate (FDR), it can reduce the false-positive rate and increase statistical power by appropriately utilizing the spatial information of fMRI data. Importantly, our method bypasses the need for non-parametric permutation to correct for multiple comparisons; thus, it can efficiently tackle large datasets with high-resolution fMRI images. The utility of our method is shown in a case-control study. Our approach can identify altered functional connectivities in a major depression disorder dataset, whereas existing methods fail. A software package is available at https://github.com/weikanggong/BWAS.


Subject(s)
Connectome/classification , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Models, Statistical , False Positive Reactions , Healthy Volunteers , Humans , Linear Models , Software
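The two baseline corrections named in this abstract, Bonferroni and false discovery rate, can be sketched for comparison. The sketch uses the Benjamini-Hochberg step-up procedure as the FDR method, which is an assumption, since the abstract does not say which FDR procedure was used. The p-values are made up, and the random-field method of the paper itself is not implemented here.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject a hypothesis when its p-value is at most alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for FDR control.

    Find the largest rank k with p_(k) <= alpha * k / m and reject
    the k hypotheses with the smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for five tests.
pvals = [0.001, 0.012, 0.02, 0.035, 0.5]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

On these values Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects four, which illustrates why Bonferroni is usually regarded as the more conservative of the two. Neither procedure uses the spatial structure of the data.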
8.
Math Biosci ; 208(2): 521-37, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17303190

ABSTRACT

Closure operations are a useful device in both the theory and practice of tree reconstruction in biology and other areas of classification. These operations take a collection of trees (rooted or unrooted) that classify overlapping sets of objects at their leaves, and infer further tree-like relationships. In this paper we investigate closure operations on phylogenetic trees; both rooted and unrooted; as well as on X-splits, and in a general abstract setting. We derive a number of new results, particularly concerning the completeness (and incompleteness) and complexity of various types of closure rules.


Subject(s)
Phylogeny , Biological Evolution , Mathematics , Models, Genetic
9.
Otolaryngol Head Neck Surg ; 136(5): 726-33; discussion 734-5, 2007 May.
Article in English | MEDLINE | ID: mdl-17478205

ABSTRACT

OBJECTIVE: To assess and differentiate the health-related quality of life (HR-QoL) in patients with hereditary hemorrhagic telangiectasia (HHT). STUDY DESIGN AND SETTING: A prospective, open, cross-sectional questionnaire-based study (including the Short Form-36 Health Survey [SF-36]) performed by a tertiary care center. RESULTS: A total of 77 patients (36 females) were included. Except for one domain (bodily pain), the scores for all scales of the SF-36 were significantly reduced in comparison with normative data. The duration of epistaxis, the presence of hepatic involvement and gastrointestinal bleeding, and the number of visible telangiectases correlated with lower scores on several scales of the SF-36. Unexpectedly, the frequency of epistaxis did not correlate with any scale. CONCLUSIONS: The duration of epistaxis, liver involvement, gastrointestinal bleeding, and the number of visible telangiectases have a major influence on the HR-QoL in HHT whereby the frequency of epistaxis seems to play a minor role. SIGNIFICANCE: The data presented have an impact on therapeutic decisions, medical expert opinions, and research funding.


Subject(s)
Health Status , Quality of Life/psychology , Telangiectasia, Hereditary Hemorrhagic/psychology , Cross-Sectional Studies , Female , Humans , Male , Prospective Studies , Recurrence , Surveys and Questionnaires , Telangiectasia, Hereditary Hemorrhagic/epidemiology
10.
Nat Neurosci ; 20(6): 886-895, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28414332

ABSTRACT

While human cognitive abilities are clearly unique, underlying changes in brain organization and function remain unresolved. Here we characterized the transcriptome of the cortical layers and adjacent white matter in the prefrontal cortexes of humans, chimpanzees and rhesus macaques using unsupervised sectioning followed by RNA sequencing. More than 20% of detected genes were expressed predominantly in one layer, yielding 2,320 human layer markers. While the bulk of the layer markers were conserved among species, 376 switched their expression to another layer in humans. By contrast, only 133 of such changes were detected in the chimpanzee brain, suggesting acceleration of cortical reorganization on the human evolutionary lineage. Immunohistochemistry experiments further showed that human-specific expression changes were not limited to neurons but affected a broad spectrum of cortical cell types. Thus, despite apparent histological conservation, human neocortical organization has undergone substantial changes affecting more than 5% of its transcriptome.


Subject(s)
Gene Expression Profiling , Macaca mulatta , Neocortex/metabolism , Pan troglodytes , Prefrontal Cortex/metabolism , White Matter/metabolism , Animals , Biological Evolution , Humans , Neocortex/anatomy & histology , Prefrontal Cortex/cytology , Species Specificity , Young Adult
11.
Comput Biol Chem ; 29(3): 196-203, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15979039

ABSTRACT

The task of the quartet puzzling problem is to find a best-fitting binary X-tree for a finite n-set from confidence values for the $3\binom{n}{4}$ binary trees with exactly four leaves from X, its fitness being measured by the sum of the confidence values of all "induced" four-leaves subtrees. We describe a method for finding an exact solution of this problem by integer linear programming. Similar procedures can also be used for finding, e.g. best-fitting "circular" networks. A crucial problem in this context is, of course, how to obtain the input confidence values for the quartet trees. We propose to use inner products of rate-matrix diagonals calculated for pairs of taxa and present the trees resulting from applying our approach to two data sets of up to 36 mitochondrial sequences of mammals including an outgroup.
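The count of binary four-leaf trees on an n-set can be checked with a short sketch: each choice of four taxa admits exactly three binary topologies, ab|cd, ac|bd, and ad|bc, so there are three times "n choose 4" quartet trees in total. The taxon labels and the function name `quartet_trees` below are hypothetical, and the integer linear program described in the abstract is not implemented.

```python
from itertools import combinations
from math import comb


def quartet_trees(taxa):
    """Yield every binary four-leaf tree on the taxa as a pair of cherries."""
    for a, b, c, d in combinations(sorted(taxa), 4):
        yield ((a, b), (c, d))  # ab|cd
        yield ((a, c), (b, d))  # ac|bd
        yield ((a, d), (b, c))  # ad|bc


# Six hypothetical taxa.
taxa = ["t1", "t2", "t3", "t4", "t5", "t6"]
quartets = list(quartet_trees(taxa))

# Three topologies for each of the C(6, 4) = 15 four-subsets.
print(len(quartets), 3 * comb(len(taxa), 4))
```

A confidence value would be attached to each of these quartet trees as input to the optimisation; the number of inputs grows on the order of n to the fourth power.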

12.
PLoS One ; 10(4): e0123533, 2015.
Article in English | MEDLINE | ID: mdl-25881057

ABSTRACT

G protein-coupled receptors (GPCRs) form the largest family of membrane receptors in the human genome. Advances in membrane protein crystallization so far resulted in the determination of 24 receptors available as high-resolution atomic structures. We performed the first phylogenetic analysis of GPCRs based on the available set of GPCR structures. We present a new phylogenetic tree of known human rhodopsin-like GPCR sequences based on this structure set. We can distinguish the three separate classes of small-ligand binding GPCRs, peptide binding GPCRs, and olfactory receptors. Analyzing different structural subdomains, we found that small molecule binding receptors most likely have evolved from peptide receptor precursors, with a rhodopsin/S1PR1 ancestor, most likely an ancestral opsin, forming the link between both classes. A light-activated receptor therefore seems to be the origin of the small molecule hormone receptors of the central nervous system. We find hints for a common evolutionary path of both ligand binding site and central sodium/water binding site. Surprisingly, opioid receptors exhibit both a binding cavity and a central sodium/water binding site similar to the one of biogenic amine receptors instead of peptide receptors, making them seemingly prone to bind small molecule ligands, e.g. opiates. Our results give new insights into the relationship and the pharmacological properties of rhodopsin-like GPCRs.


Subject(s)
Phylogeny , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/metabolism , Binding Sites , Crystallography, X-Ray , Evolution, Molecular , Humans , Ligands , Opsins/chemistry , Opsins/metabolism , Protein Structure, Tertiary , Receptors, Opioid/chemistry , Receptors, Opioid/metabolism , Rhodopsin/chemistry , Rhodopsin/metabolism , Sodium/metabolism
13.
BMC Syst Biol ; 8: 21, 2014 Feb 20.
Article in English | MEDLINE | ID: mdl-24555518

ABSTRACT

BACKGROUND: Phylogenetic networks are employed to visualize evolutionary relationships among a group of nucleotide sequences, genes or species when reticulate events like hybridization, recombination, reassortment and horizontal gene transfer are believed to be involved. In comparison to traditional distance-based methods, quartet-based methods consider more information in the reconstruction process and thus have the potential to be more accurate. RESULTS: We introduce QuartetSuite, which includes a set of new quartet-based methods, namely QuartetS, QuartetA, and QuartetM, to reconstruct phylogenetic networks from nucleotide sequences. We tested their performances and compared them with other popular methods on two simulated nucleotide sequence data sets: one generated from a tree topology and the other from a complicated evolutionary history containing three reticulate events. We further validated these methods on two real data sets: a bacterial data set consisting of seven concatenated genes of 36 bacterial species and an influenza data set related to recently emerging H7N9 low pathogenic avian influenza viruses in China. CONCLUSION: QuartetS, QuartetA, and QuartetM have the potential to accurately reconstruct evolutionary scenarios from simple branching trees to complicated networks containing many reticulate events. These methods could provide insights into the understanding of complicated biological evolutionary processes such as bacterial taxonomy and reassortment of influenza viruses.


Subject(s)
Computational Biology/methods , Phylogeny , Evolution, Molecular
14.
Article in English | MEDLINE | ID: mdl-23702551

ABSTRACT

Supertrees are a commonly used tool in phylogenetics to summarize collections of partial phylogenetic trees. As a generalization of supertrees, phylogenetic supernetworks allow, in addition, the visual representation of conflict between the trees that is not possible to observe with a single tree. Here, we introduce SuperQ, a new method for constructing such supernetworks (SuperQ is freely available at www.uea.ac.uk/computing/superq). It works by first breaking the input trees into quartet trees, and then stitching these together to form a special kind of phylogenetic network, called a split network. This stitching process is performed using an adaptation of the QNet method for split network reconstruction employing a novel approach to use the branch lengths from the input trees to estimate the branch lengths in the resulting network. Compared with previous supernetwork methods, SuperQ has the advantage of producing a planar network. We compare the performance of SuperQ to the Z-closure and Q-imputation supernetwork methods, and also present an analysis of some published data sets as an illustration of its applicability.


Subject(s)
Computational Biology/methods , Phylogeny , Software , Databases, Genetic , Genes, Fungal , Genes, Plant , Models, Genetic
15.
Article in English | MEDLINE | ID: mdl-19179702

ABSTRACT

With the number of sequenced genomes growing ever larger, it is now common practice to concatenate sequence alignments from several genomic loci as a first step to phylogenetic tree inference. However, as different loci may support different trees due to processes such as gene duplication and lineage sorting, it is important to better understand how commonly used phylogenetic inference methods behave on such "phylogenetic mixtures". Here we shall focus on how parsimony, one of the most popular methods for reconstructing phylogenetic trees, behaves for mixtures of two trees. In particular, we show that (i) the parsimony problem is NP-complete for mixtures of two trees, (ii) there are mixtures of two trees that have exponentially many (in the number of leaves) most parsimonious trees, and (iii) give an explicit description of the most parsimonious tree(s) and scores corresponding to the mixture of a pair of trees related by a single TBR operation.


Subject(s)
Computational Biology/methods , Genomics , Models, Genetic , Phylogeny , Databases, Genetic , Sequence Alignment , Sequence Analysis, DNA
16.
J Math Biol ; 56(4): 465-77, 2008 Apr.
Article in English | MEDLINE | ID: mdl-17891538

ABSTRACT

One of the main problems in phylogenetics is to develop systematic methods for constructing evolutionary or phylogenetic trees. For a set of species X, an edge-weighted phylogenetic X-tree or phylogenetic tree is a (graph theoretical) tree with leaf set X and no degree 2 vertices, together with a map assigning a non-negative length to each edge of the tree. Within phylogenetics, several methods have been proposed for constructing such trees that work by trying to piece together quartet trees on X, i.e. phylogenetic trees each having four leaves in X. Hence, it is of interest to characterise when a collection of quartet trees corresponds to a (unique) phylogenetic tree. Recently, Dress and Erdös provided such a characterisation for binary phylogenetic trees, that is, phylogenetic trees all of whose internal vertices have degree 3. Here we provide a new characterisation for arbitrary phylogenetic trees.


Subject(s)
Models, Genetic , Phylogeny , Animals , Decision Trees , Genetic Speciation , Humans , Neural Networks, Computer
17.
Algorithms Mol Biol ; 3: 7, 2008 Jun 24.
Article in English | MEDLINE | ID: mdl-18577231

ABSTRACT

MOTIVATION: Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations in substitution rates that are not uncorrelated across the sequence, this often leads to a patchwork pattern of (i) phylogenetically informative and (ii) effectively randomized regions. In highly variable regions, furthermore, alignment errors accumulate, resulting in sometimes misleading signals in phylogenetic reconstruction. RESULTS: We present here a method that, based on assessing the distribution of character states along a cyclic ordering of the taxa, allows the identification of phylogenetically uninformative homoplastic sites in a multiple sequence alignment. Removal of these sites appears to improve the performance of phylogenetic reconstruction algorithms as measured by various indices of "tree quality". In particular, we obtain more stable trees due to the exclusion of phylogenetically incompatible sites that most likely represent strongly randomized characters. SOFTWARE: The computer program noisy implements this approach. It can be employed to improve phylogenetic reconstruction capability with quite a considerable success rate whenever (1) the average bootstrap support obtained from the original alignment is low, and (2) there are sufficiently many taxa in the data set - at least, say, 12 to 15 taxa. The software can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/noisy/.

18.
Mol Biol Evol ; 24(2): 532-8, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17119010

ABSTRACT

We present QNet, a method for constructing split networks from weighted quartet trees. QNet can be viewed as a quartet analogue of the distance-based Neighbor-Net (NNet) method for network construction. Just as NNet, QNet works by agglomeratively computing a collection of circular weighted splits of the taxa set which is subsequently represented by a planar split network. To illustrate the applicability of QNet, we apply it to a previously published Salmonella data set. We conclude that QNet can provide a useful alternative to NNet if distance data are not available or a character-based approach is preferred. Moreover, it can be used as an aid for determining when a quartet-based tree-building method may or may not be appropriate for a given data set. QNet is freely available for download.


Subject(s)
Models, Statistical , Phylogeny , Software , Algorithms , Least-Squares Analysis , Models, Genetic , Salmonella/genetics
19.
J Math Biol ; 51(2): 171-82, 2005 Aug.
Article in English | MEDLINE | ID: mdl-15868201

ABSTRACT

Evolutionary processes such as hybridisation, lateral gene transfer, and recombination are all key factors in shaping the structure of genes and genomes. However, since such processes are not always best represented by trees, there is now considerable interest in using more general networks instead. For example, in recent studies it has been shown that networks can be used to provide lower bounds on the number of recombination events and also for the number of lateral gene transfers that took place in the evolutionary history of a set of molecular sequences. In this paper we describe the theoretical performance of some related bounds that result when merging pairs of trees into networks.


Subject(s)
Evolution, Molecular , Hybridization, Genetic , Models, Genetic , Gene Transfer, Horizontal , Phylogeny