Results 1 - 20 of 33
1.
Nature ; 629(8013): 851-860, 2024 May.
Article in English | MEDLINE | ID: mdl-38560995

ABSTRACT

Despite tremendous efforts in the past decades, relationships among main avian lineages remain heavily debated without a clear resolution. Discrepancies have been attributed to diversity of species sampled, phylogenetic method and the choice of genomic regions [1-3]. Here we address these issues by analysing the genomes of 363 bird species [4] (218 taxonomic families, 92% of total). Using intergenic regions and coalescent methods, we present a well-supported tree but also a marked degree of discordance. The tree confirms that Neoaves experienced rapid radiation at or near the Cretaceous-Palaeogene boundary. Sufficient loci rather than extensive taxon sampling were more effective in resolving difficult nodes. Remaining recalcitrant nodes involve species that are a challenge to model due to either extreme DNA composition, variable substitution rates, incomplete lineage sorting or complex evolutionary events such as ancient hybridization. Assessment of the effects of different genomic partitions showed high heterogeneity across the genome. We discovered sharp increases in effective population size, substitution rates and relative brain size following the Cretaceous-Palaeogene extinction event, supporting the hypothesis that emerging ecological opportunities catalysed the diversification of modern birds. The resulting phylogenetic estimate offers fresh insights into the rapid radiation of modern birds and provides a taxon-rich backbone tree for future comparative studies.


Subject(s)
Birds, Molecular Evolution, Genome, Phylogeny, Animals, Birds/genetics, Birds/classification, Birds/anatomy & histology, Brain/anatomy & histology, Biological Extinction, Genome/genetics, Genomics, Population Density, Male, Female
2.
Nanomaterials (Basel) ; 13(22)2023 Nov 14.
Article in English | MEDLINE | ID: mdl-37999299

ABSTRACT

La2O3 nanoparticles stabilized on a carbon nanoflake (CNF) matrix were synthesized and graphitized to produce core-shell structures La2O3/CNFs@C. These structures were further oxidized by nitric acid vapors for 1, 3 or 6 h to produce surface-oxidized particles La2O3/CNFs@C_x (x = 1, 3, 6). Bulk and surface compositions of La2O3/CNFs@C and La2O3/CNFs@C_x were investigated by thermogravimetric analysis and X-ray photoelectron spectroscopy. As the oxidation duration increased, the oxygen and La2O3 content in the La2O3/CNFs@C_x samples increased. The electronic structures of the samples were assessed by electron paramagnetic resonance (EPR). Two paramagnetic centers, associated with unpaired localized and mobile electrons, were registered in all samples. The correlation between bulk and surface compositions of the samples and their electronic structures was investigated for the first time. The impact on the EPR spectra of the ratio between sp2- and sp3-hybridized C atoms, the number and nature of oxygen-containing groups on the surface, and the presence and proportion of coordinated La atoms was demonstrated.

3.
Bioinform Adv ; 3(1): vbad124, 2023.
Article in English | MEDLINE | ID: mdl-37750068

ABSTRACT

Summary: Maximum likelihood (ML) is a widely used phylogenetic inference method. ML implementations heavily rely on numerical optimization routines that use internal numerical thresholds to determine convergence. We systematically analyze the impact of these threshold settings on the log-likelihood and runtimes for ML tree inferences with RAxML-NG, IQ-TREE, and FastTree on empirical datasets. We provide empirical evidence that we can substantially accelerate tree inferences with RAxML-NG and IQ-TREE by changing the default values of two such numerical thresholds. At the same time, altering these settings does not significantly impact the quality of the inferred trees. We further show that increasing both thresholds accelerates the RAxML-NG bootstrap without influencing the resulting support values. For RAxML-NG, increasing the likelihood thresholds ϵ_LnL and ϵ_brlen to 10 and 10^3, respectively, results in an average tree inference speedup of 1.9 ± 0.6 on Data collection 1, 1.8 ± 1.1 on Data collection 2, and 1.9 ± 0.8 on Data collection 2 for the RAxML-NG bootstrap compared to the runtime under the current default setting. Increasing the likelihood threshold ϵ_LnL to 10 in IQ-TREE results in an average tree inference speedup of 1.3 ± 0.4 on Data collection 1 and 1.3 ± 0.9 on Data collection 2. Availability and implementation: All MSAs we used for our analyses, as well as all results, are available for download at https://cme.h-its.org/exelixis/material/freeLunch_data.tar.gz. Our data generation scripts are available at https://github.com/tschuelia/ml-numerical-analysis.
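
As a rough illustration of why a looser convergence threshold shortens runtime, the sketch below runs a toy iterative maximizer that stops once the gain in the objective drops below epsilon. This is not the optimizer of RAxML-NG, IQ-TREE, or FastTree; the names `maximize`, `toy_lnl`, and `toy_step` and the quadratic objective are assumptions made purely for illustration.

```python
def maximize(objective, step, x0, epsilon, max_iter=10_000):
    """Iterate x <- step(x) until the objective improves by less than epsilon.

    Returns (x, objective(x), iterations). Hypothetical helper, not a real API.
    """
    x = x0
    current = objective(x)
    for iteration in range(1, max_iter + 1):
        candidate = step(x)
        improved = objective(candidate)
        gain = improved - current
        x, current = candidate, improved
        if gain < epsilon:  # convergence threshold reached
            return x, current, iteration
    return x, current, max_iter


def toy_lnl(x):
    # Toy "log-likelihood" with its maximum of 0 at x = 3 (assumed shape).
    return -1000.0 * (x - 3.0) ** 2


def toy_step(x):
    # Each step covers 10% of the remaining distance to the optimum.
    return x + 0.1 * (3.0 - x)


strict = maximize(toy_lnl, toy_step, x0=0.0, epsilon=0.1)
loose = maximize(toy_lnl, toy_step, x0=0.0, epsilon=10.0)
print("strict:", strict)
print("loose: ", loose)
```

In this toy setting the looser threshold stops after fewer iterations and lands at a slightly lower objective value. Whether that loss matters for real tree inference is the empirical question the abstract addresses.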

4.
Pharmaceutics ; 15(1)2023 Jan 12.
Article in English | MEDLINE | ID: mdl-36678902

ABSTRACT

A series of nanoparticles (NPs) with a hydrodynamic radius from 20 to 100 nm in PBS was developed by solubilizing the hydrophobic dye methyl pheophorbide a (a chlorin e6 derivative) with amphiphilic copolymers of N-vinylpyrrolidone with (di)methacrylates. Photophysical properties and biological activity of the aqueous NP solutions were studied. It was found that the dye encapsulated in the copolymers is in an aggregated state. However, its aggregation degree decreases sharply, and the singlet oxygen quantum yield and the fluorescence signal increase, upon the interaction of these NPs with model biological membranes (liposomes) or components of a tissue homogenate. The phototoxic effect of the NPs in HeLa cells is 1.5-2 times that of the reference dye chlorin e6 trisodium salt, one of the most effective photosensitizers used in clinical practice. This could be explained by the effective release of the hydrophobic photosensitizer from the NPs into biological structures. The demonstrated approach can be used not only for the encapsulation of hydrophobic photosensitizers for PDT but also for other drugs, and N-vinylpyrrolidone amphiphilic copolymers show promising potential as a modern platform for the design of targeted delivery vehicles.

5.
Small ; 18(47): e2203555, 2022 11.
Article in English | MEDLINE | ID: mdl-36192153

ABSTRACT

Metallic barcode nanowires (BNWs) composed of repeating heterogeneous segments fabricated by template-assisted electrodeposition can offer extended functionality in magnetic, electrical, mechanical, and biomedical applications. The authors consider such nanostructures as a 3D system of magnetically interacting elements with magnetic behavior strongly affected by complex magnetostatic interactions. This study discusses the influence of geometrical parameters of segments on the character of their interactions and the overall magnetic behavior of the array of BNWs having alternating magnetization, because the Fe and Au segments are made of Fe-Au alloys with high and low magnetizations. By controlling the applied current densities and the elapsed time in the electrodeposition, the dimension of the Fe-Au BNWs can be regulated. Using the first-order reversal curve (FORC) diagram method, this study reveals that the influence of the length of magnetically weak Au segments on the interaction field between nanowires is different for samples with magnetically strong 100 and 200 nm long Fe segments. With the help of micromagnetic simulations, three types of magnetostatic interactions in the BNW arrays are discovered and analyzed. This study demonstrates that the dominating type of interaction depends on the geometric parameters of the Fe and Au segments and the interwire and intrawire distances.


Subject(s)
Nanostructures, Nanowires, Nanowires/chemistry, Nanostructures/chemistry, Electroplating/methods, Magnetism
6.
Bioinformatics ; 38(15): 3725-3733, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35713506

ABSTRACT

MOTIVATION: Phylogenetic networks can represent non-treelike evolutionary scenarios. Current, actively developed approaches for phylogenetic network inference jointly account for non-treelike evolution and incomplete lineage sorting (ILS). Unfortunately, this induces a very high computational complexity and current tools can only analyze small datasets. RESULTS: We present NetRAX, a tool for maximum likelihood (ML) inference of phylogenetic networks in the absence of ILS. Our tool leverages state-of-the-art methods for efficiently computing the phylogenetic likelihood function on trees, and extends them to phylogenetic networks via the notion of 'displayed trees'. NetRAX can infer ML phylogenetic networks from partitioned multiple sequence alignments and returns the inferred networks in Extended Newick format. On simulated data, our results show a very low relative difference in Bayesian Information Criterion (BIC) score and a near-zero unrooted softwired cluster distance to the true, simulated networks. With NetRAX, a network inference on a partitioned alignment with 8000 sites, 30 taxa and 3 reticulations completes within a few minutes on a standard laptop. AVAILABILITY AND IMPLEMENTATION: Our implementation is available under the GNU General Public License v3.0 at https://github.com/lutteropp/NetRAX. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
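
For readers unfamiliar with the Bayesian Information Criterion, the following minimal sketch computes BIC with the standard formula k·ln(n) − 2·lnL and a relative difference between two scores. The exact definition of "relative difference" used by the NetRAX authors is not given in the abstract, so `relative_difference` is an assumed convention, and all numbers are invented for illustration.

```python
import math


def bic(log_likelihood, num_free_parameters, num_sites):
    """Standard BIC: k * ln(n) - 2 * lnL (lower is better)."""
    return num_free_parameters * math.log(num_sites) - 2.0 * log_likelihood


def relative_difference(inferred, reference):
    """Signed difference relative to the reference score (assumed convention)."""
    return (inferred - reference) / abs(reference)


# Hypothetical scores for a simulated ("true") and an inferred network.
true_bic = bic(log_likelihood=-52_000.0, num_free_parameters=70, num_sites=8000)
inferred_bic = bic(log_likelihood=-52_010.0, num_free_parameters=68, num_sites=8000)
print(relative_difference(inferred_bic, true_bic))
```

A value close to zero means the inferred network scores almost as well as the simulated one under this criterion.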


Subject(s)
Algorithms, Phylogeny, Bayes Theorem, Sequence Alignment, Likelihood Functions
7.
Genome Biol ; 23(1): 37, 2022 01 26.
Article in English | MEDLINE | ID: mdl-35081992

ABSTRACT

We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy into RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, both in accuracy and speed. CellPhy is freely available at https://github.com/amkozlov/cellphy .
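
One natural reading of "16 diploid states" is the set of ordered pairs of the four nucleotides. The sketch below enumerates that state space and shows how allelic dropout can make a heterozygous genotype look homozygous. Both the reading and the helper `apply_dropout` are assumptions for illustration, not CellPhy's actual data structures or error model.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# All ordered allele pairs: AA, AC, AG, ..., TT (4 x 4 = 16 states).
GENOTYPES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]


def apply_dropout(genotype, dropped_index):
    """Return the genotype observed when one allele fails to amplify.

    Hypothetical helper: the surviving allele is read twice, so a
    heterozygous genotype appears homozygous.
    """
    surviving = genotype[1 - dropped_index]
    return surviving * 2


print(len(GENOTYPES))          # size of the state space
print(apply_dropout("AG", 0))  # first allele lost
print(apply_dropout("AG", 1))  # second allele lost
```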


Subject(s)
Algorithms, Software, Genomics/methods, Genotype, Phylogeny
8.
Bioinformatics ; 37(22): 4056-4063, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34037680

ABSTRACT

MOTIVATION: Phylogenetic trees are now routinely inferred on large scale high performance computing systems with thousands of cores as the parallel scalability of phylogenetic inference tools has improved over the past years to cope with the molecular data avalanche. Thus, the parallel fault tolerance of phylogenetic inference tools has become a relevant challenge. To this end, we explore parallel fault tolerance mechanisms and algorithms, the software modifications required and the performance penalties induced via enabling parallel fault tolerance by example of RAxML-NG, the successor of the widely used RAxML tool for maximum likelihood-based phylogenetic tree inference. RESULTS: We find that the slowdown induced by the necessary additional recovery mechanisms in RAxML-NG is on average 1.00 ± 0.04. The overall slowdown by using these recovery mechanisms in conjunction with a fault-tolerant Message Passing Interface implementation amounts to on average 1.7 ± 0.6 for large empirical datasets. Via failure simulations, we show that RAxML-NG can successfully recover from multiple simultaneous failures, subsequent failures, failures during recovery and failures during checkpointing. Recoveries are automatic and transparent to the user. AVAILABILITY AND IMPLEMENTATION: The modified fault-tolerant RAxML-NG code is available under GNU GPL at https://github.com/lukashuebner/ft-raxml-ng. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Phylogeny, User-Computer Interface, Algorithms, Software
9.
Mol Biol Evol ; 38(5): 1777-1791, 2021 05 04.
Article in English | MEDLINE | ID: mdl-33316067

ABSTRACT

Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.


Subject(s)
COVID-19/genetics, Molecular Evolution, Viral Genome, Mutation, Phylogeny, SARS-CoV-2/genetics, Humans
10.
Mol Biol Evol ; 37(9): 2763-2774, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32502238

ABSTRACT

Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
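
The relative Robinson-Foulds distance mentioned above can be sketched as follows. Trees are given as nested tuples of leaf names, treated as unrooted, and the distance is normalized by the total number of non-trivial bipartitions in both trees. That normalization, the tuple encoding, and the function names are assumptions for illustration; GeneRax's own evaluation may differ in detail.

```python
def leaf_set(tree):
    """Collect the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions, each stored as the side
    that does not contain the alphabetically smallest leaf."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def relative_rf(tree_a, tree_b):
    """Fraction of bipartitions not shared by the two trees (0 = identical)."""
    splits_a, splits_b = bipartitions(tree_a), bipartitions(tree_b)
    total = len(splits_a) + len(splits_b)
    if total == 0:
        return 0.0
    return len(splits_a ^ splits_b) / total


true_tree = (("A", "B"), ("C", "D"), "E")
inferred_tree = (("A", "B"), ("C", "E"), "D")
print(relative_rf(true_tree, inferred_tree))
```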


Subject(s)
Gene Duplication, Genetic Techniques, Phylogeny, Software, Cyanobacteria/genetics, Gene Deletion, Horizontal Gene Transfer
11.
Genome Biol ; 21(1): 31, 2020 02 07.
Article in English | MEDLINE | ID: mdl-32033589

ABSTRACT

The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.


Subject(s)
Data Science/methods, Genomics/methods, RNA-Seq/methods, Single-Cell Analysis/methods, Animals, Humans
12.
Mol Biol Evol ; 37(1): 291-294, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31432070

ABSTRACT

ModelTest-NG is a reimplementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest but equally accurate and introduces several new features, such as ascertainment bias correction, mixture, and free-rate models, or the automatic processing of single partitions. ModelTest-NG is available under a GNU GPL3 license at https://github.com/ddarriba/modeltest , last accessed September 2, 2019.


Subject(s)
Amino Acid Substitution, Molecular Evolution, Genetic Techniques, Genetic Models, Software
13.
Bioinformatics ; 36(7): 2280-2281, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31755898

ABSTRACT

MOTIVATION: Recently, Lemoine et al. suggested the transfer bootstrap expectation (TBE) branch support metric as an alternative to classical phylogenetic bootstrap support for taxon-rich datasets. However, the original TBE implementation in the booster tool is compute- and memory-intensive. RESULTS: We developed a fast and memory-efficient TBE implementation. We improve upon the original algorithm by Lemoine et al. via several algorithmic and technical optimizations. On empirical as well as on random tree sets with varying taxon counts, our implementation is up to 480 times faster than booster. Furthermore, it only requires memory that is linear in the number of taxa, which leads to 10× to 40× memory savings compared with booster. AVAILABILITY AND IMPLEMENTATION: Our implementation has been partially integrated into pll-modules and RAxML-NG and is available under the GNU Affero General Public License v3.0 at https://github.com/ddarriba/pll-modules and https://github.com/amkozlov/raxml-ng. The parallel version that also computes additional TBE-related statistics is available at: https://github.com/lutteropp/raxml-ng/tree/tbe. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
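
To make the TBE metric concrete, here is a deliberately naive sketch, assuming the definition from Lemoine et al.: the transfer distance is the minimum number of taxa that must move to turn one bipartition into another, and the support is one minus the mean minimum distance divided by p − 1, where p is the size of the smaller side of the reference branch. Bipartitions are encoded as frozensets of one side. This brute-force version only illustrates the quantity being computed; it does not reproduce the optimized algorithm described in the abstract, and all names are hypothetical.

```python
def transfer_distance(split_a, split_b, num_taxa):
    """Minimum number of taxa to move so the two bipartitions coincide."""
    differing = len(split_a ^ split_b)
    # A bipartition and its complement describe the same branch.
    return min(differing, num_taxa - differing)


def transfer_support(reference_split, bootstrap_trees, taxa):
    """Naive TBE for one internal branch of the reference tree.

    bootstrap_trees: list of sets of bipartitions (one set per replicate).
    """
    n = len(taxa)
    p = min(len(reference_split), n - len(reference_split))
    if p < 2:
        raise ValueError("support is defined for internal branches only")
    total = 0.0
    for tree_splits in bootstrap_trees:
        # Terminal branches are always present and cap the distance at p - 1.
        candidates = set(tree_splits) | {frozenset([t]) for t in taxa}
        total += min(transfer_distance(reference_split, s, n) for s in candidates)
    mean_distance = total / len(bootstrap_trees)
    return 1.0 - mean_distance / (p - 1)


taxa = set("ABCDEF")
branch = frozenset("ABC")
replicates = [
    {frozenset("ABC"), frozenset("AB"), frozenset("DE")},  # branch recovered
    {frozenset("AB"), frozenset("ABD"), frozenset("EF")},  # one taxon misplaced
]
print(transfer_support(branch, replicates, taxa))
```

A classical bootstrap would count the second replicate as a complete miss, whereas the transfer-based score gives it partial credit because only a single taxon is misplaced.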


Subject(s)
Algorithms, Software, Phylogeny
14.
Mol Ecol Resour ; 20(2): 429-443, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31705734

ABSTRACT

High-throughput DNA metabarcoding of amplicon sizes below 500 bp has revolutionized the analysis of environmental microbial diversity. However, these short regions contain limited phylogenetic signal, which makes it impractical to use environmental DNA in full phylogenetic inferences. This lesser phylogenetic resolution of short amplicons may be overcome by new long-read sequencing technologies. To test this idea, we amplified soil DNA and used PacBio Circular Consensus Sequencing (CCS) to obtain an ~4500-bp region spanning most of the eukaryotic small subunit (18S) and large subunit (28S) ribosomal DNA genes. We first treated the CCS reads with a novel curation workflow, generating 650 high-quality operational taxonomic units (OTUs) containing the physically linked 18S and 28S regions. To assign taxonomy to these OTUs, we developed a phylogeny-aware approach based on the 18S region that showed greater accuracy and sensitivity than similarity-based methods. The taxonomically annotated OTUs were then combined with available 18S and 28S reference sequences to infer a well-resolved phylogeny spanning all major groups of eukaryotes, allowing us to accurately derive the evolutionary origin of environmental diversity. A total of 1,019 sequences were included, of which a majority (58%) corresponded to the new long environmental OTUs. The long reads also allowed us to directly investigate the relationships among environmental sequences themselves, which represents a key advantage over the placement of short reads on a reference phylogeny. Together, our results show that long amplicons can be treated in a full phylogenetic framework to provide greater taxonomic resolution and a robust evolutionary perspective to environmental DNA.


Subject(s)
Eukaryota/classification, Eukaryota/genetics, Eukaryota/isolation & purification, Phylogeny, Biodiversity, Taxonomic DNA Barcoding, Environmental DNA/genetics, Ribosomal DNA/genetics, Operon, 18S Ribosomal RNA/genetics, 28S Ribosomal RNA/genetics, Soil/parasitology
15.
J Chromatogr B Analyt Technol Biomed Life Sci ; 1130-1131: 121808, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31669631

ABSTRACT

A new sample extraction protocol was developed for pharmacokinetic studies of dabigatran with high-performance liquid chromatography separation and electrospray ionization time-of-flight mass spectrometry analysis. After protein precipitation with acetonitrile, free dabigatran and its metabolites are separated into the water phase by water-dichloromethane liquid-liquid extraction to purify the sample from proteins and endogenous lipophilic compounds. Chromatographic separation was achieved on an Agilent Zorbax SB-CN column (150 × 4.6 mm, 5 µm) using a 0.1% aqueous solution of formic acid and acetonitrile (80:20) as the mobile phase. The Agilent Zorbax SB-CN column was selected to improve sample resolution and to avoid the early elution of dabigatran previously seen when using a C18 column. The extended calibration curve was constructed from 5 to 1000 ng/L, while precision and accuracy were assessed at four levels across the linear dynamic ranges. Within-run precision was <5.6% and between-run precision was <3.9%. The method accuracy ranged from 89.8% to 104.4%. The developed method was successfully applied to 30 patient samples to evaluate the antithrombotic efficacy and anticoagulant activity of dabigatran following knee endoprosthesis surgery.


Subject(s)
High Pressure Liquid Chromatography/methods, Dabigatran/blood, Dabigatran/isolation & purification, Tandem Mass Spectrometry/methods, Dabigatran/pharmacokinetics, Drug Monitoring, Humans, Limit of Detection, Linear Models, Reproducibility of Results
16.
Antiviral Res ; 172: 104617, 2019 12.
Article in English | MEDLINE | ID: mdl-31593751

ABSTRACT

Ebola fever is an acute, highly contagious viral disease characterized by a severe course, high mortality and development of hemorrhagic syndrome (tendency to skin hemorrhage and bleeding of mucous membranes). The mortality rate of the disease is 60-90%. Nowadays, there are no licensed specific therapeutic agents for Ebola in the world. Monoclonal antibodies (MAbs) having viral neutralizing activity with high specificity to the GP protein of the Ebola virus are considered candidate highly effective antiviral drugs. In our study, for the first time a panel of mouse monoclonal antibodies specifically binding to EBOV GP protein was obtained using recombinant human adenovirus serotype 5 expressing GP protein (Ad5-GP). The virus-neutralizing capacities of the antibodies were evaluated on the Ebola virus cell infection model, as well as on the cell infection model of recombinant vesicular stomatitis virus pseudotyped with Ebola virus GP protein (rVSV-GP). Based on the results of virus neutralization, the two most promising clones were selected, and their specific and protective capacities were determined. The study of the protection conferred by selected individual antibody clones, as well as by their combinations, in the model of lethal infection of rhesus macaques with Ebola virus showed that intravenous administration of a mixture of antibodies at 50 mg/kg 24 h after infection leads to the survival of 100% of the animals, while individual antibody clones provide partial protection (0-30%). The results of the study suggest the important role of antibodies in controlling replication of the Ebola virus in vivo and show the possibility of using a mixture of antibodies specific to the GP to protect against lethal infection with the Ebola virus in the post-infection mode of administration.


Subject(s)
Neutralizing Antibodies/therapeutic use, Antiviral Agents, Ebolavirus, Ebola Hemorrhagic Fever/therapy, Viral Envelope Proteins/immunology, Animals, Monoclonal Antibodies/administration & dosage, Monoclonal Antibodies/biosynthesis, Monoclonal Antibodies/therapeutic use, Neutralizing Antibodies/administration & dosage, Neutralizing Antibodies/biosynthesis, Viral Antibodies/administration & dosage, Viral Antibodies/biosynthesis, Viral Antibodies/therapeutic use, Antiviral Agents/administration & dosage, Antiviral Agents/therapeutic use, CHO Cells, Chlorocebus aethiops, Cricetulus, Animal Disease Models, Ebolavirus/drug effects, Ebolavirus/immunology, Macaca mulatta/virology, Mice, Recombinant Proteins/biosynthesis, Recombinant Proteins/immunology, Vero Cells, Viral Envelope Proteins/biosynthesis, Virus Replication/drug effects
17.
Bioinformatics ; 35(21): 4453-4455, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31070718

ABSTRACT

MOTIVATION: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets. RESULTS: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQ-TREE, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-TREE shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric. AVAILABILITY AND IMPLEMENTATION: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Phylogeny, Software, Likelihood Functions
18.
Mol Biol Evol ; 36(9): 2086-2103, 2019 09 01.
Article in English | MEDLINE | ID: mdl-31114882

ABSTRACT

Few models of sequence evolution incorporate parameters describing protein structure, despite its high conservation, essential functional role and increasing availability. We present a structurally aware empirical substitution model for amino acid sequence evolution in which proteins are expressed using an expanded alphabet that relays both amino acid identity and structural information. Each character specifies an amino acid as well as information about the rotamer configuration of its side-chain: the discrete geometric pattern of permitted side-chain atomic positions, as defined by the dihedral angles between covalently linked atoms. By assigning rotamer states in 251,194 protein structures and identifying 4,508,390 substitutions between closely related sequences, we generate a 55-state "Dayhoff-like" model that shows that the evolutionary properties of amino acids depend strongly upon side-chain geometry. The model performs as well as or better than traditional 20-state models for divergence time estimation, tree inference, and ancestral state reconstruction. We conclude that not only is rotamer configuration a valuable source of information for phylogenetic studies, but that modeling the concomitant evolution of sequence and structure may have important implications for understanding protein folding and function.


Subject(s)
Molecular Evolution, Biological Models, Protein Conformation, Amino Acid Substitution, Markov Chains
19.
Syst Biol ; 68(2): 365-369, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30165689

ABSTRACT

Next generation sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the evolutionary placement algorithm (EPA) included in RAxML, or PPLACER, are being increasingly used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Herein, we present EPA-NG, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both RAxML-EPA and PPLACER. EPA-NG can be executed on standard shared memory, as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-NG, we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3748 taxa in just under 7 h, using 2048 cores. Our performance assessment shows that EPA-NG outperforms RAxML-EPA and PPLACER by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-NG scales well up to 2048 cores. EPA-NG is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng.


Subject(s)
Algorithms, Classification/methods, Phylogeny, DNA Sequence Analysis, Software
20.
Bioinformatics ; 35(10): 1771-1773, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30321303

ABSTRACT

MOTIVATION: Coalescent- and reconciliation-based methods are now widely used to infer species phylogenies from genomic data. They typically use per-gene phylogenies as input, which requires conducting multiple individual tree inferences on a large set of multiple sequence alignments (MSAs). At present, no easy-to-use parallel tool for this task exists. Ad hoc scripts for this purpose do not only induce additional implementation overhead, but can also lead to poor resource utilization and long times-to-solution. We present ParGenes, a tool for simultaneously determining the best-fit model and inferring maximum likelihood (ML) phylogenies on thousands of independent MSAs using supercomputers. RESULTS: ParGenes executes common phylogenetic pipeline steps such as model-testing, ML inference(s), bootstrapping and computation of branch support values via a single parallel program invocation. We evaluated ParGenes by inferring > 20 000 phylogenetic gene trees with bootstrap support values from Ensembl Compara and VectorBase alignments in 28 h on a cluster with 1024 nodes. AVAILABILITY AND IMPLEMENTATION: GNU GPL at https://github.com/BenoitMorel/ParGenes. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.


Subject(s)
Phylogeny, Genomics, Probability, Sequence Alignment