Results 1 - 20 of 114
1.
Proc Natl Acad Sci U S A ; 120(45): e2306899120, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37903262

ABSTRACT

Taxonomic data are a scientific common. Unlike nomenclature, which has strong governance institutions, there are currently no generally accepted governance institutions for the compilation of taxonomic data into an accepted global list. This gap results in challenges for conservation, ecological research, policymaking, international trade, and other areas of scientific and societal importance. Consensus on a global list and its management requires effective governance and standards, including agreed mechanisms for choosing among competing taxonomies and partial lists. However, governance frameworks are currently lacking, and a call for governance in 2017 generated critical responses. Any governance system to which compliance is voluntary requires a high level of legitimacy and credibility among those by and for whom it is created. Legitimacy and credibility, in turn, require adequate and credible consultation. Here, we report on the results of a global survey of taxonomists, scientists from other disciplines, and users of taxonomy designed to assess views and test ideas for a new system of taxonomic list governance. We found a surprisingly high degree of agreement on the need for a global list of accepted species and their names, and consistent views on what such a list should provide to users and how it should be governed. The survey suggests that consensus on a mechanism to create, manage, and govern a single widely accepted list of all the world's species is achievable. This finding was unexpected given past controversies about the merits of list governance.


Subjects
Commerce , Physicians , Humans , Internationality
2.
Nucleic Acids Res ; 51(D1): D708-D716, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36271801

ABSTRACT

Fungal taxonomy is a complex and rapidly changing subject, which makes proper naming of fungi challenging for taxonomists. A registration platform with a standardized and information-integrated database is a powerful tool for efficient research on fungal taxonomy. Fungal Names (FN, https://nmdc.cn/fungalnames/; launched in 2011) is one of the three official fungal nomenclatural repositories authorized by the International Nomenclature Committee for Fungi (NCF). Currently, FN includes >567 000 taxon names from >10 000 related journals and books published since 1596 and covers >147 000 collection records of type specimens/illustrations from >5000 preserving agencies. FN is also a knowledge base that integrates nomenclature information with specimens, culture collections and herbaria/fungaria, publications and taxonomists, and represents a summary of the history and recent advances in fungal taxonomy. Published fungal names are categorized based on well-accepted nomenclature rules and can be readily searched with different keywords and strategies. In combination with a standardized name checking tool and a sequence alignment-based identification package, FN makes the registration and typification of nomenclatural novelties of fungi convenient and accurate.


Subjects
Fungi , Knowledge Bases , Data Management , Databases, Factual , Sequence Alignment , Fungi/classification , Terminology as Topic
3.
PLoS Genet ; 18(1): e1009975, 2022 01.
Article in English | MEDLINE | ID: mdl-35085229

ABSTRACT

Clustering genetic variants based on their associations with different traits can provide insight into their underlying biological mechanisms. Existing clustering approaches typically group variants based on the similarity of their association estimates for various traits. We present a new procedure for clustering variants based on their proportional associations with different traits, which is more reflective of the underlying mechanisms to which they relate. The method is based on a mixture model approach for directional clustering and includes a noise cluster that provides robustness to outliers. The procedure performs well across a range of simulation scenarios. In an applied setting, clustering genetic variants associated with body mass index generates groups reflective of distinct biological pathways. Mendelian randomization analyses support that the clusters vary in their effect on coronary heart disease, including one cluster that represents elevated body mass index with a favourable metabolic profile and reduced coronary heart disease risk. Analysis of the biological pathways underlying this cluster identifies inflammation as potentially explaining differences in the effects of increased body mass index on coronary heart disease.


Subjects
Computational Biology/methods , Genetic Variation , Obesity/genetics , Body Mass Index , Cluster Analysis , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Mendelian Randomization Analysis , Models, Genetic
4.
BMC Bioinformatics ; 24(1): 161, 2023 Apr 21.
Article in English | MEDLINE | ID: mdl-37085771

ABSTRACT

In this paper we propose PIICM, a probabilistic framework for dose-response prediction in high-throughput drug combination datasets. PIICM utilizes a permutation invariant version of the intrinsic co-regionalization model for multi-output Gaussian process regression, to predict dose-response surfaces in untested drug combination experiments. Coupled with an observation model that incorporates experimental uncertainty, PIICM is able to learn from noisily observed cell-viability measurements in settings where the underlying dose-response experiments are of varying quality, utilize different experimental designs, and the resulting training dataset is sparsely observed. We show that the model can accurately predict dose-response in held out experiments, and the resulting function captures relevant features indicating synergistic interaction between drugs.


Subjects
Research Design , Uncertainty , Drug Combinations
5.
PLoS Med ; 20(11): e1004310, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37922316

ABSTRACT

BACKGROUND: Multimorbidity, characterised by the coexistence of multiple chronic conditions in an individual, is a rising public health concern. While much of the existing research has focused on cross-sectional patterns of multimorbidity, there remains a need to better understand the longitudinal accumulation of diseases. This includes examining the associations between important sociodemographic characteristics and the rate of progression of chronic conditions. METHODS AND FINDINGS: We utilised electronic primary care records from 13.48 million participants in England, drawn from the Clinical Practice Research Datalink (CPRD Aurum), spanning from 2005 to 2020 with a median follow-up of 4.71 years (IQR: 1.78, 11.28). The study focused on 5 important chronic conditions: cardiovascular disease (CVD), type 2 diabetes (T2D), chronic kidney disease (CKD), heart failure (HF), and mental health (MH) conditions. Key sociodemographic characteristics considered include ethnicity, social and material deprivation, gender, and age. We employed a flexible spline-based parametric multistate model to investigate the associations between these sociodemographic characteristics and the rate of different disease transitions throughout multimorbidity development. Our findings reveal distinct association patterns across different disease transition types. Deprivation, gender, and age generally demonstrated stronger associations with disease diagnosis compared to ethnic group differences. Notably, the impact of these factors tended to attenuate with an increase in the number of preexisting conditions, especially for deprivation, gender, and age. For example, the hazard ratio (HR) (95% CI; p-value) for the association of deprivation with T2D diagnosis (comparing the most deprived quintile to the least deprived) is 1.76 ([1.74, 1.78]; p < 0.001) for those with no preexisting conditions and decreases to 0.95 ([0.75, 1.21]; p = 0.69) with 4 preexisting conditions. 
Furthermore, the impact of deprivation, gender, and age was typically more pronounced when transitioning from an MH condition. For instance, the HR (95% CI; p-value) for the association of deprivation with T2D diagnosis when transitioning from MH is 2.03 ([1.95, 2.12], p < 0.001), compared to transitions from CVD 1.50 ([1.43, 1.58], p < 0.001), CKD 1.37 ([1.30, 1.44], p < 0.001), and HF 1.55 ([1.34, 1.79], p < 0.001). A primary limitation of our study is that potential diagnostic inaccuracies in primary care records, such as underdiagnosis, overdiagnosis, or ascertainment bias of chronic conditions, could influence our results. CONCLUSIONS: Our results indicate that early phases of multimorbidity development could warrant increased attention. The potential importance of earlier detection and intervention of chronic conditions is underscored, particularly for MH conditions and higher-risk populations. These insights may have important implications for the management of multimorbidity.


Subjects
Cardiovascular Diseases , Diabetes Mellitus, Type 2 , Heart Failure , Renal Insufficiency, Chronic , Humans , Multimorbidity , Diabetes Mellitus, Type 2/diagnosis , Diabetes Mellitus, Type 2/epidemiology , Cross-Sectional Studies , England/epidemiology , Heart Failure/diagnosis , Heart Failure/epidemiology , Chronic Disease , Renal Insufficiency, Chronic/diagnosis , Renal Insufficiency, Chronic/epidemiology , Primary Health Care
6.
Biostatistics ; 24(1): 85-107, 2022 12 12.
Article in English | MEDLINE | ID: mdl-34363680

ABSTRACT

Risk prediction models are a crucial tool in healthcare. Risk prediction models with a binary outcome (i.e., binary classification models) are often constructed using methodology which assumes the costs of different classification errors are equal. In many healthcare applications, this assumption is not valid, and the differences between misclassification costs can be quite large. For instance, in a diagnostic setting, the cost of misdiagnosing a person with a life-threatening disease as healthy may be larger than the cost of misdiagnosing a healthy person as a patient. In this article, we present Tailored Bayes (TB), a novel Bayesian inference framework which "tailors" model fitting to optimize predictive performance with respect to unbalanced misclassification costs. We use simulation studies to showcase when TB is expected to outperform standard Bayesian methods in the context of logistic regression. We then apply TB to three real-world applications, a cardiac surgery, a breast cancer prognostication task, and a breast cancer tumor classification task and demonstrate the improvement in predictive performance over standard methods.
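The effect of unequal misclassification costs described above can be illustrated with the standard decision-theoretic threshold. This is a minimal sketch, not the Tailored Bayes method itself: it assumes a calibrated probability of disease is already available, and the function names and the cost values used below are hypothetical.

```python
def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which predicting 'positive' has the lower expected cost.

    Predicting positive costs (1 - p) * cost_fp in expectation; predicting
    negative costs p * cost_fn. Positive is cheaper when
    p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def classify(prob_positive: float, cost_fp: float, cost_fn: float) -> int:
    """Return 1 (positive) if that decision minimises expected cost, else 0."""
    return int(prob_positive > cost_threshold(cost_fp, cost_fn))
```

With equal costs the threshold is 0.5; if a missed diagnosis is assumed to be nine times as costly as a false alarm, the threshold drops to 0.1, so a patient with predicted probability 0.2 would be flagged.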


Subjects
Breast Neoplasms , Models, Statistical , Humans , Female , Bayes Theorem , Logistic Models , Computer Simulation , Breast Neoplasms/diagnosis
7.
Bioinformatics ; 38(9): 2529-2535, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35191485

ABSTRACT

MOTIVATION: Inferring the parameters of models describing biological systems is an important problem in the reverse engineering of the mechanisms underlying these systems. Much work has focused on parameter inference of stochastic and ordinary differential equation models using Approximate Bayesian Computation (ABC). While there is some recent work on inference in spatial models, this remains an open problem. Simultaneously, advances in topological data analysis (TDA), a field of computational mathematics, have enabled spatial patterns in data to be characterized. RESULTS: Here, we focus on recent work using TDA to study different regimes of parameter space for a well-studied model of angiogenesis. We propose a method for combining TDA with ABC to infer parameters in the Anderson-Chaplain model of angiogenesis. We demonstrate that this topological approach outperforms ABC approaches that use simpler statistics based on spatial features of the data. This is a first step toward a general framework of spatial parameter inference for biological systems, for which there may be a variety of filtrations, vectorizations and summary statistics to be considered. AVAILABILITY AND IMPLEMENTATION: All code used to produce our results is available as a Snakemake workflow from github.com/tt104/tabc_angio.


Subjects
Algorithms , Bayes Theorem , Computer Simulation
8.
BMC Bioinformatics ; 23(1): 290, 2022 Jul 21.
Article in English | MEDLINE | ID: mdl-35864476

ABSTRACT

BACKGROUND: Cluster analysis is an integral part of precision medicine and systems biology, used to define groups of patients or biomolecules. Consensus clustering is an ensemble approach that is widely used in these areas, which combines the output from multiple runs of a non-deterministic clustering algorithm. Here we consider the application of consensus clustering to a broad class of heuristic clustering algorithms that can be derived from Bayesian mixture models (and extensions thereof) by adopting an early stopping criterion when performing sampling-based inference for these models. While the resulting approach is non-Bayesian, it inherits the usual benefits of consensus clustering, particularly in terms of computational scalability and providing assessments of clustering stability/robustness. RESULTS: In simulation studies, we show that our approach can successfully uncover the target clustering structure, while also exploring different plausible clusterings of the data. We show that, when a parallel computation environment is available, our approach offers significant reductions in runtime compared to performing sampling-based Bayesian inference for the underlying model, while retaining many of the practical benefits of the Bayesian approach, such as exploring different numbers of clusters. We propose a heuristic to decide upon ensemble size and the early stopping criterion, and then apply consensus clustering to a clustering algorithm derived from a Bayesian integrative clustering method. We use the resulting approach to perform an integrative analysis of three 'omics datasets for budding yeast and find clusters of co-expressed genes with shared regulatory proteins. We validate these clusters using data external to the analysis. 
CONCLUSIONS: Our approach can be used as a wrapper for essentially any existing sampling-based Bayesian clustering implementation, and enables meaningful clustering analyses to be performed using such implementations, even when computational Bayesian inference is not feasible, e.g. due to poor exploration of the target density (often as a result of increasing numbers of features) or a limited computational budget that does not allow sufficient samples to be drawn from a single chain. This enables researchers to straightforwardly extend the applicability of existing software to much larger datasets, including implementations of sophisticated models such as those that jointly model multiple datasets.
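The core bookkeeping of consensus clustering, combining the outputs of several clustering runs, can be sketched as a co-clustering (consensus) matrix. This is an illustrative sketch only: it assumes each run has already produced one label per item, it does not reproduce the authors' early-stopping scheme or their software, and the function name is hypothetical.

```python
from typing import List, Sequence


def consensus_matrix(runs: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of runs in which each pair of items shares a cluster label.

    `runs` holds one labelling per clustering run; every labelling must
    assign a label to the same n items in the same order.
    """
    if not runs:
        raise ValueError("at least one clustering run is required")
    n = len(runs[0])
    if any(len(labels) != n for labels in runs):
        raise ValueError("all runs must label the same number of items")

    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]
```

Entries near 1 mark pairs that are grouped together in almost every run, and entries near 0 mark pairs that are almost never grouped. Label values need not match across runs, because only within-run agreement is counted.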


Subjects
Algorithms , Software , Bayes Theorem , Cluster Analysis , Consensus , Humans
9.
Bioinformatics ; 37(4): 531-541, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32915962

ABSTRACT

MOTIVATION: Mendelian randomization is an epidemiological technique that uses genetic variants as instrumental variables to estimate the causal effect of a risk factor on an outcome. We consider a scenario in which causal estimates based on each variant in turn differ more strongly than expected by chance alone, but the variants can be divided into distinct clusters, such that all variants in the cluster have similar causal estimates. This scenario is likely to occur when there are several distinct causal mechanisms by which a risk factor influences an outcome with different magnitudes of causal effect. We have developed an algorithm MR-Clust that finds such clusters of variants, and so can identify variants that reflect distinct causal mechanisms. Two features of our clustering algorithm are that it accounts for differential uncertainty in the causal estimates, and it includes 'null' and 'junk' clusters, to provide protection against the detection of spurious clusters. RESULTS: Our algorithm correctly detected the number of clusters in a simulation analysis, outperforming methods that either do not account for uncertainty or do not include null and junk clusters. In an applied example considering the effect of blood pressure on coronary artery disease risk, the method detected four clusters of genetic variants. A post hoc hypothesis-generating search suggested that variants in the cluster with a negative effect of blood pressure on coronary artery disease risk were more strongly related to trunk fat percentage and other adiposity measures than variants not in this cluster. AVAILABILITY AND IMPLEMENTATION: MR-Clust can be downloaded from https://github.com/cnfoley/mrclust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
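The "causal estimates based on each variant in turn" mentioned above are commonly computed as ratio (Wald) estimates. The sketch below shows that per-variant calculation under the usual first-order approximation, which ignores uncertainty in the variant-exposure association. It is not the MR-Clust algorithm, and the function name and example numbers are hypothetical.

```python
from typing import Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float,
               se_outcome: float) -> Tuple[float, float]:
    """Per-variant causal estimate and its approximate standard error.

    estimate = beta_outcome / beta_exposure
    se       = se_outcome / |beta_exposure|   (first-order approximation)
    """
    if beta_exposure == 0:
        raise ValueError("variant must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se
```

A clustering method of the kind described would then group variants whose estimates are similar once these differing standard errors are taken into account.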


Subjects
Mendelian Randomization Analysis , Causality , Cluster Analysis , Computer Simulation , Risk Factors
10.
Bioinformatics ; 36(18): 4789-4796, 2020 09 15.
Article in English | MEDLINE | ID: mdl-32592464

ABSTRACT

MOTIVATION: Diverse applications, particularly in tumour subtyping, have demonstrated the importance of integrative clustering techniques for combining information from multiple data sources. Cluster Of Clusters Analysis (COCA) is one such approach that has been widely applied in the context of tumour subtyping. However, the properties of COCA have never been systematically explored, and its robustness to the inclusion of noisy datasets is unclear. RESULTS: We rigorously benchmark COCA, and present Kernel Learning Integrative Clustering (KLIC) as an alternative strategy. KLIC frames the challenge of combining clustering structures as a multiple kernel learning problem, in which different datasets each provide a weighted contribution to the final clustering. This allows the contribution of noisy datasets to be down-weighted relative to more informative datasets. We compare the performances of KLIC and COCA in a variety of situations through simulation studies. We also present the output of KLIC and COCA in real data applications to cancer subtyping and transcriptional module discovery. AVAILABILITY AND IMPLEMENTATION: R packages klic and coca are available on the Comprehensive R Archive Network. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms , Algorithms , Cluster Analysis , Consensus , Humans , Information Storage and Retrieval , Neoplasms/genetics
11.
Bioinformatics ; 36(5): 1484-1491, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31608923

ABSTRACT

MOTIVATION: Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters. RESULTS: The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with non-parametric Bayesian clustering methods, efficient Markov Chain Monte Carlo sampling and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings. AVAILABILITY AND IMPLEMENTATION: An implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/GPseudoClust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Single-Cell Analysis , Bayes Theorem , Cluster Analysis , Markov Chains
12.
PLoS Comput Biol ; 16(11): e1008288, 2020 11.
Article in English | MEDLINE | ID: mdl-33166281

ABSTRACT

The cell is compartmentalised into complex micro-environments allowing an array of specialised biological processes to be carried out in synchrony. Determining a protein's sub-cellular localisation to one or more of these compartments can therefore be a first step in determining its function. High-throughput and high-accuracy mass spectrometry-based sub-cellular proteomic methods can now shed light on the localisation of thousands of proteins at once. Machine learning algorithms are then typically employed to make protein-organelle assignments. However, these algorithms are limited by insufficient and incomplete annotation. We propose a semi-supervised Bayesian approach to novelty detection, allowing the discovery of additional, previously unannotated sub-cellular niches. Inference in our model is performed in a Bayesian framework, allowing us to quantify uncertainty in the allocation of proteins to new sub-cellular niches, as well as in the number of newly discovered compartments. We apply our approach across 10 mass spectrometry based spatial proteomic datasets, representing a diverse range of experimental protocols. Application of our approach to hyperLOPIT datasets validates its utility by recovering enrichment with chromatin-associated proteins without annotation and uncovers sub-nuclear compartmentalisation which was not identified in the original analysis. Moreover, using sub-cellular proteomics data from Saccharomyces cerevisiae, we uncover a novel group of proteins trafficking from the ER to the early Golgi apparatus. Overall, we demonstrate the potential for novelty detection to yield biologically relevant niches that are missed by current approaches.


Subjects
Bayes Theorem , Saccharomyces cerevisiae Proteins/metabolism , Subcellular Fractions/metabolism , Algorithms , Animals , Datasets as Topic , Humans , Machine Learning , Mass Spectrometry , Mice , Proteomics
13.
Fungal Divers ; 109(1): 59-98, 2021.
Article in English | MEDLINE | ID: mdl-34608378

ABSTRACT

The increasing number of new fungal species described from all over the world along with the use of genetics to define taxa, has dramatically changed the classification system of early-diverging fungi over the past several decades. The number of phyla established for non-Dikarya fungi has increased from 2 to 17. However, to date, both the classification and phylogeny of the basal fungi are still unresolved. In this article, we review the recent taxonomy of the basal fungi and re-evaluate the relationships among early-diverging lineages of fungal phyla. We also provide information on the ecology and distribution in Mucoromycota and highlight the impact of chytrids on amphibian populations. Species concepts in Chytridiomycota, Aphelidiomycota, Rozellomycota, Neocallimastigomycota are discussed in this paper. To preserve the current application of the genus Nephridiophaga (Chytridiomycota: Nephridiophagales), a new type species, Nephridiophaga blattellae, is proposed.

14.
Fungal Divers ; 111(1): 1-335, 2021.
Article in English | MEDLINE | ID: mdl-34899100

ABSTRACT

This article is the 13th contribution in the Fungal Diversity Notes series, wherein 125 taxa from four phyla, ten classes, 31 orders, 69 families, 92 genera and three genera incertae sedis are treated, demonstrating worldwide and geographic distribution. Fungal taxa described and illustrated in the present study include three new genera, 69 new species, one new combination, one reference specimen and 51 new records on new hosts and new geographical distributions. Three new genera, Cylindrotorula (Torulaceae), Scolecoleotia (Leotiales genus incertae sedis) and Xenovaginatispora (Lindomycetaceae) are introduced based on distinct phylogenetic lineages and unique morphologies. Newly described species are Aspergillus lannaensis, Cercophora dulciaquae, Cladophialophora aquatica, Coprinellus punjabensis, Cortinarius alutarius, C. mammillatus, C. quercoflocculosus, Coryneum fagi, Cruentomycena uttarakhandina, Cryptocoryneum rosae, Cyathus uniperidiolus, Cylindrotorula indica, Diaporthe chamaeropicola, Didymella azollae, Diplodia alanphillipsii, Dothiora coronicola, Efibula rodriguezarmasiae, Erysiphe salicicola, Fusarium queenslandicum, Geastrum gorgonicum, G. hansagiense, Helicosporium sexualis, Helminthosporium chiangraiensis, Hongkongmyces kokensis, Hydrophilomyces hydraenae, Hygrocybe boertmannii, Hyphoderma australosetigerum, Hyphodontia yunnanensis, Khaleijomyces umikazeana, Laboulbenia divisa, Laboulbenia triarthronis, Laccaria populina, Lactarius pallidozonarius, Lepidosphaeria strobelii, Longipedicellata megafusiformis, Lophiotrema lincangensis, Marasmius benghalensis, M. jinfoshanensis, M. subtropicus, Mariannaea camelliae, Melanographium smilaxii, Microbotryum polycnemoides, Mimeomyces digitatus, Minutisphaera thailandensis, Mortierella solitaria, Mucor harpali, Nigrograna jinghongensis, Odontia huanrenensis, O. parvispina, Paraconiothyrium ajrekarii, Parafuscosporella niloticus, Phaeocytostroma yomensis, Phaeoisaria synnematicus, Phanerochaete hainanensis, Pleopunctum thailandicum, Pleurotheciella dimorphospora, Pseudochaetosphaeronema chiangraiense, Pseudodactylaria albicolonia, Rhexoacrodictys nigrospora, Russula paravioleipes, Scolecoleotia eriocamporesi, Seriascoma honghense, Synandromyces makranczyi, Thyridaria aureobrunnea, Torula lancangjiangensis, Tubeufia longihelicospora, Wicklowia fusiformispora, Xenovaginatispora phichaiensis and Xylaria apiospora. One new combination, Pseudobactrodesmium stilboideus is proposed. A reference specimen of Comoclathris permunda is designated. New host or distribution records are provided for Acrocalymma fici, Aliquandostipite khaoyaiensis, Camarosporidiella laburni, Canalisporium caribense, Chaetoscutula juniperi, Chlorophyllum demangei, C. globosum, C. hortense, Cladophialophora abundans, Dendryphion hydei, Diaporthe foeniculina, D. pseudophoenicicola, D. pyracanthae, Dictyosporium pandanicola, Dyfrolomyces distoseptatus, Ernakulamia tanakae, Eutypa flavovirens, E. lata, Favolus septatus, Fusarium atrovinosum, F. clavum, Helicosporium luteosporum, Hermatomyces nabanheensis, Hermatomyces sphaericoides, Longipedicellata aquatica, Lophiostoma caudata, L. clematidis-vitalbae, Lophiotrema hydei, L. neoarundinaria, Marasmiellus palmivorus, Megacapitula villosa, Micropsalliota globocystis, M. gracilis, Montagnula thailandica, Neohelicosporium irregulare, N. parisporum, Paradictyoarthrinium diffractum, Phaeoisaria aquatica, Poaceascoma taiwanense, Saproamanita manicata, Spegazzinia camelliae, Submersispora variabilis, Thyronectria caudata, T. mackenziei, Tubeufia chiangmaiensis, T. roseohelicospora, Vaginatispora nypae, Wicklowia submersa, Xanthagaricus necopinatus and Xylaria haemorrhoidalis. The data presented herein are based on morphological examination of fresh specimens, coupled with analysis of phylogenetic sequence data to better integrate taxa into appropriate taxonomic ranks and infer their evolutionary relationships.

15.
Compr Rev Food Sci Food Saf ; 20(2): 1982-2014, 2021 03.
Article in English | MEDLINE | ID: mdl-33599116

ABSTRACT

Wild mushrooms are a vital source of income and nutrition for many poor communities and of value to recreational foragers. Literature relating to the edibility of mushroom species continues to expand, driven by an increasing demand for wild mushrooms, a wider interest in foraging, and the study of traditional foods. Although numerous case reports have been published on edible mushrooms, doubt and confusion persist regarding which species are safe and suitable to consume. Case reports often differ, and the evidence supporting the stated properties of mushrooms can be incomplete or ambiguous. The need for greater clarity on edible species is further underlined by increases in mushroom-related poisonings. We propose a system for categorizing mushroom species and assigning a final edibility status. Using this system, we reviewed 2,786 mushroom species from 99 countries, accessing 9,783 case reports, from over 1,100 sources. We identified 2,189 edible species, of which 2,006 can be consumed safely, and a further 183 species which required some form of pretreatment prior to safe consumption or were associated with allergic reactions by some. We identified 471 species of uncertain edibility because of missing or incomplete evidence of consumption, and 76 unconfirmed species because of unresolved, differing opinions on edibility and toxicity. This is the most comprehensive list of edible mushrooms available to date, demonstrating the huge number of mushroom species consumed. Our review highlights the need for further information on uncertain and clash species, and the need to present evidence in a clear, unambiguous, and consistent manner.


Subjects
Agaricales , Mushroom Poisoning , Food , Humans , Mushroom Poisoning/epidemiology
16.
Stat Appl Genet Mol Biol ; 18(6)2019 12 12.
Article in English | MEDLINE | ID: mdl-31829970

ABSTRACT

The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang & Dunson, 2011) was proposed as a fast method for performing approximate Bayesian inference in DP mixture models, by posing clustering as a Bayesian model selection (BMS) problem and avoiding the use of computationally costly Markov chain Monte Carlo methods. Here we consider how this approach may be extended to permit variable selection for clustering, and also demonstrate the benefits of Bayesian model averaging (BMA) in place of BMS. Through an array of simulation examples and well-studied examples from cancer transcriptomics, we show that our method performs competitively with the current state-of-the-art, while also offering computational benefits. We apply our approach to reverse-phase protein array (RPPA) data from The Cancer Genome Atlas (TCGA) in order to perform a pan-cancer proteomic characterisation of 5157 tumour samples. We have implemented our approach, together with the original SUGS algorithm, in an open-source R package named sugsvarsel, which accelerates analysis by performing intensive computations in C++ and provides automated parallel processing. The R package is freely available from: https://github.com/ococrook/sugsvarsel.


Subjects
Computational Biology , Models, Statistical , Neoplasms/metabolism , Proteome , Proteomics , Algorithms , Bayes Theorem , Computational Biology/methods , Humans , Proteomics/methods
17.
PLoS Comput Biol ; 14(11): e1006516, 2018 11.
Article in English | MEDLINE | ID: mdl-30481170

ABSTRACT

Analysis of the spatial sub-cellular distribution of proteins is of vital importance to fully understand context specific protein function. Some proteins can be found with a single location within a cell, but up to half of proteins may reside in multiple locations, can dynamically re-localise, or reside within an unknown functional compartment. These considerations lead to uncertainty in associating a protein to a single location. Currently, mass spectrometry (MS) based spatial proteomics relies on supervised machine learning algorithms to assign proteins to sub-cellular locations based on common gradient profiles. However, such methods fail to quantify uncertainty associated with sub-cellular class assignment. Here we reformulate the framework on which we perform statistical analysis. We propose a Bayesian generative classifier based on Gaussian mixture models to assign proteins probabilistically to sub-cellular niches, thus proteins have a probability distribution over sub-cellular locations, with Bayesian computation performed using the expectation-maximisation (EM) algorithm, as well as Markov-chain Monte-Carlo (MCMC). Our methodology allows proteome-wide uncertainty quantification, thus adding a further layer to the analysis of spatial proteomics. Our framework is flexible, allowing many different systems to be analysed and reveals new modelling opportunities for spatial proteomics. We find our methods perform competitively with current state-of-the-art machine learning methods, whilst simultaneously providing more information. We highlight several examples where classification based on the support vector machine is unable to make any conclusions, while uncertainty quantification using our approach provides biologically intriguing results. To our knowledge this is the first Bayesian model of MS-based spatial proteomics data.
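The idea of giving each protein "a probability distribution over sub-cellular locations" can be illustrated with the membership probabilities of a Gaussian mixture. The sketch below is a deliberately simplified, one-dimensional version with fixed component parameters; real profiles are multivariate and the parameters would be inferred (for example by EM or MCMC). All names and numbers here are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(x: float, weights: Sequence[float],
                             means: Sequence[float],
                             sds: Sequence[float]) -> List[float]:
    """Posterior probability that observation x belongs to each component.

    By Bayes' theorem, p(k | x) is proportional to
    weight_k * N(x | mean_k, sd_k), normalised over all components.
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("x has zero density under every component")
    return [j / total for j in joint]
```

An observation lying midway between two equally weighted components with equal spread receives probability 0.5 for each. That ambiguous case is exactly what a hard class assignment cannot express.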


Subjects
Bayes Theorem , Models, Theoretical , Proteomics , Algorithms , Animals , Embryonic Stem Cells/metabolism , Machine Learning , Mice , Reproducibility of Results , Subcellular Fractions/metabolism , Uncertainty
18.
New Phytol ; 220(2): 517-525, 2018 10.
Article in English | MEDLINE | ID: mdl-30035303

ABSTRACT

Incompleteness of reference sequence databases and unresolved taxonomic relationships complicate taxonomic placement of fungal sequences. We developed Protax-fungi, a general tool for taxonomic placement of fungal internal transcribed spacer (ITS) sequences, and implemented it in the PlutoF platform of the UNITE database for molecular identification of fungi. With empirical data on root- and wood-associated fungi, Protax-fungi reliably identified (with at least 90% identification probability) the majority of sequences to the order level but only around one-fifth of them to the species level, reflecting the current limited coverage of the databases. Protax-fungi outperformed the Sintax and Rdb classifiers in terms of increased accuracy and decreased calibration error when applied to data on mock communities representing species groups with poor sequence database coverage. We applied Protax-fungi to examine the internal consistencies of the Index Fungorum and UNITE databases. This revealed inconsistencies in the taxonomy database as well as mislabelling and sequence quality problems in the reference database. The corresponding improvements were implemented in both databases. Protax-fungi provides a robust tool for performing statistically reliable identifications of fungi in spite of the incompleteness of extant reference sequence databases and unresolved taxonomic relationships.


Subjects
DNA, Ribosomal Spacer/genetics , Fungi/classification , Fungi/genetics , Internet , Base Sequence , Databases, Genetic , Plant Roots/microbiology , Wood/microbiology
19.
Ecology ; 99(6): 1306-1315, 2018 06.
Article in English | MEDLINE | ID: mdl-29655179

ABSTRACT

Here we assess the impact of geographically dependent (latitude, longitude, and altitude) changes in bioclimatic (temperature, precipitation, and primary productivity) variability on fungal fruiting phenology across Europe. Two main nutritional guilds of fungi, saprotrophic and ectomycorrhizal, were further separated into spring and autumn fruiters. We used a path analysis to investigate how biogeographic patterns in fungal fruiting phenology coincided with seasonal changes in climate and primary production. Across central to northern Europe, mean fruiting varied by approximately 25 d, primarily with latitude. Altitude affected fruiting by up to 30 d, with spring delays and autumnal accelerations. Fruiting was explained as much by the effects of bioclimatic variability as by their large-scale spatial patterns. Temperature drove fruiting of autumnal ectomycorrhizal and saprotrophic groups as well as spring saprotrophic groups, while primary production and precipitation were major drivers for spring-fruiting ectomycorrhizal fungi. Species-specific phenology predictors were not stable, instead deviating from the overall mean. There is a significant likelihood that further climatic change, especially in temperature, will impact fungal phenology patterns at large spatial scales. The ecological implications are diverse, potentially affecting food webs (asynchrony), nutrient cycling and the timing of nutrient availability in ecosystems.


Subjects
Climate , Ecosystem , Climate Change , Europe , Seasons
20.
Bioinformatics ; 32(18): 2863-5, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27153663

ABSTRACT

MOTIVATION: Many biochemical systems require stochastic descriptions. Unfortunately, these can only be solved for the simplest cases, and their direct simulation can become prohibitively expensive, precluding thorough analysis. As an alternative, moment closure approximation methods generate equations for the time-evolution of the system's moments and apply a closure ansatz to obtain a closed set of differential equations, which can become the basis for the deterministic analysis of the moments of the outputs of stochastic systems. RESULTS: We present a free, user-friendly tool implementing an efficient moment expansion approximation with parametric closures that integrates well with the IPython interactive environment. Our package enables the analysis of complex stochastic systems without any constraints on the number of species and moments studied or on the type of rate laws in the system. In addition to the approximation method, our package provides numerous tools to help non-expert users in stochastic analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/theosysbio/means CONTACTS: m.stumpf@imperial.ac.uk or e.lakatos13@imperial.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
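
The closure step described above can be sketched on a toy system. The example below does not use the package's API; it is a hand-derived illustration under stated assumptions: a single species X with birth (∅ → X at rate k_birth) and pairwise annihilation (X + X → ∅ with propensity k_pair·x(x−1)/2). The exact equation for the second moment depends on the third moment, so a normal closure (third central moment set to zero) is assumed to close the system. All names, rate values, and the forward-Euler integrator are illustrative choices.

```python
def closed_moment_rhs(m1, m2, k_birth, k_pair):
    """Time derivatives of the raw moments m1 = <x> and m2 = <x^2>.

    The third raw moment is replaced by the normal-closure ansatz
    m3 = 3*m1*m2 - 2*m1**3 (third central moment assumed zero).
    """
    m3 = 3.0 * m1 * m2 - 2.0 * m1 ** 3
    dm1 = k_birth - k_pair * (m2 - m1)
    dm2 = k_birth * (2.0 * m1 + 1.0) - 2.0 * k_pair * (m3 - 2.0 * m2 + m1)
    return dm1, dm2

def integrate_moments(k_birth, k_pair, t_end, dt=1e-3):
    """Forward-Euler integration of the closed equations, starting from zero molecules."""
    m1, m2 = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dm1, dm2 = closed_moment_rhs(m1, m2, k_birth, k_pair)
        m1 += dt * dm1
        m2 += dt * dm2
    return m1, m2

# Hypothetical rate constants chosen for illustration only.
K_BIRTH, K_PAIR = 10.0, 0.1
mean, second_moment = integrate_moments(K_BIRTH, K_PAIR, t_end=20.0)
variance = second_moment - mean ** 2
print(f"approximate steady-state mean: {mean:.3f}, variance: {variance:.3f}")
```

The two closed equations replace an infinite hierarchy of moment equations, at the cost of an approximation whose quality depends on how well the assumed distribution shape matches the true one.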


Subjects
Algorithms , Software , Stochastic Processes , Computer Simulation , Gene Expression , Kinetics , Models, Statistical