Results 1 - 19 of 19
1.
Toxins (Basel) ; 16(6)2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38922162

ABSTRACT

Mycotoxins, toxic secondary metabolites produced by certain fungi, pose significant threats to global food safety and public health. These compounds can contaminate a variety of crops, leading to economic losses and health risks to both humans and animals. Traditional laboratory methods for mycotoxin detection can be time-consuming and may not always be suitable for large-scale screening. In recent years, however, machine learning (ML) methods have gained popularity for the detection of mycotoxins, and in the food safety industry in general, due to their accurate and timely predictions. We provide a systematic review of recent ML applications for detecting or predicting the presence of mycotoxins in a variety of food ingredients, highlighting their advantages, challenges, and potential for future advancements. We address the need for reproducibility and transparency in ML research through open access to data and code. A recurring observation in our findings is the lack of detailed reporting on hyperparameters and the absence of open-source code in many studies, which raises concerns about the reproducibility and optimisation of the ML models used. The findings reveal that while the majority of studies predominantly utilised neural networks for mycotoxin detection, there was notable diversity in the types of neural network architectures employed, with convolutional neural networks being the most popular.


Subject(s)
Food Contamination , Machine Learning , Mycotoxins , Mycotoxins/analysis , Food Contamination/analysis , Animals , Humans , Neural Networks, Computer
2.
PLoS One ; 17(2): e0263454, 2022.
Article in English | MEDLINE | ID: mdl-35130334

ABSTRACT

Stable isotope ratios are used to reconstruct animal diet in trophic ecology via mixing models. Several assumptions of stable isotope mixing models are critical, i.e., constant trophic discrimination factor and isotopic equilibrium between the consumer and its diet. The isotopic turnover rate (λ and its counterpart the half-life) affects the dynamics of isotopic incorporation for an organism and the isotopic equilibrium assumption: λ involves a time lag between the real assimilated diet and the diet estimated by mixing models at the individual scale. Current stable isotope mixing model studies consider neither this time lag nor even the dynamics of isotopic ratios in general. We developed a mechanistic framework using a dynamic mixing model (DMM) to assess the contribution of λ to the dynamics of isotopic incorporation and to estimate the bias induced by neglecting the time lag in diet reconstruction in conventional static mixing models (SMMs). The DMM includes isotope dynamics of sources (denoted δs), λ and frequency of diet-switch (ω). The results showed a significant bias generated by the SMM compared to the DMM (up to 50% of differences). This bias can be strongly reduced in SMMs by averaging the isotopic variations of the food sources over a time window equal to twice the isotopic half-life. However, the bias will persist (∼15%) for intermediate values of the ω/λ ratio. The inferences generated using a case study highlighted that DMM enhanced estimates of consumer's diet, and this could avoid misinterpretation in ecosystem functioning, food-web structure analysis and underlying biological processes.
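The time lag described above can be given a minimal numerical sketch: after a diet switch, tissue isotope values approach the new diet value exponentially at the turnover rate λ, so a static mixing model applied too soon misreads a pure diet as a mixture. All values below are hypothetical illustrations, not the paper's parameters.

```python
import math

# Hypothetical two-source example: the consumer switches at t = 0
# from source A (delta = -20.0) to source B (delta = -12.0).
LAM = 0.05                     # assumed isotopic turnover rate (per day)
HALF_LIFE = math.log(2) / LAM  # ~13.9 days

def tissue_delta(t, delta_old, delta_new, lam=LAM):
    """Exponential isotopic incorporation: the tissue value approaches the
    new diet value with rate lam after a diet switch at t = 0."""
    return delta_new + (delta_old - delta_new) * math.exp(-lam * t)

# 30 days after the switch, the tissue still partly reflects the old diet:
d30 = tissue_delta(30, -20.0, -12.0)

# A static mixing model applied at t = 30 days attributes this intermediate
# value to a mixed diet, even though the real diet is 100% source B.
p_B_static = (d30 - (-20.0)) / ((-12.0) - (-20.0))  # apparent proportion of B
print(round(d30, 2), round(p_B_static, 2))  # → -13.79 0.78
```

As the abstract notes, averaging the source isotope values over a window of roughly twice the isotopic half-life reduces this kind of bias in a static model.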


Subject(s)
Diet , Feeding Behavior/physiology , Food Chain , Isotopes/pharmacokinetics , Animals , Behavior, Animal/physiology , Computer Simulation , Ecosystem , Half-Life , Statistics as Topic
3.
Front Genet ; 12: 761503, 2021.
Article in English | MEDLINE | ID: mdl-34795696

ABSTRACT

The relative contributions of both copy number variants (CNVs) and single nucleotide polymorphisms (SNPs) to the additive genetic variance of carcass traits in cattle are not well understood. A detailed understanding of the relative importance of CNVs in cattle may have implications for the study design of both genomic predictions and genome-wide association studies. The first objective of the present study was to quantify the relative contributions of CNV data and SNP genotype data to the additive genetic variance of carcass weight, fat, and conformation for 945 Charolais, 923 Holstein-Friesian, and 974 Limousin sires. The second objective was to jointly consider SNP and CNV data in a least absolute shrinkage and selection operator (LASSO) regression model to identify genomic regions associated with carcass weight, fat, and conformation within each of the three breeds separately. A genomic relationship matrix (GRM) based on just CNV data did not capture any variance in the three carcass traits when jointly evaluated with a SNP-derived GRM. In the LASSO regression analysis, a total of 987 SNPs and 18 CNVs were associated with at least one of the three carcass traits in at least one of the three breeds. The quantitative trait loci (QTLs) corresponding to the associated SNPs and CNVs overlapped with several candidate genes, including previously reported candidate genes such as MSTN and RSAD2, and several potential novel candidate genes such as ACTN2 and THOC1. The results of the LASSO regression analysis demonstrated that CNVs can be used to detect associations with carcass traits which were not detected using the set of SNPs available in the present study. Therefore, the CNVs and SNPs available in the present study were not redundant forms of genomic data.
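The LASSO regression used here can be sketched with a toy cyclic coordinate-descent implementation. The data and penalty below are illustrative only, not the study's genotypes, and this is a generic LASSO sketch rather than the authors' code.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator at the core of LASSO coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimal LASSO via cyclic coordinate descent (no intercept).
    Minimises 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam) / z
    return beta

# Toy data: y depends on the first predictor only; the second is noise-like,
# so the L1 penalty should shrink its coefficient to exactly zero.
X = [[1.0, 0.1], [2.0, -0.2], [3.0, 0.3], [4.0, -0.1]]
y = [2.0, 4.1, 5.9, 8.0]
print([round(b, 2) for b in lasso_cd(X, y, lam=1.0)])  # → [1.96, 0.0]
```

The exact-zero coefficients are what make LASSO useful for picking out a small set of associated SNPs and CNVs from a large panel.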

4.
BMC Genomics ; 22(1): 757, 2021 Oct 23.
Article in English | MEDLINE | ID: mdl-34688258

ABSTRACT

BACKGROUND: The carcass value of cattle is a function of carcass weight and quality. Given the economic importance of carcass merit to producers, it is routinely included in beef breeding objectives. A detailed understanding of the genetic variants that contribute to carcass merit is useful to maximize the efficiency of breeding for improved carcass merit. The objectives of the present study were two-fold: firstly, to perform genome-wide association analyses of carcass weight, carcass conformation, and carcass fat using copy number variant (CNV) data in a population of 923 Holstein-Friesian, 945 Charolais, and 974 Limousin bulls; and secondly to perform separate association analyses of carcass traits on the same population of cattle using the Log R ratio (LRR) values of 712,555 single nucleotide polymorphisms (SNPs). The LRR value of a SNP is a measure of the signal intensity of the SNP generated during the genotyping process. RESULTS: A total of 13,969, 3,954, and 2,805 detected CNVs were tested for association with the three carcass traits for the Holstein-Friesian, Charolais, and Limousin, respectively. The copy number of 16 CNVs and the LRR of 34 SNPs were associated with at least one of the three carcass traits in at least one of the three cattle breeds. With the exception of three SNPs, none of the quantitative trait loci detected in the CNV association analyses or the SNP LRR association analyses were also detected using traditional association analyses based on SNP allele counts. Many of the CNVs and SNPs associated with the carcass traits were located near genes related to the structure and function of the spliceosome and the ribosome; in particular, U6 which encodes a spliceosomal subunit and 5S rRNA which encodes a ribosomal subunit. 
CONCLUSIONS: The present study demonstrates that CNV data and SNP LRR data can be used to detect genomic regions associated with carcass traits in cattle providing information on quantitative trait loci over and above those detected using just SNP allele counts, as is the approach typically employed in genome-wide association analyses.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Animals , Cattle/genetics , DNA Copy Number Variations , Male , Phenotype , Quantitative Trait Loci
5.
PLoS Comput Biol ; 17(8): e1009289, 2021 08.
Article in English | MEDLINE | ID: mdl-34415913

ABSTRACT

The epidemic increase in the incidence of Human Papilloma Virus (HPV) related Oropharyngeal Squamous Cell Carcinomas (OPSCCs) in several countries worldwide represents a significant public health concern. Although gender neutral HPV vaccination programmes are expected to cause a reduction in the incidence rates of OPSCCs, these effects will not be evident in the foreseeable future. Secondary prevention strategies are currently not feasible due to an incomplete understanding of the natural history of oral HPV infections in OPSCCs. The key parameters that govern natural history models remain largely ill-defined for HPV related OPSCCs and cannot be easily inferred from experimental data. Mathematical models have been used to estimate some of these ill-defined parameters in cervical cancer, another HPV related cancer leading to successful implementation of cancer prevention strategies. We outline a "double-Bayesian" mathematical modelling approach, whereby, a Bayesian machine learning model first estimates the probability of an individual having an oral HPV infection, given OPSCC and other covariate information. The model is then inverted using Bayes' theorem to reverse the probability relationship. We use data from the Surveillance, Epidemiology, and End Results (SEER) cancer registry, SEER Head and Neck with HPV Database and the National Health and Nutrition Examination Surveys (NHANES), representing the adult population in the United States to derive our model. The model contains 8,106 OPSCC patients of which 73.0% had an oral HPV infection. When stratified by age, sex, marital status and race/ethnicity, the model estimated a higher conditional probability for developing OPSCCs given an oral HPV infection in non-Hispanic White males and females compared to other races/ethnicities. 
The proposed Bayesian model represents a proof-of-concept of a natural history model of HPV driven OPSCCs and outlines a strategy for estimating the conditional probability of an individual's risk of developing OPSCC following an oral HPV infection.
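The inversion step of the "double-Bayesian" approach is ordinary Bayes' theorem. The sketch below uses the 73.0% oral-HPV proportion among OPSCC cases reported in the abstract, but the two marginal probabilities are placeholder values for illustration, not SEER/NHANES estimates.

```python
# Invert P(HPV | OPSCC) into P(OPSCC | HPV) with Bayes' theorem.
# p_opscc and p_hpv are ILLUSTRATIVE placeholders, not real prevalences.
p_hpv_given_opscc = 0.730   # reported proportion of OPSCC cases with oral HPV
p_opscc = 1e-4              # assumed probability of OPSCC (placeholder)
p_hpv = 0.07                # assumed oral HPV prevalence (placeholder)

p_opscc_given_hpv = p_hpv_given_opscc * p_opscc / p_hpv
print(f"{p_opscc_given_hpv:.6f}")  # → 0.001043
```

Even a large conditional probability of HPV given OPSCC translates into a small absolute risk of OPSCC given HPV, because OPSCC is rare relative to oral HPV infection.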


Subject(s)
Alphapapillomavirus/pathogenicity , Bayes Theorem , Machine Learning , Oropharyngeal Neoplasms/virology , Probability , Squamous Cell Carcinoma of Head and Neck/virology , Adult , Aged , Female , Humans , Male , Middle Aged , Oropharyngeal Neoplasms/epidemiology , SEER Program , Squamous Cell Carcinoma of Head and Neck/epidemiology
6.
Sci Rep ; 10(1): 15816, 2020 09 25.
Article in English | MEDLINE | ID: mdl-32978550

ABSTRACT

Stable isotope mixing models are regularly used to provide probabilistic estimates of source contributions to dietary mixtures. Whilst Bayesian implementations of isotope mixing models have become prominent, the use of appropriate diet-tissue discrimination factors (DTDFs) remains the least resolved aspect. The DTDFs are critical for providing accurate inferences from these models. Using both simulated and laboratory-based experimental data, this study provides conceptual and practical applications of isotope mixing models by exploring the role of DTDFs. The experimental study used Mozambique Tilapia Oreochromis mossambicus, a freshwater fish, to explore multi-tissue variations in isotopic incorporation patterns, and to evaluate isotope mixing model outputs based on experiment- and literature-based DTDFs. Isotope incorporation patterns were variable for both muscle and fin tissues among the consumer groups that were fed diet sources with different stable isotope values. Application of literature-based DTDFs in isotope mixing models consistently underestimated the dietary proportions of all single-source consumer groups. In contrast, application of diet-specific DTDFs provided better dietary estimates for single-source consumer groups. Variations in the proportional contributions of the individual sources were, nevertheless, observed for the mixed-source consumer group, which suggests that isotope assimilation of the individual food sources may have been influenced by other underlying physiological processes. This study provides evidence that stable isotope values from different diet sources exhibit large variations as they become incorporated into consumer tissues. This suggests that the application of isotope mixing models requires consideration of several aspects such as diet type and the associated biological processes that may influence DTDFs.
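The sensitivity of mixing model output to the choice of DTDF can be shown with a hypothetical two-source, one-isotope example. The δ13C values and the two DTDFs below are made up for illustration; they are not the tilapia experiment's numbers.

```python
def source_proportion(delta_consumer, delta_a, delta_b, dtdf):
    """Two-source, one-isotope mixing model: correct the consumer value by
    the diet-tissue discrimination factor (DTDF), then solve
    corrected = p * delta_a + (1 - p) * delta_b for p."""
    corrected = delta_consumer - dtdf
    return (corrected - delta_b) / (delta_a - delta_b)

# Hypothetical d13C values: a diet-specific DTDF (2.0) versus a generic
# literature DTDF (3.4) gives different estimates for source A's share.
p_specific = source_proportion(-16.0, -14.0, -22.0, dtdf=2.0)
p_generic = source_proportion(-16.0, -14.0, -22.0, dtdf=3.4)
print(round(p_specific, 2), round(p_generic, 2))  # → 0.5 0.33
```

A 1.4 per-mille difference in the DTDF shifts the estimated contribution of source A by roughly a third of its value, which is the kind of underestimation the study reports for literature-based DTDFs.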


Subject(s)
Carbon Isotopes/analysis , Diet , Feeding Behavior , Fishes/physiology , Models, Statistical , Nitrogen Isotopes/analysis , Animals , Fresh Water
7.
BMC Genomics ; 21(1): 205, 2020 Mar 04.
Article in English | MEDLINE | ID: mdl-32131735

ABSTRACT

BACKGROUND: The trading of individual animal genotype information often involves only the exchange of the called genotypes and not necessarily the additional information required to effectively call structural variants. The main aim here was to determine if it is possible to impute copy number variants (CNVs) using the flanking single nucleotide polymorphism (SNP) haplotype structure in cattle. While this objective was achieved using high-density genotype panels (i.e., 713,162 SNPs), a secondary objective investigated the concordance of CNVs called with this high-density genotype panel compared to CNVs called from a medium-density panel (i.e., 45,677 SNPs in the present study). This is the first study to compare CNVs called from high-density and medium-density SNP genotypes from the same animals. High-density (and medium-density) genotypes were available on 991 Holstein-Friesian, 1015 Charolais, and 1394 Limousin bulls. The concordance between CNVs called from the medium-density and high-density genotypes was calculated separately for each animal. A subset of CNVs which were called from the high-density genotypes was selected for imputation. Imputation was carried out separately for each breed using a set of high-density SNPs flanking the midpoint of each CNV. A CNV was deemed to be imputed correctly when the called copy number matched the imputed copy number. RESULTS: For 97.0% of CNVs called from the high-density genotypes, the corresponding genomic position on the medium-density genotypes of the same animal did not contain a called CNV. The average accuracy of imputation for CNV deletions was 0.281, with a standard deviation of 0.286. The average accuracy of imputation of the CNV normal state, i.e., the absence of a CNV, was 0.982 with a standard deviation of 0.022. Two CNV duplications were imputed in the Charolais, a single CNV duplication in the Limousins, and a single CNV duplication in the Holstein-Friesians; in all cases the CNV duplications were incorrectly imputed.
CONCLUSION: The vast majority of CNVs called from the high-density genotypes were not detected using the medium-density genotypes. Furthermore, CNVs cannot be accurately predicted from flanking SNP haplotypes, at least based on the imputation algorithms routinely used in cattle, and using the SNPs currently available on the high-density genotype panel.


Subject(s)
Computational Biology/methods , DNA Copy Number Variations , Polymorphism, Single Nucleotide , Algorithms , Alleles , Animals , Cattle , Gene Frequency , Genotype , Haplotypes
8.
Front Plant Sci ; 10: 558, 2019.
Article in English | MEDLINE | ID: mdl-31134112

ABSTRACT

Stomatal conductance (gs) in terrestrial vegetation regulates the uptake of atmospheric carbon dioxide for photosynthesis and water loss through transpiration, closely linking the biosphere and atmosphere and influencing climate. Yet, the range and pattern of gs in plants from natural ecosystems across broad geographic, climatic, and taxonomic ranges remain poorly quantified. Furthermore, attempts to characterize gs on such scales have predominantly relied upon meta-analyses compiling data from many different studies. This approach may be inherently problematic as it combines data collected using unstandardized protocols, sometimes over decadal time spans, and from different habitat groups. Using a standardized protocol, we measured leaf-level gs using porometry in 218 C3 woody angiosperm species in natural ecosystems representing seven bioclimatic zones. The resulting dataset of 4273 gs measurements, which we call STraits (Stomatal Traits), was used to determine patterns in maximum gs (gsmax) across bioclimatic zones and whether there was similarity in the mean gsmax of C3 woody angiosperms across ecosystem types. We also tested for differential gsmax in two broadly defined habitat groups - open-canopy and understory-subcanopy - within and across bioclimatic zones. We found strong convergence in mean gsmax of C3 woody angiosperms in the understory-subcanopy habitats across six bioclimatic zones, but not in open-canopy habitats. Mean gsmax in open-canopy habitats (266 ± 100 mmol m-2 s-1) was significantly higher than in understory-subcanopy habitats (233 ± 86 mmol m-2 s-1). There was also a central tendency in the overall dataset to operate toward a gsmax of ∼250 mmol m-2 s-1.
We suggest that the observed convergence in mean gsmax of C3 woody angiosperms in the understory-subcanopy is due to a buffering of gsmax against macroclimate effects, which will lead to a differential response of C3 woody angiosperm vegetation in these two habitats to future global change. Therefore, it will be important for future studies of gsmax to categorize vegetation according to habitat group.

9.
J Anim Ecol ; 88(3): 405-415, 2019 03.
Article in English | MEDLINE | ID: mdl-30548858

ABSTRACT

Pelagic and benthic systems usually interact, but their dynamics and production rates differ. Such differences influence the distribution, reproductive cycles, growth rates, stability and productivity of the consumers they support. Consumer preferences for, and dependence on, pelagic or benthic production are governed by the availability of these sources of production and consumer life history, distribution, habitat, behavioural ecology, ontogenetic stage and morphology. Diet studies may demonstrate the extent to which consumers feed on prey in pelagic or benthic environments. But they do not discriminate benthic production directly supported by phytoplankton from benthic production recycled through detrital pathways. The former will track the dynamics of phytoplankton production more closely than the latter. We develop and apply a new analytical method that uses carbon (C) and sulphur (S) natural abundance stable isotope data to assess the relative contribution of pelagic and benthic pathways to fish consumer production. For 13 species of fish that dominate community biomass in the northern North Sea (estimated >90% of total biomass), relative modal use of pelagic pathways ranged from <25% to >85%. Use of both C and S isotopes as opposed to just C reduced uncertainty in relative modal use estimates. Temporal comparisons of relative modal use of pelagic and benthic pathways revealed similar ranking of species dependency over 4 years, but annual variation in relative modal use within species was typically 10%-40%. For the total fish consumer biomass in the study region, the C and S method linked approximately 70% and 30% of biomass to pelagic and benthic pathways, respectively. 
As well as providing a new method to define consumers' links to pelagic and benthic pathways, our results demonstrate that a substantial proportion of fish biomass, and by inference production, in the northern North Sea is supported by production that has passed through transformations on the seabed.
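With two sources (pelagic and benthic) and two tracers (C and S), the pathway proportions can be estimated by least squares under the standard linear mixing assumption. The end-member and mixture values below are hypothetical placeholders, not the North Sea data, and this is a generic sketch rather than the authors' method.

```python
def pelagic_proportion(mix, pelagic, benthic):
    """Least-squares estimate of the pelagic proportion p from multiple
    tracers (here d13C and d34S), assuming for each tracer
    mix = p * pelagic + (1 - p) * benthic."""
    num = sum((m - b) * (pe - b) for m, pe, b in zip(mix, pelagic, benthic))
    den = sum((pe - b) ** 2 for pe, b in zip(pelagic, benthic))
    return num / den

# Tracer order: (d13C, d34S); all values are illustrative only.
p = pelagic_proportion(mix=(-19.0, 16.0),
                       pelagic=(-21.0, 19.0),
                       benthic=(-16.0, 10.0))
print(round(p, 2))  # → 0.65
```

Pooling both tracers in the fit is what reduces the uncertainty relative to using carbon alone, as the abstract reports.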


Subject(s)
Ecosystem , Food Chain , Animals , Carbon , Ecology , Fishes
10.
Stat Comput ; 28(4): 869-890, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30449953

ABSTRACT

Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods, where the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another method popular for high-dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small n, large p" scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments: one to distinguish between patients with cardiovascular disease and controls, and another to classify aggressive from non-aggressive prostate cancer. We compare our results to their main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git.
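The model-averaging idea behind BART-BMA can be illustrated with BIC-based approximate posterior weights. This is a generic sketch of Bayesian Model Averaging over candidate models, not the BART-BMA algorithm itself, and the BIC scores are hypothetical.

```python
import math

def bma_weights(bics):
    """Approximate posterior model weights from BIC values:
    w_m proportional to exp(-0.5 * (BIC_m - min BIC)).
    A simplified stand-in for the model-averaging step."""
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

# Three candidate tree models with hypothetical BIC scores:
w = bma_weights([1002.0, 1000.0, 1010.0])
print([round(x, 3) for x in w])  # → [0.268, 0.727, 0.005]
```

Averaging predictions over models weighted this way keeps a measure of model uncertainty while still pruning clearly inferior candidates, which is the advantage over a single greedily grown forest.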

11.
J Anim Sci ; 96(10): 4112-4124, 2018 Sep 29.
Article in English | MEDLINE | ID: mdl-30239746

ABSTRACT

Copy number variants (CNVs) are a form of genomic variation that changes the structure of the genome through deletion or duplication of stretches of DNA. The objective of the present study was to characterize CNVs in a large multibreed population of beef and dairy bulls. The CNVs were called on the autosomes of 5,551 cattle from 22 different beef and dairy breeds, using 2 freely available software suites, QuantiSNP and PennCNV. All CNVs were classified into either deletions or duplications. The median concordance between PennCNV and QuantiSNP, per animal, was 18.5% for deletions and 0% for duplications. The low concordance rate between PennCNV and QuantiSNP indicated that neither algorithm, by itself, could identify all CNVs in the population. In total, PennCNV and QuantiSNP collectively identified 747,129 deletions and 432,523 duplications; 80.2% of all duplications and 69.1% of all deletions were present only once in the population. Only 0.154% of all CNVs identified were present in more than 50 animals in the population. The distribution of the percentage of the autosomes that were composed of deletions, per animal, was positively skewed, as was the distribution for the percentage of the autosomes that were composed of duplications, per animal. The first quartile, median, and third quartile of the distribution of the percentage of the autosomes that were composed of deletions were 0.019%, 0.037%, and 0.201%, respectively. The first quartile, median, and third quartile of the distribution of the percentage of the autosomes that were composed of duplications were 0.013%, 0.028%, and 0.076%, respectively. The distributions of the number of deletions and duplications per animal were both positively skewed. The interquartile range for the number of deletions per animal in the population was between 16 and 117, whereas for duplications it was between 8 and 23. Per animal, there tended to be twice as many deletions as duplications. 
The distribution of the length of deletions was positively skewed, as was the distribution of the length of duplications. The interquartile range for the length of deletions in the population was between 25 and 101 kb, and for duplications the interquartile range was between 46 and 235 kb. Per animal, duplications tended to be twice as long as deletions. This study provides a description of the characteristics and distribution of CNVs in a large multibreed population of beef and dairy cattle.


Subject(s)
DNA Copy Number Variations/genetics , Genome-Wide Association Study/veterinary , Genome/genetics , Algorithms , Animals , Cattle , Genomics , Genotype , Humans , Male , Polymorphism, Single Nucleotide , Red Meat , Software
12.
PeerJ ; 6: e5096, 2018.
Article in English | MEDLINE | ID: mdl-29942712

ABSTRACT

The ongoing evolution of tracer mixing models has resulted in a confusing array of software tools that differ in terms of data inputs, model assumptions, and associated analytic products. Here we introduce MixSIAR, an inclusive, rich, and flexible Bayesian tracer (e.g., stable isotope) mixing model framework implemented as an open-source R package. Using MixSIAR as a foundation, we provide guidance for the implementation of mixing model analyses. We begin by outlining the practical differences between mixture data error structure formulations and relate these error structures to common mixing model study designs in ecology. Because Bayesian mixing models afford the option to specify informative priors on source proportion contributions, we outline methods for establishing prior distributions and discuss the influence of prior specification on model outputs. We also discuss the options available for source data inputs (raw data versus summary statistics) and provide guidance for combining sources. We then describe a key advantage of MixSIAR over previous mixing model software: the ability to include fixed and random effects as covariates explaining variability in mixture proportions and calculate relative support for multiple models via information criteria. We present a case study of Alligator mississippiensis diet partitioning to demonstrate the power of this approach. Finally, we conclude with a discussion of limitations to mixing model applications. Through MixSIAR, we have consolidated the disparate array of mixing model tools into a single platform, diversified the set of available parameterizations, and provided developers a platform upon which to continue improving mixing model analyses in the future.

13.
Nat Commun ; 8: 16019, 2017 07 19.
Article in English | MEDLINE | ID: mdl-28722009

ABSTRACT

The devastating 2004 Indian Ocean tsunami caught millions of coastal residents and the scientific community off guard. Subsequent research in the Indian Ocean basin has identified prehistoric tsunamis, but the timing and recurrence intervals of such events are uncertain. Here we present an extraordinary 7,400-year stratigraphic sequence of prehistoric tsunami deposits from a coastal cave in Aceh, Indonesia. This record demonstrates that at least 11 prehistoric tsunamis struck the Aceh coast between 7,400 and 2,900 years ago. The average time period between tsunamis is about 450 years, with intervals ranging from a long, dormant period of over 2,000 years, to multiple tsunamis within the span of a century. Although there is evidence that the likelihood of another tsunamigenic earthquake in Aceh province is high, these variable recurrence intervals suggest that long dormant periods may follow Sunda megathrust ruptures as large as that of the 2004 Indian Ocean tsunami.

14.
Science ; 355(6322): 276-279, 2017 01 20.
Article in English | MEDLINE | ID: mdl-28104887

ABSTRACT

The last interglaciation (LIG, 129 to 116 thousand years ago) was the most recent time in Earth's history when global mean sea level was substantially higher than it is at present. However, reconstructions of LIG global temperature remain uncertain, with estimates ranging from no significant difference to nearly 2°C warmer than present-day temperatures. Here we use a network of sea-surface temperature (SST) records to reconstruct spatiotemporal variability in regional and global SSTs during the LIG. Our results indicate that peak LIG global mean annual SSTs were 0.5 ± 0.3°C warmer than the climatological mean from 1870 to 1889 and indistinguishable from the 1995 to 2014 mean. LIG warming in the extratropical latitudes occurred in response to boreal insolation and the bipolar seesaw, whereas tropical SSTs were slightly cooler than the 1870 to 1889 mean in response to reduced mean annual insolation.

15.
Ir J Med Sci ; 186(1): 201-205, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27189711

ABSTRACT

INTRODUCTION: Suicide is criminalized in more than 100 countries around the world. A dearth of research exists into the effect of suicide legislation on suicide rates, and the available statistics are mixed. MATERIALS AND METHODS: This study investigates 10,353 suicide deaths in Ireland that took place between 1970 and 2000. Irish 1970-2000 annual suicide data were obtained from the Central Statistics Office and modelled via a negative binomial regression approach. We examined the effect of suicide legislation on different age groups and on both sexes. We used Bonferroni correction for multiple modelling. Statistical analysis was performed using the R statistical package version 3.1.2. RESULTS: The coefficient for the effect of the suicide act on overall suicide deaths was -9.094 (95% confidence interval (CI) -34.086 to 15.899), statistically non-significant (p = 0.476). The coefficient for the effect of the suicide act on undetermined deaths was statistically significant (p < 0.001) and was estimated to be -644.4 (95% CI -818.6 to -469.9). CONCLUSION: The results of our study indicate that legalization of suicide is not associated with a significant increase in subsequent suicide deaths. However, undetermined death verdict rates dropped significantly following the legalization of suicide.


Subject(s)
Cause of Death/trends , Suicide/trends , Adolescent , Adult , Aged , Child , Child, Preschool , Crime , Female , Humans , Infant , Ireland , Male , Middle Aged , Suicide/legislation & jurisprudence , Young Adult
16.
BMC Bioinformatics ; 17: 126, 2016 Mar 11.
Article in English | MEDLINE | ID: mdl-26968614

ABSTRACT

BACKGROUND: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. RESULTS: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. CONCLUSIONS: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.


Subject(s)
Computational Biology/methods , Gene Ontology , Software , Supervised Machine Learning , Transcriptome , RNA, Messenger
17.
BJPsych Open ; 1(2): 164-165, 2015 Oct.
Article in English | MEDLINE | ID: mdl-27703742

ABSTRACT

SUMMARY: Since the proposition of the social integration theory by Émile Durkheim, macro-sociological changes have been speculated to affect suicide rates. This study investigates the effect of the First World War on Irish suicide rates. We applied an interrupted time series design to 1864-1921 annual Irish suicide rates. The 1864-1913 suicide rates exhibited a slowly rising trend, with a sharp decline from the year 1914 onwards. The odds of death by suicide for males during the 1914-1918 period were 0.811 (95% CI 0.768-0.963). Irish rates of suicide were significantly reduced during the First World War, most notably for males. DECLARATION OF INTEREST: None.

18.
J Anim Ecol ; 80(3): 595-602, 2011 May.
Article in English | MEDLINE | ID: mdl-21401589

ABSTRACT

1. The use of stable isotope data to infer characteristics of community structure and niche width of community members has become increasingly common. Although these developments have provided ecologists with new perspectives, their full impact has been hampered by an inability to statistically compare individual communities using descriptive metrics. 2. We solve these issues by reformulating the metrics in a Bayesian framework. This reformulation takes account of uncertainty in the sampled data and naturally incorporates error arising from the sampling process, propagating it through to the derived metrics. 3. Furthermore, we develop novel multivariate ellipse-based metrics as an alternative to the currently employed Convex Hull methods when applied to single community members. We show that unlike Convex Hulls, the ellipses are unbiased with respect to sample size, and their estimation via Bayesian inference allows robust comparison to be made among data sets comprising different sample sizes. 4. These new metrics, which we call SIBER (Stable Isotope Bayesian Ellipses in R), open up more avenues for direct comparison of isotopic niches across communities. The computational code to calculate the new metrics is implemented in the free-to-download package Stable Isotope Analysis for the R statistical environment.
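The ellipse-based metric can be sketched numerically: the standard ellipse area (SEA) of bivariate isotope data is pi times the square root of the determinant of the sample covariance matrix, with a small-sample correction (SEAc). This is a point-estimate sketch of the SIBER-style quantity, not the package's Bayesian implementation, and the data below are hypothetical.

```python
import math

def standard_ellipse_area(xs, ys):
    """Standard ellipse area of bivariate isotope data:
    SEA = pi * sqrt(det(cov)), plus the small-sample-corrected SEAc."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sea = math.pi * math.sqrt(sxx * syy - sxy ** 2)
    seac = sea * (n - 1) / (n - 2)  # correction for small sample sizes
    return sea, seac

# Hypothetical d13C / d15N values for one community member:
xs = [-20.0, -19.5, -21.0, -20.5, -19.0]
ys = [8.0, 9.0, 7.5, 8.5, 9.5]
sea, seac = standard_ellipse_area(xs, ys)
print(round(sea, 3), round(seac, 3))  # → 0.856 1.141
```

In the Bayesian formulation described above, the covariance matrix itself gets a posterior distribution, so the ellipse area becomes a distribution rather than the single number computed here.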


Subject(s)
Bayes Theorem , Ecosystem , Isotopes , Models, Biological , Biota , Multivariate Analysis , Sample Size , Selection Bias
19.
PLoS One ; 5(3): e9672, 2010 Mar 12.
Article in English | MEDLINE | ID: mdl-20300637

ABSTRACT

BACKGROUND: Stable isotope analysis is increasingly being utilised across broad areas of ecology and biology. Key to much of this work is the use of mixing models to estimate the proportion of sources contributing to a mixture such as in diet estimation. METHODOLOGY: By accurately reflecting natural variation and uncertainty to generate robust probability estimates of source proportions, the application of Bayesian methods to stable isotope mixing models promises to enable researchers to address an array of new questions, and approach current questions with greater insight and honesty. CONCLUSIONS: We outline a framework that builds on recently published Bayesian isotopic mixing models and present a new open source R package, SIAR. The formulation in R will allow for continued and rapid development of this core model into an all-encompassing single analysis suite for stable isotope research.


Subject(s)
Isotopes/chemistry , Algorithms , Bayes Theorem , Biology/methods , Ecology/methods , Environmental Monitoring/methods , Markov Chains , Models, Statistical , Models, Theoretical