ABSTRACT
MOTIVATION: We present Eagle, a new method for multi-locus association mapping. The motivation for developing Eagle was to make multi-locus association mapping 'easy' and the method-of-choice. Eagle's strengths are that it (i) is considerably more powerful than single-locus association mapping, (ii) does not suffer from multiple testing issues, (iii) gives results that are immediately interpretable and (iv) has a computational footprint comparable to single-locus association mapping. RESULTS: Through a large simulation study, we show that Eagle finds true single-nucleotide polymorphism (SNP)-trait associations, and avoids false ones, better than competing single- and multi-locus methods. We also analyze data from a published mouse study, in which Eagle found over 50% more validated findings than the state-of-the-art single-locus method. AVAILABILITY AND IMPLEMENTATION: Eagle has been implemented as an R package, with a browser-based Graphical User Interface for users less familiar with R. It is freely available via the CRAN website at https://cran.r-project.org. Videos, Quick Start guides, FAQs and Demos are available via the Eagle website http://eagle.r-forge.r-project.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
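As a rough illustration of the multiple testing issue that single-locus mapping faces, the sketch below shows how the per-SNP significance threshold shrinks as the number of tested SNPs grows. It uses the Bonferroni correction, which is only one common choice and is not taken from the abstract. The function names, the SNP count and the significance level are assumptions for illustration, not details of Eagle or of the study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent tests,
    each run at level alpha with no correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical genome-wide scan: one test per SNP.
n_snps = 1_000_000   # assumed number of SNPs, for illustration only
alpha = 0.05         # assumed overall significance level

per_snp_threshold = bonferroni_threshold(alpha, n_snps)
uncorrected_fwer = family_wise_error_rate(alpha, n_snps)
```

Each SNP is tested separately in a single-locus scan, so the threshold has to be tightened as more SNPs are added, which costs power.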
Subjects
Eagles, Animals, Genome, Genome-Wide Association Study, Mice, Single Nucleotide Polymorphism, Software
ABSTRACT
BACKGROUND: Batch effects are a persistent and pervasive form of measurement noise that undermines the scientific utility of high-throughput genomic datasets. At their most benign, they reduce the power of statistical tests, resulting in actual effects going unidentified. At their worst, they constitute confounds and render datasets useless. Attempting to remove batch effects will result in some of the biologically meaningful component of the measurement (i.e. signal) being lost. We present and benchmark a novel technique, called Harman. Harman maximises the removal of batch noise under the constraint that the risk of also losing the biologically meaningful component of the measurement is kept to a fraction set by the user. RESULTS: Analyses of three independent publicly available datasets reveal that Harman simultaneously removes more batch noise and preserves more signal than the current leading technique. Results also show that Harman is able to identify and remove batch effects no matter what their relative size compared to other sources of variation in the dataset. Of particular advantage for meta-analyses and data integration is Harman's superior consistency in achieving comparable trade-offs between noise suppression and signal preservation across multiple datasets with differing numbers of treatments, replicates and processing batches. CONCLUSION: Harman's ability to better remove batch noise and better preserve biologically meaningful signal simultaneously within a single study, and to maintain the user-set trade-off between batch noise rejection and signal preservation across different studies, makes it an effective alternative method for dealing with batch effects in high-throughput genomic datasets. Harman is flexible in terms of the data types it can process. It is publicly available as an R package ( https://bioconductor.org/packages/release/bioc/html/Harman.html ), as well as a compiled Matlab package ( http://www.bioinformatics.csiro.au/harman/ ) which does not require a Matlab license to run.
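To make the trade-off between removing batch noise and preserving signal concrete, the following is a minimal sketch of naive batch mean-centering with an adjustable strength. It is not Harman's algorithm. The function name `shrink_batch_means`, the `strength` parameter and the toy values are assumptions for illustration only.

```python
from statistics import mean


def shrink_batch_means(values, batches, strength=1.0):
    """Move each batch mean towards the grand mean.

    strength=1.0 removes the whole between-batch shift;
    strength=0.0 leaves the data untouched.
    This is a naive illustration, not Harman's method.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")

    grand_mean = mean(values)
    # Mean of the measurements belonging to each batch label.
    batch_means = {
        label: mean([v for v, b in zip(values, batches) if b == label])
        for label in set(batches)
    }
    # Subtract a fraction of each batch's offset from the grand mean.
    return [
        v - strength * (batch_means[b] - grand_mean)
        for v, b in zip(values, batches)
    ]


# Toy measurements for one feature, processed in two hypothetical batches.
toy_values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
toy_batches = ["A", "A", "A", "B", "B", "B"]

fully_corrected = shrink_batch_means(toy_values, toy_batches, strength=1.0)
half_corrected = shrink_batch_means(toy_values, toy_batches, strength=0.5)
```

If the two batches also differed in treatment, full correction would erase that treatment difference along with the batch shift. This is the risk that a user-set limit on signal loss is meant to control.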
Subjects
Genomics/methods, Principal Component Analysis/methods, RNA Sequence Analysis/methods, Humans, Information Storage and Retrieval
ABSTRACT
Purpose: We investigate how an intrinsic speckle tracking approach to speckle-based x-ray imaging can be used to extract an object's effective dark-field (DF) signal, which is capable of providing object information in three dimensions. Approach: The effective DF signal was extracted using a Fokker-Planck type formalism, which models the deformations of illuminating reference beam speckles due to both coherent and diffusive scatter from the sample. Here, we assumed that (a) small-angle scattering fans at the exit surface of the sample are rotationally symmetric and (b) the object has both attenuating and refractive properties. The associated inverse problem of extracting the effective DF signal was numerically stabilized using a "weighted determinants" approach. Results: Effective DF projection images, as well as DF tomographic reconstructions of a wood sample, are presented. DF tomography was performed using a filtered back projection reconstruction algorithm. The DF tomographic reconstructions of the wood sample provided complementary, and otherwise inaccessible, information to augment the phase contrast reconstructions, which were also computed. Conclusions: An intrinsic speckle tracking approach to speckle-based imaging can tomographically reconstruct an object's DF signal at a low sample exposure and with a simple experimental setup. The obtained DF reconstructions have an image quality comparable to alternative x-ray DF techniques.
ABSTRACT
Eagle is an R package for multi-locus association mapping on a genome-wide scale. It is unlike other multi-locus packages in that it is easy to use for R users and non-users alike. It has two modes of use, command line and graphical user interface. Eagle is fully documented and has its own supporting website, http://eagle.r-forge.r-project.org/index.html. Eagle is a significant improvement over the method-of-choice, single-locus association mapping. It has greater power to detect SNP-trait associations. It is based on model selection, linear mixed models, and a clever idea on how random effects can be used to identify SNP-trait associations. Through an example with real mouse data, we demonstrate Eagle's ability to bring clarity and increased insight to single-locus findings. Initially, we see Eagle complementing single-locus analyses. Over time, however, we hope the community will increasingly make multi-locus association mapping its method-of-choice for the analysis of genome-wide association study data.
Subjects
Eagles, Genome-Wide Association Study, Animals, Chromosome Mapping, Genome, Mice, Single Nucleotide Polymorphism
ABSTRACT
De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.