Results 1 - 20 of 34
1.
PLoS Pathog ; 18(9): e1010848, 2022 09.
Article in English | MEDLINE | ID: mdl-36149920

ABSTRACT

Aneuploidy causes system-wide disruptions in the stoichiometric balances of transcripts, proteins, and metabolites, often resulting in detrimental effects for the organism. The protozoan parasite Leishmania has an unusually high tolerance for aneuploidy, but the molecular and functional consequences for the pathogen remain poorly understood. Here, we addressed this question in vitro and present the first integrated analysis of the genome, transcriptome, proteome, and metabolome of highly aneuploid Leishmania donovani strains. Our analyses unambiguously establish that aneuploidy in Leishmania proportionally impacts the average transcript and protein abundance levels of affected chromosomes, ultimately correlating with the degree of metabolic differences between closely related aneuploid strains. This proportionality was present in both proliferative and non-proliferative in vitro promastigotes. However, as in other eukaryotes, we observed attenuation of dosage effects for protein complex subunits and, in addition, for non-cytoplasmic proteins. Differentially expressed transcripts and proteins between aneuploid Leishmania strains also originated from non-aneuploid chromosomes. At the protein level, these were enriched for proteins involved in protein metabolism, such as chaperones and chaperonins, peptidases, and heat-shock proteins. In conclusion, our results further support the view that aneuploidy in Leishmania can be adaptive. Additionally, we believe that the high karyotype diversity in vitro and the absence of classical transcriptional regulation make Leishmania an attractive model to study processes of protein homeostasis in the context of aneuploidy and beyond.


Subject(s)
Leishmania donovani , Proteome , Aneuploidy , Heat-Shock Proteins/genetics , Humans , Karyotype , Leishmania donovani/genetics , Peptide Hydrolases/genetics , Proteome/genetics
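The proportional dosage effect described in record 1 can be written as a simple expectation: if abundance scales with chromosome copy number, a chromosome that is trisomic in one strain and disomic in another should show a log2 fold change of about log2(3/2). The minimal sketch below compares an observed fold change with that expectation. The helper names, somies and observed values are hypothetical, and reading a ratio well below 1 as "attenuation" is an assumption for illustration, not the article's statistical procedure.

```python
import math

def expected_log2fc(somy_a, somy_b):
    """Expected log2 fold change if abundance scales proportionally with copy number."""
    if somy_a == somy_b:
        raise ValueError("equal somies predict no change; the ratio is undefined")
    return math.log2(somy_a / somy_b)

def dosage_ratio(observed_log2fc, somy_a, somy_b):
    """Observed / expected change: about 1 means a full dosage effect, below 1 means attenuation."""
    return observed_log2fc / expected_log2fc(somy_a, somy_b)

# Hypothetical chromosome: trisomic in strain A, disomic in strain B
for label, observed in [("transcript", 0.58), ("complex subunit", 0.20)]:
    print(label, round(dosage_ratio(observed, 3, 2), 2))
```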
2.
Bioinformatics ; 38(22): 5007-5011, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36130276

ABSTRACT

MOTIVATION: Protein sequence alignments are essential to structural, evolutionary and functional analysis, but their accuracy is often limited by sequence similarity unless molecular structures are available. Protein structures predicted at experimental grade accuracy, as achieved by AlphaFold2, could therefore have a major impact on sequence analysis. RESULTS: Here, we find that multiple sequence alignments estimated on AlphaFold2 predictions are almost as accurate as alignments estimated on experimental structures and significantly closer to the structural reference than sequence-based alignments. We also show that AlphaFold2 structural models of relatively low quality can be used to obtain highly accurate alignments. These results suggest that, besides structure modeling, AlphaFold2 encodes higher-order dependencies that can be exploited for sequence analysis. AVAILABILITY AND IMPLEMENTATION: All data, analyses and results are available on Zenodo (https://doi.org/10.5281/zenodo.7031286). The code and scripts have been deposited in GitHub (https://github.com/cbcrg/msa-af2-nf) and the various containers in (https://cloud.sylabs.io/library/athbaltzis/af2/alphafold, https://hub.docker.com/r/athbaltzis/pred). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteins , Software , Sequence Alignment , Biological Evolution
3.
Nucleic Acids Res ; 47(W1): W600-W604, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31106365

ABSTRACT

We present a new web application to query and visualize time-series behavioral data: the Pergola web-server. This server provides a user-friendly interface for exploring longitudinal behavioral data, taking advantage of the Pergola Python library. Using the server, users can process the data by applying some basic operations, such as binning or grouping, while formatting the data into existing genomic formats. Thanks to this repurposing of genomics standards, the application automatically renders an interactive data visualization based on sophisticated genome visualization tools. Our tool allows behavioral scientists to share, display and navigate complex behavioral data comprising multiple individuals and multiple data types, in a scalable and flexible manner. A download option allows for further analysis using genomic tools. The server can be a great resource for the field at a time when behavioral science is entering a data-intensive cycle thanks to high-throughput behavioral phenotyping platforms. Pergola is publicly available at http://pergola.crg.eu/.


Subject(s)
Behavior , Software , Computer Graphics , Genomics , Internet
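To picture the "binning while formatting into genomic formats" idea from record 3, the sketch below bins hypothetical timestamped feeding events into fixed-width windows and writes each window as a bedGraph-style line, with the animal ID standing in for a chromosome name and time standing in for genomic coordinates. This does not use the Pergola library; the event layout, the helper names and the 60-second bin width are assumptions.

```python
from collections import defaultdict

def bin_events(events, bin_width):
    """Sum event values per (track, bin start); events are (track, time, value) tuples."""
    totals = defaultdict(float)
    for track, time, value in events:
        bin_start = int(time // bin_width) * bin_width
        totals[(track, bin_start)] += value
    return totals

def to_bedgraph_lines(totals, bin_width):
    """Format binned totals as bedGraph-style lines: track, start, end, value."""
    return [f"{track}\t{start}\t{start + bin_width}\t{totals[(track, start)]:g}"
            for track, start in sorted(totals)]

# Hypothetical feeding events: (animal ID, time in seconds, grams eaten)
events = [("mouse1", 5, 0.2), ("mouse1", 42, 0.1), ("mouse1", 65, 0.3),
          ("mouse2", 10, 0.4)]
for line in to_bedgraph_lines(bin_events(events, bin_width=60), bin_width=60):
    print(line)
```

Half-open bins such as [0, 60) line up with the 0-based, end-exclusive coordinates of bedGraph, which is why the end of each window is simply start plus the bin width.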
4.
Neurobiol Dis ; 127: 210-222, 2019 07.
Article in English | MEDLINE | ID: mdl-30831192

ABSTRACT

Autism spectrum disorders are early-onset neurodevelopmental disorders characterized by deficits in social communication and restricted repetitive behaviors, yet they are quite heterogeneous in terms of their genetic basis and phenotypic manifestations. Recently, de novo pathogenic mutations in DYRK1A, a chromosome 21 gene associated with neuropathological traits of Down syndrome, have been identified in patients presenting a recognizable syndrome included in the autism spectrum. These mutations produce DYRK1A kinases with partial or complete absence of the catalytic domain, or they represent missense mutations located within this domain. Here, we undertook an extensive biochemical characterization of the DYRK1A missense mutations reported to date and show that most of them, but not all, result in enzymatically dead DYRK1A proteins. We also show that haploinsufficient Dyrk1a+/- mutant mice mirror the neurological traits associated with the human pathology, such as defective social interactions, stereotypic behaviors and epileptic activity. These mutant mice present altered proportions of excitatory and inhibitory neocortical neurons and synapses. Moreover, we provide evidence that alterations in the production of cortical excitatory neurons contribute to these defects. Indeed, by the end of the neurogenic period, the expression of developmentally regulated genes involved in neuron differentiation and/or activity is altered. Therefore, our data indicate that altered neocortical neurogenesis could critically affect the formation of cortical circuits, thereby contributing to the neuropathological changes in DYRK1A haploinsufficiency syndrome.


Subject(s)
Autistic Disorder/metabolism , Haploinsufficiency , Neocortex/metabolism , Nerve Net/metabolism , Protein Serine-Threonine Kinases/metabolism , Protein-Tyrosine Kinases/metabolism , Social Behavior , Animals , Autistic Disorder/genetics , Behavior, Animal/physiology , Male , Mice , Mutation, Missense , Protein Serine-Threonine Kinases/genetics , Protein-Tyrosine Kinases/genetics , Dyrk Kinases
5.
Nat Methods ; 18(1): 37-39, 2021 01.
Article in English | MEDLINE | ID: mdl-33398187
6.
Bioinformatics ; 34(16): 2870-2878, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29608657

ABSTRACT

Motivation: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models. Results: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Sequence Analysis , Gene Library , Humans , Models, Statistical , Sequence Analysis/statistics & numerical data
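The claim in record 6 that counts are only interpretable relative to other components can be made concrete with the centered log-ratio (CLR), a standard CoDA transform: each part is divided by the geometric mean of all parts before taking the log. The minimal sketch below applies it to two hypothetical count vectors with the same composition but different library sizes. It assumes strictly positive counts; how to handle zeros is a separate choice that the review discusses and this sketch does not address.

```python
import math

def clr(counts):
    """Centered log-ratio: log of each part over the geometric mean of all parts."""
    if any(c <= 0 for c in counts):
        raise ValueError("CLR needs strictly positive values; replace zeros first")
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]

# Same relative composition, sequenced at two different depths (hypothetical counts)
shallow = [10, 20, 70]
deep = [100, 200, 700]

print([round(v, 4) for v in clr(shallow)])
print([round(v, 4) for v in clr(deep)])  # same values: library size cancels out
```

Because the total count multiplies every part equally, it cancels in each log-ratio, which is exactly the property that raw counts lack.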
7.
Brief Bioinform ; 17(6): 1009-1023, 2016 11.
Article in English | MEDLINE | ID: mdl-26615024

ABSTRACT

This review provides an overview of the development of multiple sequence alignment (MSA) methods and their main applications. It focuses on progress made over the past decade. The first three sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method development. The last part of the review gives an overview of available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.


Subject(s)
Sequence Alignment , Algorithms , DNA , Genomics , Proteins , Reproducibility of Results
8.
Addict Biol ; 23(2): 544-555, 2018 03.
Article in English | MEDLINE | ID: mdl-29282813

ABSTRACT

A major problem in treating obesity is the high rate of relapse to abnormal food-taking habits after maintaining an energy-balanced diet. Alterations of eating behavior such as compulsive-like behavior and lack of self-control over food intake play a critical role in relapse. In this study, we used an operant paradigm of food-seeking behavior on two different diet-induced obesity models, a free-choice chocolate-mixture diet and a high-fat diet, with face validity for a rapid development of obesity or for unhealthy food regularly consumed in our societies. Reduced operant performance and motivation for the hedonic value of palatable chocolate pellets were revealed in both obesity mouse models. However, only mice exposed to the high-fat diet showed increased compulsive-like behavior in the absence of the reinforcer, further characterized by impaired operant learning, enhanced impulsivity and intensified inflexibility. We used principal component analysis to globally identify the specific behaviors responsible for the differences among diet groups. Learning impairment and inflexible behaviors contributed to the first principal component, which explained the largest proportion of the variance in the phenotype of high-fat diet mice. Reinforcement, impulsion and compulsion were the main contributors to the second principal component, which explained the differences in the behavioral phenotype of chocolate-mixture mice. These behaviors were not exclusive to the chocolate group, because some high-fat individuals showed similar values on this component. These data indicate that extended access to hypercaloric diets differentially modifies operant behavior learning, behavioral flexibility, impulsive-like and compulsive-like behavior, and that these effects depend on the exposure to each specific diet.


Subject(s)
Conditioning, Operant , Feeding Behavior , Food , Obesity , Animals , Behavior, Animal , Chocolate , Compulsive Behavior , Diet, High-Fat , Eating , Extinction, Psychological , Impulsive Behavior , Learning , Male , Mice , Principal Component Analysis , Reinforcement, Psychology , Self-Control
9.
Addict Biol ; 23(2): 531-543, 2018 03.
Article in English | MEDLINE | ID: mdl-29318700

ABSTRACT

Obesity represents an important risk factor contributing to the global burden of disease. The current obesogenic environment with easy access to calorie-dense foods is fueling this obesity epidemic. However, how these foods contribute to the progression of feeding behavior changes that lead to overeating is not well understood and needs systematic assessment. Using novel automated methods for the high-throughput screening of behavior, we here examine the meal patterns of mice upon long-term exposure to a free-choice chocolate-mixture diet and a high-fat diet, with face validity for a rapid development of obesity induced by unhealthy food regularly consumed in our societies. We identified rapid diet-specific behavioral changes after exposure to those high-caloric diets. Mice fed with high-fat chow showed long-lasting meal pattern disturbances, which began with a stable loss of circadian feeding rhythmicity. Mice receiving a chocolate-mixture showed qualitatively similar, though less marked, changes, consisting of a transient disruption of the feeding behavior and the circadian feeding rhythmicity. Strikingly, compulsive-like eating behavior is triggered immediately after exposure to both high-fat food and chocolate-mixture diet, well before any changes in body weight could be observed. We propose these changes as behavioral biomarkers of prodromal states of obesity that could allow early intervention.


Subject(s)
Chocolate , Diet, High-Fat , Energy Intake , Feeding Behavior , Obesity , Animals , Circadian Rhythm , Compulsive Behavior , Food , Hyperphagia , Male , Mice
10.
Genome Res ; 24(12): 2077-89, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25273068

ABSTRACT

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: two were simulated and based on primate and mammalian phylogenies, and one consisted of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.


Subject(s)
Genome , Genomics/methods , Sequence Alignment/methods , Software , Animals , Computational Biology/methods , Computer Simulation , Datasets as Topic , Genome-Wide Association Study , Humans , Mammals/genetics , Phylogeny , Reproducibility of Results
11.
Nucleic Acids Res ; 40(7): e52, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22230796

ABSTRACT

We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy for predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. Not only does this have interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments.


Subject(s)
Chromatin Immunoprecipitation , Promoter Regions, Genetic , Sequence Alignment/methods , Sequence Analysis, DNA , Animals , Binding Sites , Cattle , Dogs , Evolution, Molecular , Humans , Mice , Software , Transcription Factors/metabolism
12.
Bioinformatics ; 28(4): 487-94, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22334039

ABSTRACT

MOTIVATION: Probabilistic approaches for inferring transcription factor binding sites (TFBSs) and regulatory motifs from DNA sequences have been developed for over two decades. Previous work has shown that prediction accuracy can be significantly improved by incorporating features such as the competition of multiple transcription factors (TFs) for binding to nearby sites, the tendency of TFBSs for co-regulated TFs to cluster and form cis-regulatory modules, and explicit evolutionary modeling of conservation of TFBSs across orthologous sequences. However, currently available tools only incorporate some of these features, and significant methodological hurdles have hampered their synthesis into a single consistent probabilistic framework. RESULTS: We present MotEvo, an integrated suite of Bayesian probabilistic methods for the prediction of TFBSs and inference of regulatory motifs from multiple alignments of phylogenetically related DNA sequences, which incorporates all the features just mentioned. In addition, MotEvo incorporates a novel model for detecting unknown functional elements that are under evolutionary constraint, and a new robust model for treating gain and loss of TFBSs along a phylogeny. Rigorous benchmarking tests on ChIP-seq datasets show that MotEvo's novel features significantly improve the accuracy of TFBS prediction, motif inference and enhancer prediction. AVAILABILITY: Source code, a user manual and files with several example applications are available at www.swissregulon.unibas.ch.


Subject(s)
Bayes Theorem , Sequence Alignment/methods , Transcription Factors/metabolism , Animals , Base Sequence , Binding Sites , Enhancer Elements, Genetic , Humans , Phylogeny , Protein Binding
13.
Nucleic Acids Res ; 39(16): 6886-95, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21624887

ABSTRACT

We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of dinucleotides using BlosumR, a new log-odds substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single-nucleotide log-odds substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is open-source freeware available from www.tcoffee.org/blastr.html.


Subject(s)
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Sequence Analysis, RNA , Algorithms , Sequence Alignment , Software
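The recoding step in record 13 can be pictured as follows: since there are 16 possible RNA dinucleotides, each one can be mapped to a letter of a protein-like alphabet, so that a protein search tool can score the recoded sequences with a dinucleotide matrix such as BlosumR. The minimal sketch below assumes overlapping dinucleotides (one letter per position except the last). The letter assigned to each dinucleotide is a hypothetical choice for illustration and is not BlastR's actual encoding.

```python
from itertools import product

# Hypothetical mapping of the 16 RNA dinucleotides to 16 amino-acid letters.
# BlastR defines its own encoding; this one is only illustrative.
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]
LETTERS = "ACDEFGHIKLMNPQRS"
ENCODING = dict(zip(DINUCLEOTIDES, LETTERS))

def recode_rna(seq):
    """Replace each overlapping dinucleotide with a single protein-like letter."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return "".join(ENCODING[seq[i:i + 2]] for i in range(len(seq) - 1))

print(recode_rna("ACGUA"))
```

A recoded sequence is one letter shorter than the input, and any symbol outside ACGU/T raises a `KeyError`, which a real pipeline would need to handle (for example for ambiguous bases).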
14.
Methods Mol Biol ; 2231: 89-97, 2021.
Article in English | MEDLINE | ID: mdl-33289888

ABSTRACT

Many fields of biology rely on the inference of accurate multiple sequence alignments (MSA) of biological sequences. Unfortunately, the problem of assembling an MSA is NP-complete, which limits computation to approximate solutions obtained with heuristics. The progressive algorithm is one of the most popular frameworks for the computation of MSAs. It involves pre-clustering the sequences and aligning them starting with the most similar ones. The scalability of this framework is limited, especially with respect to accuracy. We present here an alternative approach named the regressive algorithm. In this framework, sequences are first clustered and then aligned starting with the most distantly related ones. This approach has been shown to greatly improve accuracy during scale-up, especially on datasets featuring 10,000 sequences or more. Another benefit is the possibility to integrate third-party clustering methods and third-party MSA aligners. The regressive algorithm has been tested on up to 1.5 million sequences; its implementation is available in the T-Coffee package.


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , Software , Algorithms , Cluster Analysis , Computational Biology/instrumentation , Sequence Alignment/instrumentation
16.
NAR Genom Bioinform ; 2(4): lqaa076, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33575624

ABSTRACT

Many next-generation sequencing datasets contain only relative information because of biological and technical factors that limit the total number of transcripts observed for a given sample. It is not possible to interpret any one component in isolation. The field of compositional data analysis has emerged with alternative methods for relative data based on log-ratio transforms. However, these data often contain many more features than samples, and thus require creative new ways to reduce the dimensionality of the data. The summation of parts, called amalgamation, is a practical way of reducing dimensionality, but can introduce a non-linear distortion to the data. We exploit this non-linearity to propose a powerful yet interpretable dimensionality reduction method called data-driven amalgamation. Our new method, implemented in the user-friendly R package amalgam, can reduce the dimensionality of compositional data by finding amalgamations that optimally (i) preserve the distance between samples, or (ii) classify samples as diseased or not. Our benchmark on 13 real datasets confirms that these amalgamations compete with state-of-the-art methods in terms of performance, but result in new features that are easily understood: they are groups of parts added together.
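Amalgamation, as described in record 16, is simply the summation of parts. The minimal sketch below sums two hand-picked groups of hypothetical features and compares the log-ratio of the amalgams with the log contrast of the same groups' geometric means. Here the amalgams come out equal while the geometric means do not, because a sum is dominated by its largest member; this is one way to see that summation behaves non-linearly in log-ratio space. The grouping is fixed by hand and the helper names are hypothetical; the data-driven search for optimal groups in the amalgam R package is not reproduced.

```python
import math

def amalgamate(parts, groups):
    """Sum the parts within each group; groups is a list of index lists."""
    return [sum(parts[i] for i in group) for group in groups]

def closure(parts):
    """Rescale parts so they sum to 1, turning them into a composition."""
    total = sum(parts)
    return [p / total for p in parts]

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

sample = [5.0, 45.0, 30.0, 20.0]   # hypothetical relative abundances of four features
groups = [[0, 1], [2, 3]]          # hand-picked groups, not a data-driven search

amalgams = closure(amalgamate(sample, groups))
lr_amalgam = math.log(amalgams[0] / amalgams[1])

# The same two groups summarized by geometric means instead of sums
lr_geomean = math.log(geometric_mean([sample[i] for i in groups[0]]) /
                      geometric_mean([sample[i] for i in groups[1]]))

print(amalgams, round(lr_amalgam, 3), round(lr_geomean, 3))
```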

17.
mSystems ; 5(2)2020 Apr 07.
Article in English | MEDLINE | ID: mdl-32265314

ABSTRACT

Since the turn of the century, technological advances have made it possible to obtain the molecular profile of any tissue in a cost-effective manner. Among these advances are sophisticated high-throughput assays that measure the relative abundances of microorganisms, RNA molecules, and metabolites. While these data are most often collected to gain new insights into biological systems, they can also be used as biomarkers to create clinically useful diagnostic classifiers. How best to classify high-dimensional -omics data remains an area of active research. However, few methods explicitly model the relative nature of these data; most instead rely on cumbersome normalizations. This report (i) emphasizes the relative nature of health biomarkers, (ii) discusses the literature surrounding the classification of relative data, and (iii) benchmarks how different transformations perform for regularized logistic regression across multiple biomarker types. We show how an interpretable set of log contrasts, called balances, can prepare data for classification. We propose a simple procedure, called discriminative balance analysis, to select groups of 2 and 3 bacteria that can together discriminate between experimental conditions. Discriminative balance analysis is a fast, accurate, and interpretable alternative to data normalization. IMPORTANCE: High-throughput sequencing provides an easy and cost-effective way to measure the relative abundance of bacteria in any environmental or biological sample. When these samples come from humans, the microbiome signatures can act as biomarkers for disease prediction. However, because bacterial abundance is measured as a composition, the data have unique properties that make conventional analyses inappropriate. To overcome this, analysts often use cumbersome normalizations. This article proposes an alternative method that identifies pairs and trios of bacteria whose stoichiometric presence can differentiate between diseased and nondiseased samples. By using interpretable log contrasts called balances, we developed an entirely normalization-free classification procedure that reduces the feature space and improves the interpretability, without sacrificing classifier performance.
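A balance, as used in record 17, is a log contrast between two groups of parts. One common definition from the CoDA literature, assumed here rather than taken from the article, scales the log-ratio of the groups' geometric means by sqrt(r*s/(r+s)), where r and s are the group sizes. The minimal sketch below computes a pair balance and a trio balance on hypothetical abundances of four bacteria, and checks that rescaling the whole sample (for example, deeper sequencing) leaves the balance unchanged, which is why no normalization step is required. Selecting which pairs or trios discriminate between conditions, the core of discriminative balance analysis, is not reproduced.

```python
import math

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(sample, numerator, denominator):
    """Isometric log-ratio balance between two disjoint groups of part indices."""
    r, s = len(numerator), len(denominator)
    num = geometric_mean([sample[i] for i in numerator])
    den = geometric_mean([sample[i] for i in denominator])
    return math.sqrt(r * s / (r + s)) * math.log(num / den)

# Hypothetical abundances of four bacteria in one sample
sample = [12.0, 3.0, 48.0, 6.0]
pair = balance(sample, [0], [1])       # 2 bacteria: sqrt(1/2) * log(12 / 3)
trio = balance(sample, [0, 2], [1])    # 3 bacteria: two in the numerator, one below
scaled = [10 * x for x in sample]      # same composition, larger total

print(round(pair, 4), round(trio, 4))
print(math.isclose(trio, balance(scaled, [0, 2], [1])))  # scale-invariant
```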

18.
Nucleic Acids Res ; 35(Database issue): D127-31, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17130146

ABSTRACT

SwissRegulon (http://www.swissregulon.unibas.ch) is a database containing genome-wide annotations of regulatory sites in the intergenic regions of genomes. The regulatory site annotations are produced using a number of recently developed algorithms that operate on multiple alignments of orthologous intergenic regions from related genomes in combination with, whenever available, known sites from the literature, and ChIP-on-chip binding data. Currently SwissRegulon contains annotations for yeast and 17 prokaryotic genomes. The database provides information about the sequence, location, orientation, posterior probability and, whenever available, binding factor of each annotated site. To enable easy viewing of the regulatory site annotations in the context of other features annotated on the genomes, the sites are displayed using the GBrowse genome browser interface and can be queried based on any annotated genomic feature. The database can also be queried for regulons, i.e. sites bound by a common factor.


Subject(s)
Databases, Nucleic Acid , Regulatory Elements, Transcriptional , Regulon , Transcription Factors/metabolism , Algorithms , Bacteria/genetics , Binding Sites , DNA, Intergenic/chemistry , Genomics , Internet , User-Computer Interface , Yeasts/genetics
19.
Bio Protoc ; 9(14): e3308, 2019 Jul 20.
Article in English | MEDLINE | ID: mdl-33654818

ABSTRACT

Obesity is an important health problem with a strong environmental component that is acquiring pandemic proportions. The high availability of calorie-dense foods promotes overeating, potentially causing obesity. Animal models are key to validating novel therapeutic strategies, but researchers must carefully select the appropriate model to draw the right conclusions. Obesity is defined by a body mass index greater than 30 and characterized by an excess of adipose tissue. However, the regulation of food intake involves a close interrelationship between homeostatic and non-homeostatic factors. Studies in animal models have shown that intermittent access to sweetened or calorie-dense foods induces changes in feeding behavior. However, these studies have focused mainly on the final outcome (obesity) rather than on the primary dysfunction underlying the overeating of palatable foods. We describe a protocol to study overeating in mice using diet-induced obesity (DIO). This method can be applied to free choice between palatable food and a standard rodent chow or to forced intake of calorie-dense and/or palatable diets. Exposure to such diets is sufficient to promote changes in meal pattern that we register and analyze during the period of weight gain, allowing the longitudinal characterization of feeding behavior in mice. Abnormal eating behaviors such as binge eating or snacking, behavioral alterations commonly observed in obese humans, can be detected using our protocol. In the free-choice procedure, mice develop a preference for the rewarding palatable food, showing the reinforcing effect of this diet. Compulsive components of feeding are reflected by maintenance of feeding despite an adverse bitter taste caused by adulteration with quinine, and by neglect of standard chow when access to palatable food is ceased or temporarily limited. Our strategy also makes it possible to identify compulsive overeating in mice under a high-caloric regime by using limited food access, and we propose complementary behavioral tests to confirm the non-homeostatic food-taking triggered by these foods. Finally, we describe how to computationally explore large longitudinal behavioral datasets.
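A common way to turn a longitudinal record of feeding events, as in record 19, into a meal pattern is to merge events separated by less than a fixed inter-meal interval into a single meal. The minimal sketch below does this for one mouse. The (start, end, grams) event format, the 300-second threshold and the helper names are assumptions for illustration, not the parameters of the published protocol.

```python
from dataclasses import dataclass

@dataclass
class Meal:
    start: float   # seconds
    end: float     # seconds
    intake: float  # grams

def group_into_meals(events, max_gap):
    """Merge (start, end, grams) feeding events into meals.

    An event that starts less than max_gap seconds after the current meal ends
    is added to that meal; otherwise it opens a new meal.
    """
    meals = []
    for start, end, grams in sorted(events):
        if meals and start - meals[-1].end < max_gap:
            meals[-1].end = max(meals[-1].end, end)
            meals[-1].intake += grams
        else:
            meals.append(Meal(start, end, grams))
    return meals

# Hypothetical feeding events for one mouse
events = [(0, 30, 0.05), (100, 160, 0.08), (2000, 2050, 0.04)]
for meal in group_into_meals(events, max_gap=300):
    print(meal)
```

Meal size, duration and inter-meal interval computed from these objects are the kind of per-meal features on which behaviors such as snacking or binge-like eating could then be characterized.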

20.
Gigascience ; 8(9)2019 09 01.
Article in English | MEDLINE | ID: mdl-31544212

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) has made it possible to determine the sequence and relative abundance of all nucleotides in a biological or environmental sample. A cornerstone of NGS is the quantification of RNA or DNA presence as counts. However, these counts are not counts per se: their magnitude is determined arbitrarily by the sequencing depth, not by the input material. Consequently, counts must undergo normalization prior to use. Conventional normalization methods require a set of assumptions: they assume that the majority of features are unchanged and that all environments under study have the same carrying capacity for nucleotide synthesis. These assumptions are often untestable and may not hold when heterogeneous samples are compared. RESULTS: Methods developed within the field of compositional data analysis offer a general solution that is assumption-free and valid for all data. Herein, we synthesize the extant literature to provide a concise guide on how to apply compositional data analysis to NGS count data. CONCLUSIONS: In highlighting the limitations of total library size, effective library size, and spike-in normalizations, we propose the log-ratio transformation as a general solution to answer the question, "Relative to some important activity of the cell, what is changing?"


Subject(s)
High-Throughput Nucleotide Sequencing , Animals , Base Sequence , Dendritic Cells/drug effects , Dendritic Cells/metabolism , Gene Library , Lipopolysaccharides/pharmacology , Mass Spectrometry , Mice , RNA, Messenger/metabolism , Single-Cell Analysis , Software
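The closing question of record 20, "Relative to some important activity of the cell, what is changing?", can be read as choosing a reference for a log-ratio. As a minimal sketch, assuming a hypothetical reference gene that stands in for that "important activity", the additive log-ratio (ALR) below expresses every other feature relative to it and then compares two hypothetical conditions. The gene names, counts and the choice of reference are assumptions; the data themselves do not determine the reference.

```python
import math

def alr(counts, ref_index):
    """Additive log-ratio: log of each feature over a chosen reference feature."""
    ref = counts[ref_index]
    return [math.log(c / ref) for i, c in enumerate(counts) if i != ref_index]

# Hypothetical counts for [refGene, geneA, geneB] in two conditions
control = [200, 50, 400]
treated = [400, 100, 1600]   # deeper library, and geneB truly shifts

lfc = [t - c for t, c in zip(alr(treated, 0), alr(control, 0))]
# geneA: log(100/400) - log(50/200) = 0, no change relative to refGene
# geneB: log(1600/400) - log(400/200) = log(2), doubled relative to refGene
print([round(v, 4) for v in lfc])
```

The doubling of the library size cancels in every ratio, so only geneB's change relative to the reference remains.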