Results 1 - 20 of 34
1.
PLoS Pathog ; 18(9): e1010848, 2022 09.
Article in English | MEDLINE | ID: mdl-36149920

ABSTRACT

Aneuploidy causes system-wide disruptions in the stoichiometric balances of transcripts, proteins, and metabolites, often resulting in detrimental effects for the organism. The protozoan parasite Leishmania has an unusually high tolerance for aneuploidy, but the molecular and functional consequences for the pathogen remain poorly understood. Here, we addressed this question in vitro and present the first integrated analysis of the genome, transcriptome, proteome, and metabolome of highly aneuploid Leishmania donovani strains. Our analyses unambiguously establish that aneuploidy in Leishmania proportionally impacts the average transcript and protein abundance levels of affected chromosomes, ultimately correlating with the degree of metabolic differences between closely related aneuploid strains. This proportionality was present in both proliferative and non-proliferative in vitro promastigotes. However, as in other eukaryotes, we observed attenuation of dosage effects for protein complex subunits and, in addition, non-cytoplasmic proteins. Differentially expressed transcripts and proteins between aneuploid Leishmania strains also originated from non-aneuploid chromosomes. At the protein level, these were enriched for proteins involved in protein metabolism, such as chaperones and chaperonins, peptidases, and heat-shock proteins. In conclusion, our results further support the view that aneuploidy in Leishmania can be adaptive. Additionally, we believe that the high karyotype diversity in vitro and absence of classical transcriptional regulation make Leishmania an attractive model to study processes of protein homeostasis in the context of aneuploidy and beyond.


Subjects
Leishmania donovani , Proteome , Aneuploidy , Heat-Shock Proteins/genetics , Humans , Karyotype , Leishmania donovani/genetics , Peptide Hydrolases/genetics , Proteome/genetics
2.
Bioinformatics ; 38(22): 5007-5011, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36130276

ABSTRACT

MOTIVATION: Protein sequence alignments are essential to structural, evolutionary and functional analysis, but their accuracy is often limited by sequence similarity unless molecular structures are available. Protein structures predicted at experimental grade accuracy, as achieved by AlphaFold2, could therefore have a major impact on sequence analysis. RESULTS: Here, we find that multiple sequence alignments estimated on AlphaFold2 predictions are almost as accurate as alignments estimated on experimental structures and significantly closer to the structural reference than sequence-based alignments. We also show that AlphaFold2 structural models of relatively low quality can be used to obtain highly accurate alignments. These results suggest that, besides structure modeling, AlphaFold2 encodes higher-order dependencies that can be exploited for sequence analysis. AVAILABILITY AND IMPLEMENTATION: All data, analyses and results are available on Zenodo (https://doi.org/10.5281/zenodo.7031286). The code and scripts have been deposited in GitHub (https://github.com/cbcrg/msa-af2-nf) and the various containers in (https://cloud.sylabs.io/library/athbaltzis/af2/alphafold, https://hub.docker.com/r/athbaltzis/pred). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins , Software , Sequence Alignment , Biological Evolution
3.
Nucleic Acids Res ; 47(W1): W600-W604, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31106365

ABSTRACT

We present a new web application to query and visualize time-series behavioral data: the Pergola web-server. This server provides a user-friendly interface for exploring longitudinal behavioral data taking advantage of the Pergola Python library. Using the server, users can process the data applying some basic operations, such as binning or grouping, while formatting the data into existing genomic formats. Thanks to this repurposing of genomics standards, the application automatically renders an interactive data visualization based on sophisticated genome visualization tools. Our tool allows behavioral scientists to share, display and navigate complex behavioral data comprising multiple individuals and multiple data types, in a scalable and flexible manner. A download option allows for further analysis using genomic tools. The server can be a great resource for the field in a time where behavioral science is entering a data-intensive cycle thanks to high-throughput behavioral phenotyping platforms. Pergola is publicly available at http://pergola.crg.eu/.


Subjects
Behavior , Software , Computer Graphics , Genomics , Internet
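
The binning and reformatting step described in the abstract above can be illustrated with a minimal sketch. This is not the Pergola library's API. The function names, the choice of a BED-style layout (chrom, chromStart, chromEnd, name, score) as the target genomic format, and the example feeding events are all assumptions made for illustration.

```python
def bin_events(events, bin_size):
    """Sum event values inside fixed-width time bins.

    events: iterable of (start_second, value) pairs.
    Returns sorted (bin_start, bin_end, total) tuples.
    """
    totals = {}
    for start, value in events:
        bin_index = start // bin_size
        totals[bin_index] = totals.get(bin_index, 0) + value
    return [(i * bin_size, (i + 1) * bin_size, total)
            for i, total in sorted(totals.items())]


def to_bed_lines(binned, track):
    """Format binned intervals as tab-separated BED-style lines.

    The animal identifier plays the role of the chromosome name and
    time in seconds plays the role of the genomic coordinate.
    """
    return [f"{track}\t{start}\t{end}\tfeeding\t{total}"
            for start, end, total in binned]


# Hypothetical feeding events for one animal: (time in seconds, amount eaten).
events = [(5, 1), (40, 2), (65, 1), (130, 3)]
bed = to_bed_lines(bin_events(events, bin_size=60), track="mouse1")
print("\n".join(bed))
```

Once time is mapped to a coordinate axis in this way, interval-based genomic tools and browsers can in principle display the behavioral track without modification, which is the repurposing idea the abstract describes.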
4.
Neurobiol Dis ; 127: 210-222, 2019 07.
Article in English | MEDLINE | ID: mdl-30831192

ABSTRACT

Autism spectrum disorders are early onset neurodevelopmental disorders characterized by deficits in social communication and restricted repetitive behaviors, yet they are quite heterogeneous in terms of their genetic basis and phenotypic manifestations. Recently, de novo pathogenic mutations in DYRK1A, a chromosome 21 gene associated with neuropathological traits of Down syndrome, have been identified in patients presenting a recognizable syndrome included in the autism spectrum. These mutations produce DYRK1A kinases with partial or complete absence of the catalytic domain, or they represent missense mutations located within this domain. Here, we undertook an extensive biochemical characterization of the DYRK1A missense mutations reported to date and show that most of them, but not all, result in enzymatically dead DYRK1A proteins. We also show that haploinsufficient Dyrk1a+/- mutant mice mirror the neurological traits associated with the human pathology, such as defective social interactions, stereotypic behaviors and epileptic activity. These mutant mice present altered proportions of excitatory and inhibitory neocortical neurons and synapses. Moreover, we provide evidence that alterations in the production of cortical excitatory neurons are contributing to these defects. Indeed, by the end of the neurogenic period, the expression of developmentally regulated genes involved in neuron differentiation and/or activity is altered. Therefore, our data indicate that altered neocortical neurogenesis could critically affect the formation of cortical circuits, thereby contributing to the neuropathological changes in DYRK1A haploinsufficiency syndrome.


Subjects
Autistic Disorder/metabolism , Haploinsufficiency , Neocortex/metabolism , Nerve Net/metabolism , Protein Serine-Threonine Kinases/metabolism , Protein-Tyrosine Kinases/metabolism , Social Behavior , Animals , Autistic Disorder/genetics , Animal Behavior/physiology , Male , Mice , Missense Mutation , Protein Serine-Threonine Kinases/genetics , Protein-Tyrosine Kinases/genetics , Dyrk Kinases
5.
Nat Methods ; 18(1): 37-39, 2021 01.
Article in English | MEDLINE | ID: mdl-33398187
6.
Bioinformatics ; 34(16): 2870-2878, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29608657

ABSTRACT

Motivation: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models. Results: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Sequence Analysis , Gene Library , Humans , Statistical Models , Sequence Analysis/statistics & numerical data
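
The claim that counts are only interpretable relative to other components can be illustrated with the centered log-ratio (CLR) transform, one of the standard log-ratio transforms in compositional data analysis. The sketch below is not taken from the review. The function name and the example counts are assumptions, and zero counts are rejected rather than handled because zero replacement is a separate design decision.

```python
import math


def clr(counts):
    """Centered log-ratio transform of one sample.

    Each value becomes log(count) minus the mean of the logs, which is
    the log of the count divided by the sample's geometric mean.
    """
    if any(c <= 0 for c in counts):
        raise ValueError("CLR needs strictly positive values; handle zeros first")
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical samples: same composition, ten-fold different library size.
shallow = [10, 20, 70]
deep = [100, 200, 700]

print(clr(shallow))
print(clr(deep))
```

Because the library size cancels inside the ratio, the two samples give the same transformed values up to floating-point error, even though their raw counts differ by a factor of ten.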
7.
Brief Bioinform ; 17(6): 1009-1023, 2016 11.
Article in English | MEDLINE | ID: mdl-26615024

ABSTRACT

This review provides an overview of the development of multiple sequence alignment (MSA) methods and their main applications. It focuses on progress made over the past decade. The first three sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method development. The last part of the review gives an overview of available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.


Subjects
Sequence Alignment , Algorithms , DNA , Genomics , Proteins , Reproducibility of Results
8.
Addict Biol ; 23(2): 544-555, 2018 03.
Article in English | MEDLINE | ID: mdl-29282813

ABSTRACT

A major problem in treating obesity is the high rate of relapse to abnormal food-taking habits after maintaining an energy balanced diet. Alterations of eating behavior such as compulsive-like behavior and lack of self-control over food intake play a critical role in relapse. In this study, we used an operant paradigm of food-seeking behavior on two different diet-induced obesity models, a free-choice chocolate-mixture diet and a high-fat diet with face validity for a rapid development of obesity or for unhealthy food regularly consumed in our societies. A reduced operant performance and motivation for the hedonic value of palatable chocolate pellets was revealed in both obesity mouse models. However, only mice exposed to a high-fat diet showed increased compulsive-like behavior in the absence of the reinforcer, further characterized by impaired operant learning, enhanced impulsivity and intensified inflexibility. We used principal component analysis to globally identify the specific behaviors responsible for the differences among diet groups. Learning impairment and inflexible behaviors contributed to a first principal component, explaining the largest proportion of the variance in the high-fat diet mice phenotype. Reinforcement, impulsion and compulsion were the main contributors to the second principal component, explaining the differences in the chocolate-mixture mice behavioral phenotype. These behaviors were not exclusive to the chocolate group because some high-fat individuals showed similar values on this component. These data indicate that extended access to hypercaloric diets differentially modifies operant behavior learning, behavioral flexibility, impulsive-like and compulsive-like behavior, and these effects were dependent on the exposure to each specific diet.


Subjects
Operant Conditioning , Feeding Behavior , Food , Obesity , Animals , Animal Behavior , Chocolate , Compulsive Behavior , High-Fat Diet , Eating , Psychological Extinction , Impulsive Behavior , Learning , Male , Mice , Principal Component Analysis , Psychological Reinforcement , Self-Control
9.
Addict Biol ; 23(2): 531-543, 2018 03.
Article in English | MEDLINE | ID: mdl-29318700

ABSTRACT

Obesity represents an important risk factor contributing to the global burden of disease. The current obesogenic environment with easy access to calorie-dense foods is fueling this obesity epidemic. However, how these foods contribute to the progression of feeding behavior changes that lead to overeating is not well understood and needs systematic assessment. Using novel automated methods for the high-throughput screening of behavior, we here examine mouse meal patterns upon long-term exposure to a free-choice chocolate-mixture diet and a high-fat diet with face validity for a rapid development of obesity induced by unhealthy food regularly consumed in our societies. We identified rapid diet-specific behavioral changes after exposure to those high-caloric diets. Mice fed with high-fat chow showed long-lasting meal pattern disturbances, which begin with a stable loss of circadian feeding rhythmicity. Mice receiving a chocolate-mixture showed qualitatively similar changes, though less marked, consisting of a transient disruption of feeding behavior and circadian feeding rhythmicity. Strikingly, compulsive-like eating behavior is triggered immediately after exposure to both high-fat food and chocolate-mixture diet, well before any changes in body weight could be observed. We propose these changes as behavioral biomarkers of prodromal states of obesity that could allow early intervention.


Subjects
Chocolate , High-Fat Diet , Energy Intake , Feeding Behavior , Obesity , Animals , Circadian Rhythm , Compulsive Behavior , Food , Hyperphagia , Male , Mice
10.
Genome Res ; 24(12): 2077-89, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25273068

ABSTRACT

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.


Subjects
Genome , Genomics/methods , Sequence Alignment/methods , Software , Animals , Computational Biology/methods , Computer Simulation , Datasets as Topic , Genome-Wide Association Study , Humans , Mammals/genetics , Phylogeny , Reproducibility of Results
11.
Nucleic Acids Res ; 40(7): e52, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22230796

ABSTRACT

We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy for predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. Not only has this interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments.


Subjects
Chromatin Immunoprecipitation , Genetic Promoter Regions , Sequence Alignment/methods , DNA Sequence Analysis , Animals , Binding Sites , Cattle , Dogs , Molecular Evolution , Humans , Mice , Software , Transcription Factors/metabolism
12.
Bioinformatics ; 28(4): 487-94, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22334039

ABSTRACT

MOTIVATION: Probabilistic approaches for inferring transcription factor binding sites (TFBSs) and regulatory motifs from DNA sequences have been developed for over two decades. Previous work has shown that prediction accuracy can be significantly improved by incorporating features such as the competition of multiple transcription factors (TFs) for binding to nearby sites, the tendency of TFBSs for co-regulated TFs to cluster and form cis-regulatory modules and explicit evolutionary modeling of conservation of TFBSs across orthologous sequences. However, currently available tools only incorporate some of these features, and significant methodological hurdles hampered their synthesis into a single consistent probabilistic framework. RESULTS: We present MotEvo, an integrated suite of Bayesian probabilistic methods for the prediction of TFBSs and inference of regulatory motifs from multiple alignments of phylogenetically related DNA sequences, which incorporates all features just mentioned. In addition, MotEvo incorporates a novel model for detecting unknown functional elements that are under evolutionary constraint, and a new robust model for treating gain and loss of TFBSs along a phylogeny. Rigorous benchmarking tests on ChIP-seq datasets show that MotEvo's novel features significantly improve the accuracy of TFBS prediction, motif inference and enhancer prediction. AVAILABILITY: Source code, a user manual and files with several example applications are available at www.swissregulon.unibas.ch.


Subjects
Bayes Theorem , Sequence Alignment/methods , Transcription Factors/metabolism , Animals , Base Sequence , Binding Sites , Genetic Enhancer Elements , Humans , Phylogeny , Protein Binding
13.
Nucleic Acids Res ; 39(16): 6886-95, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21624887

ABSTRACT

We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odd substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single nucleotide log-odd substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is an open source freeware available from www.tcoffee.org/blastr.html.


Subjects
Nucleic Acid Databases , Untranslated RNA/chemistry , RNA Sequence Analysis , Algorithms , Sequence Alignment , Software
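
The recoding step mentioned in the abstract above, turning an RNA sequence into a protein-like sequence so that a protein search tool can compare di-nucleotides, can be sketched as follows. The abstract does not state which letter represents which di-nucleotide or whether the di-nucleotides overlap, so the letter table and the overlapping window used here are purely hypothetical and do not reproduce BlastR's actual encoding.

```python
from itertools import product

BASES = "ACGU"
# Hypothetical assignment of 16 amino-acid letters to the 16 di-nucleotides.
LETTERS = "ACDEFGHIKLMNPQRS"
DINUC_TO_LETTER = {
    first + second: letter
    for (first, second), letter in zip(product(BASES, repeat=2), LETTERS)
}


def recode(rna):
    """Recode an RNA string as one letter per overlapping di-nucleotide.

    DNA-style 'T' is read as 'U'. Any other symbol outside ACGU raises
    KeyError, since ambiguity codes are not covered by this sketch.
    """
    rna = rna.upper().replace("T", "U")
    return "".join(DINUC_TO_LETTER[rna[i:i + 2]] for i in range(len(rna) - 1))


print(recode("GGAU"))
```

A sequence of length n yields n - 1 letters under this overlapping scheme. The recoded strings could then be scored with a 16-letter substitution matrix in the role that BlosumR plays in the abstract.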
14.
Methods Mol Biol ; 2231: 89-97, 2021.
Article in English | MEDLINE | ID: mdl-33289888

ABSTRACT

Many fields of biology rely on the inference of accurate multiple sequence alignments (MSA) of biological sequences. Unfortunately, the problem of assembling an MSA is NP-complete, thus limiting computation to approximate solutions using heuristics. The progressive algorithm is one of the most popular frameworks for the computation of MSAs. It involves pre-clustering the sequences and aligning them starting with the most similar ones. The scalability of this framework is limited, especially with respect to accuracy. We present here an alternative approach named the regressive algorithm. In this framework, sequences are first clustered and then aligned starting with the most distantly related ones. This approach has been shown to greatly improve accuracy during scale-up, especially on datasets featuring 10,000 sequences or more. Another benefit is the possibility to integrate third-party clustering methods and third-party MSA aligners. The regressive algorithm has been tested on up to 1.5 million sequences; its implementation is available in the T-Coffee package.


Subjects
Computational Biology/methods , Sequence Alignment/methods , Software , Algorithms , Cluster Analysis , Computational Biology/instrumentation , Sequence Alignment/instrumentation
16.
NAR Genom Bioinform ; 2(4): lqaa076, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33575624

ABSTRACT

Many next-generation sequencing datasets contain only relative information because of biological and technical factors that limit the total number of transcripts observed for a given sample. It is not possible to interpret any one component in isolation. The field of compositional data analysis has emerged with alternative methods for relative data based on log-ratio transforms. However, these data often contain many more features than samples, and thus require creative new ways to reduce the dimensionality of the data. The summation of parts, called amalgamation, is a practical way of reducing dimensionality, but can introduce a non-linear distortion to the data. We exploit this non-linearity to propose a powerful yet interpretable dimension reduction method called data-driven amalgamation. Our new method, implemented in the user-friendly R package amalgam, can reduce the dimensionality of compositional data by finding amalgamations that optimally (i) preserve the distance between samples, or (ii) classify samples as diseased or not. Our benchmark on 13 real datasets confirms that these amalgamations compete with state-of-the-art methods in terms of performance, but result in new features that are easily understood: they are groups of parts added together.
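
Amalgamation itself, the summation of parts, is simple to show. The sketch below only applies a fixed grouping to one sample. It does not implement the data-driven search for an optimal grouping described in the abstract, it does not use the amalgam R package, and the part names and groups are hypothetical.

```python
def amalgamate(composition, groups):
    """Sum the parts of a composition within each named group.

    composition: dict mapping part name to its proportion.
    groups: dict mapping new feature name to a list of part names.
    """
    return {name: sum(composition[part] for part in parts)
            for name, parts in groups.items()}


# Hypothetical four-part composition reduced to two amalgams.
sample = {"taxonA": 0.10, "taxonB": 0.25, "taxonC": 0.05, "taxonD": 0.60}
groups = {"amalgam1": ["taxonA", "taxonC"], "amalgam2": ["taxonB", "taxonD"]}

reduced = amalgamate(sample, groups)
print(reduced)
```

Each new feature is literally a group of parts added together, which is what makes the reduced representation easy to interpret. A data-driven method would choose the groups by optimizing an objective such as distance preservation or classification accuracy.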

17.
mSystems ; 5(2), 2020 Apr 07.
Article in English | MEDLINE | ID: mdl-32265314

ABSTRACT

Since the turn of the century, technological advances have made it possible to obtain the molecular profile of any tissue in a cost-effective manner. Among these advances are sophisticated high-throughput assays that measure the relative abundances of microorganisms, RNA molecules, and metabolites. While these data are most often collected to gain new insights into biological systems, they can also be used as biomarkers to create clinically useful diagnostic classifiers. How best to classify high-dimensional -omics data remains an area of active research. However, few methods explicitly model the relative nature of these data and instead rely on cumbersome normalizations. This report (i) emphasizes the relative nature of health biomarkers, (ii) discusses the literature surrounding the classification of relative data, and (iii) benchmarks how different transformations perform for regularized logistic regression across multiple biomarker types. We show how an interpretable set of log contrasts, called balances, can prepare data for classification. We propose a simple procedure, called discriminative balance analysis, to select groups of 2 and 3 bacteria that can together discriminate between experimental conditions. Discriminative balance analysis is a fast, accurate, and interpretable alternative to data normalization.

IMPORTANCE: High-throughput sequencing provides an easy and cost-effective way to measure the relative abundance of bacteria in any environmental or biological sample. When these samples come from humans, the microbiome signatures can act as biomarkers for disease prediction. However, because bacterial abundance is measured as a composition, the data have unique properties that make conventional analyses inappropriate. To overcome this, analysts often use cumbersome normalizations. This article proposes an alternative method that identifies pairs and trios of bacteria whose stoichiometric presence can differentiate between diseased and nondiseased samples. By using interpretable log contrasts called balances, we developed an entirely normalization-free classification procedure that reduces the feature space and improves the interpretability, without sacrificing classifier performance.
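
A balance, the log contrast referred to above, can be computed for one sample as a scaled log ratio of geometric means. The sketch uses the standard isometric log-ratio scaling factor sqrt(r*s/(r+s)). Whether the article uses exactly this scaling is not stated in the abstract, and the function names and abundances are assumptions for illustration.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(numerator, denominator):
    """Log contrast between two groups of parts.

    numerator, denominator: lists of strictly positive abundances.
    r and s are the group sizes used in the scaling factor.
    """
    r, s = len(numerator), len(denominator)
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(numerator) / geometric_mean(denominator))


# Hypothetical trio of bacteria: two in the numerator, one in the denominator.
print(balance([40, 10], [5]))
# The same sample at one tenth of the sequencing depth.
print(balance([4, 1], [0.5]))
```

Both calls give the same value up to floating-point error, because multiplying every abundance by a constant cancels inside the ratio. That invariance is why a classifier built on balances needs no separate normalization step.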

18.
Nucleic Acids Res ; 35(Database issue): D127-31, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17130146

ABSTRACT

SwissRegulon (http://www.swissregulon.unibas.ch) is a database containing genome-wide annotations of regulatory sites in the intergenic regions of genomes. The regulatory site annotations are produced using a number of recently developed algorithms that operate on multiple alignments of orthologous intergenic regions from related genomes in combination with, whenever available, known sites from the literature, and ChIP-on-chip binding data. Currently SwissRegulon contains annotations for yeast and 17 prokaryotic genomes. The database provides information about the sequence, location, orientation, posterior probability and, whenever available, binding factor of each annotated site. To enable easy viewing of the regulatory site annotations in the context of other features annotated on the genomes, the sites are displayed using the GBrowse genome browser interface and can be queried based on any annotated genomic feature. The database can also be queried for regulons, i.e. sites bound by a common factor.


Subjects
Nucleic Acid Databases , Transcriptional Regulatory Elements , Regulon , Transcription Factors/metabolism , Algorithms , Bacteria/genetics , Binding Sites , Intergenic DNA/chemistry , Genomics , Internet , User-Computer Interface , Yeasts/genetics
19.
Bio Protoc ; 9(14): e3308, 2019 Jul 20.
Article in English | MEDLINE | ID: mdl-33654818

ABSTRACT

Obesity is an important health problem with a strong environmental component that is reaching pandemic proportions. The high availability of calorie-dense foods promotes overeating, potentially causing obesity. Animal models are key to validate novel therapeutic strategies, but researchers must carefully select the appropriate model to draw the right conclusions. Obesity is defined by a body mass index greater than 30 and characterized by an excess of adipose tissue. However, the regulation of food intake involves a close interrelationship between homeostatic and non-homeostatic factors. Studies in animal models have shown that intermittent access to sweetened or calorie-dense foods induces changes in feeding behavior. However, these studies are focused mainly on the final outcome (obesity) rather than on the primary dysfunction underlying the overeating of palatable foods. We describe a protocol to study overeating in mice using diet-induced obesity (DIO). This method can be applied to free choice between palatable food and a standard rodent chow or to forced intake of calorie-dense and/or palatable diets. Exposure to such diets is sufficient to promote changes in meal pattern that we register and analyze during the period of weight gain, allowing the longitudinal characterization of feeding behavior in mice. Abnormal eating behaviors such as binge eating or snacking, behavioral alterations commonly observed in obese humans, can be detected using our protocol. In the free-choice procedure, mice develop a preference for the rewarding palatable food, showing the reinforcing effect of this diet. Compulsive components of feeding are reflected by maintenance of feeding despite an adverse bitter taste caused by adulteration with quinine and by the neglect of standard chow when access to palatable food is ceased or temporally limited. Our strategy also makes it possible to identify compulsive overeating in mice under a high-caloric regime by using limited food access, and we propose complementary behavioral tests to confirm the non-homeostatic food-taking triggered by these foods. Finally, we describe how to computationally explore large longitudinal behavioral datasets.

20.
Gigascience ; 8(9), 2019 09 01.
Article in English | MEDLINE | ID: mdl-31544212

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) has made it possible to determine the sequence and relative abundance of all nucleotides in a biological or environmental sample. A cornerstone of NGS is the quantification of RNA or DNA presence as counts. However, these counts are not counts per se: their magnitude is determined arbitrarily by the sequencing depth, not by the input material. Consequently, counts must undergo normalization prior to use. Conventional normalization methods require a set of assumptions: they assume that the majority of features are unchanged and that all environments under study have the same carrying capacity for nucleotide synthesis. These assumptions are often untestable and may not hold when heterogeneous samples are compared. RESULTS: Methods developed within the field of compositional data analysis offer a general solution that is assumption-free and valid for all data. Herein, we synthesize the extant literature to provide a concise guide on how to apply compositional data analysis to NGS count data. CONCLUSIONS: In highlighting the limitations of total library size, effective library size, and spike-in normalizations, we propose the log-ratio transformation as a general solution to answer the question, "Relative to some important activity of the cell, what is changing?"


Subjects
High-Throughput Nucleotide Sequencing , Animals , Base Sequence , Dendritic Cells/drug effects , Dendritic Cells/metabolism , Gene Library , Lipopolysaccharides/pharmacology , Mass Spectrometry , Mice , Messenger RNA/metabolism , Single-Cell Analysis , Software
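
The question posed at the end of the abstract above, what is changing relative to some reference activity, corresponds to taking log ratios against a chosen reference feature. The sketch below shows that idea with hypothetical gene names and counts. The choice of reference is an assumption that a real analysis would have to justify, and the function is not taken from the article.

```python
import math


def log_ratios_to_reference(counts, reference):
    """Return log2(count / reference count) for every non-reference feature.

    counts: dict mapping feature name to a strictly positive count.
    reference: name of the feature used as the denominator.
    """
    ref = counts[reference]
    return {name: math.log2(value / ref)
            for name, value in counts.items() if name != reference}


# Hypothetical counts for the same sample sequenced at two depths.
shallow = {"geneA": 20, "geneB": 80, "housekeeping": 40}
deep = {"geneA": 200, "geneB": 800, "housekeeping": 400}

print(log_ratios_to_reference(shallow, "housekeeping"))
print(log_ratios_to_reference(deep, "housekeeping"))
```

The sequencing depth cancels in each ratio, so both depths report the same change relative to the reference. No assumption about total library size is needed, only the assumption that the reference is a meaningful anchor.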