Results 1 - 20 of 34
1.
PLoS Pathog ; 18(9): e1010848, 2022 09.
Article in English | MEDLINE | ID: mdl-36149920

ABSTRACT

Aneuploidy causes system-wide disruptions in the stoichiometric balances of transcripts, proteins, and metabolites, often resulting in detrimental effects for the organism. The protozoan parasite Leishmania has an unusually high tolerance for aneuploidy, but the molecular and functional consequences for the pathogen remain poorly understood. Here, we addressed this question in vitro and present the first integrated analysis of the genome, transcriptome, proteome, and metabolome of highly aneuploid Leishmania donovani strains. Our analyses unambiguously establish that aneuploidy in Leishmania proportionally impacts the average transcript and protein abundance levels of affected chromosomes, ultimately correlating with the degree of metabolic differences between closely related aneuploid strains. This proportionality was present in both proliferative and non-proliferative in vitro promastigotes. However, as in other eukaryotes, we observed attenuation of dosage effects for protein complex subunits and, in addition, for non-cytoplasmic proteins. Differentially expressed transcripts and proteins between aneuploid Leishmania strains also originated from non-aneuploid chromosomes. At the protein level, these were enriched for proteins involved in protein metabolism, such as chaperones and chaperonins, peptidases, and heat-shock proteins. In conclusion, our results further support the view that aneuploidy in Leishmania can be adaptive. Additionally, we believe that the high karyotype diversity in vitro and the absence of classical transcriptional regulation make Leishmania an attractive model to study processes of protein homeostasis in the context of aneuploidy and beyond.


Subject(s)
Leishmania donovani , Proteome , Aneuploidy , Heat-Shock Proteins/genetics , Humans , Karyotype , Leishmania donovani/genetics , Peptide Hydrolases/genetics , Proteome/genetics
2.
Bioinformatics ; 38(22): 5007-5011, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36130276

ABSTRACT

MOTIVATION: Protein sequence alignments are essential to structural, evolutionary and functional analysis, but their accuracy is often limited by sequence similarity unless molecular structures are available. Protein structures predicted at experimental grade accuracy, as achieved by AlphaFold2, could therefore have a major impact on sequence analysis. RESULTS: Here, we find that multiple sequence alignments estimated on AlphaFold2 predictions are almost as accurate as alignments estimated on experimental structures and significantly closer to the structural reference than sequence-based alignments. We also show that AlphaFold2 structural models of relatively low quality can be used to obtain highly accurate alignments. These results suggest that, besides structure modeling, AlphaFold2 encodes higher-order dependencies that can be exploited for sequence analysis. AVAILABILITY AND IMPLEMENTATION: All data, analyses and results are available on Zenodo (https://doi.org/10.5281/zenodo.7031286). The code and scripts have been deposited in GitHub (https://github.com/cbcrg/msa-af2-nf) and the various containers in (https://cloud.sylabs.io/library/athbaltzis/af2/alphafold, https://hub.docker.com/r/athbaltzis/pred). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteins , Software , Sequence Alignment , Biological Evolution
3.
Nat Methods ; 18(1): 37-39, 2021 01.
Article in English | MEDLINE | ID: mdl-33398187
4.
Methods Mol Biol ; 2231: 89-97, 2021.
Article in English | MEDLINE | ID: mdl-33289888

ABSTRACT

Many fields of biology rely on the inference of accurate multiple sequence alignments (MSA) of biological sequences. Unfortunately, the problem of assembling an MSA is NP-complete, which limits computation to approximate solutions based on heuristics. The progressive algorithm is one of the most popular frameworks for the computation of MSAs. It involves pre-clustering the sequences and aligning them starting with the most similar ones. The scalability of this framework is limited, especially with respect to accuracy. We present here an alternative approach named the regressive algorithm. In this framework, sequences are first clustered and then aligned starting with the most distantly related ones. This approach has been shown to greatly improve accuracy during scale-up, especially on datasets featuring 10,000 sequences or more. Another benefit is the possibility to integrate third-party clustering methods and third-party MSA aligners. The regressive algorithm has been tested on up to 1.5 million sequences; its implementation is available in the T-Coffee package.


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , Software , Algorithms , Cluster Analysis , Computational Biology/instrumentation , Sequence Alignment/instrumentation
5.
mSystems ; 5(2)2020 Apr 07.
Article in English | MEDLINE | ID: mdl-32265314

ABSTRACT

Since the turn of the century, technological advances have made it possible to obtain the molecular profile of any tissue in a cost-effective manner. Among these advances are sophisticated high-throughput assays that measure the relative abundances of microorganisms, RNA molecules, and metabolites. While these data are most often collected to gain new insights into biological systems, they can also be used as biomarkers to create clinically useful diagnostic classifiers. How best to classify high-dimensional omics data remains an area of active research. However, few methods explicitly model the relative nature of these data; most instead rely on cumbersome normalizations. This report (i) emphasizes the relative nature of health biomarkers, (ii) discusses the literature surrounding the classification of relative data, and (iii) benchmarks how different transformations perform for regularized logistic regression across multiple biomarker types. We show how an interpretable set of log contrasts, called balances, can prepare data for classification. We propose a simple procedure, called discriminative balance analysis, to select groups of 2 and 3 bacteria that can together discriminate between experimental conditions. Discriminative balance analysis is a fast, accurate, and interpretable alternative to data normalization.

IMPORTANCE: High-throughput sequencing provides an easy and cost-effective way to measure the relative abundance of bacteria in any environmental or biological sample. When these samples come from humans, the microbiome signatures can act as biomarkers for disease prediction. However, because bacterial abundance is measured as a composition, the data have unique properties that make conventional analyses inappropriate. To overcome this, analysts often use cumbersome normalizations. This article proposes an alternative method that identifies pairs and trios of bacteria whose stoichiometric presence can differentiate between diseased and nondiseased samples. By using interpretable log contrasts called balances, we developed an entirely normalization-free classification procedure that reduces the feature space and improves interpretability, without sacrificing classifier performance.
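
A balance is a log contrast between the geometric means of two groups of parts. The sketch below assumes the standard isometric log-ratio (ilr) form with its normalizing constant; the exact definition used by discriminative balance analysis is not given in this abstract, and the taxon names "A", "B", "C" are hypothetical. It illustrates why no normalization is needed: rescaling a sample (for example, a deeper sequencing run) leaves every balance unchanged.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(sample, numerator, denominator):
    """ilr-style balance between two disjoint groups of parts.

    sample: dict mapping a part name (e.g. a hypothetical taxon) to a positive abundance.
    numerator, denominator: non-empty, non-overlapping lists of part names.
    """
    r, s = len(numerator), len(denominator)
    num = geometric_mean([sample[p] for p in numerator])
    den = geometric_mean([sample[p] for p in denominator])
    # sqrt(r*s/(r+s)) is the usual isometric normalizing constant (assumed here).
    return math.sqrt(r * s / (r + s)) * math.log(num / den)


sample = {"A": 8, "B": 2, "C": 2}
deeper = {name: 10 * count for name, count in sample.items()}  # same sample, 10x depth

pair = balance(sample, ["A"], ["B"])        # 1-vs-1 balance
trio = balance(sample, ["A"], ["B", "C"])   # 1-vs-2 balance
print(pair, trio, balance(deeper, ["A"], ["B", "C"]))  # last two values agree
```

Under this reading, a pair corresponds to a 1-vs-1 balance and a trio to a 1-vs-2 balance; searching for the discriminative groups and fitting the classifier are outside this sketch.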

6.
NAR Genom Bioinform ; 2(4): lqaa076, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33575624

ABSTRACT

Many next-generation sequencing datasets contain only relative information because of biological and technical factors that limit the total number of transcripts observed for a given sample. It is not possible to interpret any one component in isolation. The field of compositional data analysis has emerged with alternative methods for relative data based on log-ratio transforms. However, these data often contain many more features than samples, and thus require creative new ways to reduce the dimensionality of the data. The summation of parts, called amalgamation, is a practical way of reducing dimensionality, but can introduce a non-linear distortion to the data. We exploit this non-linearity to propose a powerful yet interpretable dimension reduction method called data-driven amalgamation. Our new method, implemented in the user-friendly R package amalgam, can reduce the dimensionality of compositional data by finding amalgamations that optimally (i) preserve the distance between samples, or (ii) classify samples as diseased or not. Our benchmark on 13 real datasets confirms that these amalgamations compete with state-of-the-art methods in terms of performance, but result in new features that are easily understood: they are groups of parts added together.
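
The amalgamation operation itself is simple to state: add up the parts in each group, then work with the summed parts. The Python sketch below is only an illustration under stated assumptions; it does not reproduce the R package amalgam, and the function names and toy composition are hypothetical. The data-driven part of the method, i.e. searching for the grouping that best preserves distances or separates classes, is not implemented here.

```python
import math


def close(parts):
    """Rescale positive parts so they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]


def amalgamate(composition, groups):
    """Sum the parts in each group (lists of indices) and re-close.

    Every part should appear in exactly one group; the grouping is what a
    data-driven method would search for.
    """
    return close([sum(composition[i] for i in group) for group in groups])


def amalgam_log_ratio(composition, group_a, group_b):
    """Log-ratio between two amalgams: log(sum of group_a / sum of group_b)."""
    a = sum(composition[i] for i in group_a)
    b = sum(composition[i] for i in group_b)
    return math.log(a / b)


counts = [10, 30, 20, 40]
print(amalgamate(counts, [[0, 1], [2, 3]]))
print(amalgam_log_ratio(counts, [0, 1], [2, 3]))
```

Because the log of a sum is not a linear function of the logs of its terms, amalgamation is non-linear in log-ratio space; this is the distortion the abstract refers to. Ratios of amalgams remain independent of the total count, so the summed features stay scale-invariant.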

8.
Nat Biotechnol ; 37(12): 1466-1470, 2019 12.
Article in English | MEDLINE | ID: mdl-31792410

ABSTRACT

Multiple sequence alignments (MSAs) are used for structural[1,2] and evolutionary predictions[1,2], but the complexity of aligning large datasets requires the use of approximate solutions[3], including the progressive algorithm[4]. Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf to root, based on a guide tree. Their accuracy declines substantially as the number of sequences is scaled up[5]. We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works the other way around from the progressive algorithm and begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes[6].
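
The grouping step of a regressive scheme can be sketched as follows. This is a simplified illustration, not T-Coffee's implementation: the guide tree is a hypothetical nested tuple, each group takes the leftmost leaf of a subtree as its representative (real implementations choose representatives by their own rule), subtrees are expanded largest-first as a simplifying assumption, and the third-party aligner and the merging of sub-alignments through shared sequences are left out. It shows two properties the abstract relies on: the first group holds representatives of the most distant clades, and no group exceeds `n` sequences, so each aligner call has bounded cost.

```python
from typing import List, Tuple, Union

# A guide tree is either a sequence name (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Sequence names under a (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def split_into_subtrees(tree: Tree, n: int) -> List[Tree]:
    """Expand the tree from its root until there are n subtrees (or only leaves)."""
    groups = [tree]
    while len(groups) < n:
        internal = [i for i, g in enumerate(groups) if not isinstance(g, str)]
        if not internal:
            break  # only single sequences are left
        # Expand the subtree with the most leaves first (a simplifying assumption).
        i = max(internal, key=lambda j: len(leaves(groups[j])))
        left, right = groups.pop(i)
        groups.extend([left, right])
    return groups


def regressive_groups(tree: Tree, n: int) -> List[List[str]]:
    """Groups of at most n sequences, listed from the root (most distant) downwards."""
    if n < 2:
        raise ValueError("group size must be at least 2")
    subtrees = split_into_subtrees(tree, n)
    # One representative per subtree: here simply its leftmost leaf.
    groups = [[leaves(t)[0] for t in subtrees]]
    for t in subtrees:
        if not isinstance(t, str):
            groups.extend(regressive_groups(t, n))
    return groups


guide_tree = ((("a", "b"), "c"), ("d", ("e", "f")))
for group in regressive_groups(guide_tree, 3):
    print(group)
```

In this sketch each internal node of the guide tree is expanded exactly once, so there are fewer groups than sequences. With a fixed group size, the number of aligner calls therefore grows linearly with the dataset, which is one way to read the linear-time claim above.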


Subject(s)
Algorithms , Sequence Alignment/methods , Databases, Genetic , Eukaryota/genetics , Genomics/methods , Regression Analysis
9.
Gigascience ; 8(9)2019 09 01.
Article in English | MEDLINE | ID: mdl-31544212

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) has made it possible to determine the sequence and relative abundance of all nucleotides in a biological or environmental sample. A cornerstone of NGS is the quantification of RNA or DNA presence as counts. However, these counts are not counts per se: their magnitude is determined arbitrarily by the sequencing depth, not by the input material. Consequently, counts must undergo normalization prior to use. Conventional normalization methods require a set of assumptions: they assume that the majority of features are unchanged and that all environments under study have the same carrying capacity for nucleotide synthesis. These assumptions are often untestable and may not hold when heterogeneous samples are compared. RESULTS: Methods developed within the field of compositional data analysis offer a general solution that is assumption-free and valid for all data. Herein, we synthesize the extant literature to provide a concise guide on how to apply compositional data analysis to NGS count data. CONCLUSIONS: In highlighting the limitations of total library size, effective library size, and spike-in normalizations, we propose the log-ratio transformation as a general solution to answer the question, "Relative to some important activity of the cell, what is changing?"
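
One way to put the proposed log-ratio transformation into practice is sketched below. Each count is expressed relative to the geometric mean of a chosen set of reference features: using all features gives the centred log-ratio (CLR), and using a single feature gives the additive log-ratio (ALR). The function name and the pseudocount for zeros are illustrative assumptions, not a prescription from the paper. Because only ratios within a sample are used, the result does not depend on sequencing depth.

```python
import math


def log_ratio(counts, reference=None, pseudocount=0.0):
    """Log of each count relative to the geometric mean of the reference features.

    reference: indices of the denominator features; None uses all features (CLR).
    pseudocount: added to every count so that zeros can be logged (an assumption).
    """
    values = [c + pseudocount for c in counts]
    ref = range(len(values)) if reference is None else reference
    ref_log = sum(math.log(values[i]) for i in ref) / len(ref)
    return [math.log(v) - ref_log for v in values]


shallow = [10, 20, 40]
deep = [100, 200, 400]  # same relative composition, 10x the library size

print(log_ratio(shallow), log_ratio(deep))  # identical up to rounding
print(log_ratio(shallow, reference=[0]))    # ALR relative to the first feature
```

Choosing the reference set is how the question "relative to some important activity of the cell" becomes concrete: the reference features play the role of that activity.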


Subject(s)
High-Throughput Nucleotide Sequencing , Animals , Base Sequence , Dendritic Cells/drug effects , Dendritic Cells/metabolism , Gene Library , Lipopolysaccharides/pharmacology , Mass Spectrometry , Mice , RNA, Messenger/metabolism , Single-Cell Analysis , Software
10.
Nucleic Acids Res ; 47(W1): W600-W604, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31106365

ABSTRACT

We present a new web application to query and visualize time-series behavioral data: the Pergola web server. This server provides a user-friendly interface for exploring longitudinal behavioral data, taking advantage of the Pergola Python library. Using the server, users can process the data by applying basic operations, such as binning or grouping, while formatting the data into existing genomic formats. Thanks to this repurposing of genomics standards, the application automatically renders an interactive data visualization based on sophisticated genome visualization tools. Our tool allows behavioral scientists to share, display and navigate complex behavioral data comprising multiple individuals and multiple data types, in a scalable and flexible manner. A download option allows for further analysis using genomic tools. The server can be a valuable resource for the field at a time when behavioral science is entering a data-intensive cycle thanks to high-throughput behavioral phenotyping platforms. Pergola is publicly available at http://pergola.crg.eu/.
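
Binning, one of the basic operations the server offers, can be sketched as below. This does not use the Pergola library; the (start, end, value) event layout, the rule of assigning each event to the bin containing its start time, and the function name are assumptions made for illustration.

```python
import math


def bin_events(events, bin_size, t_end):
    """Sum event values into fixed-width time bins.

    events: iterable of (start, end, value) tuples, e.g. feeding bouts in seconds.
    Each event is counted in the bin that contains its start time.
    Returns (bin_start, bin_end, total) tuples covering [0, t_end).
    """
    n_bins = math.ceil(t_end / bin_size)
    totals = [0.0] * n_bins
    for start, _end, value in events:
        index = int(start // bin_size)
        if 0 <= index < n_bins:
            totals[index] += value
    return [(i * bin_size, min((i + 1) * bin_size, t_end), total)
            for i, total in enumerate(totals)]


events = [(0, 5, 0.2), (12, 15, 0.5), (14, 16, 0.1), (31, 33, 1.0)]
for row in bin_events(events, bin_size=10, t_end=35):
    print(row)
```

For long events, splitting each event's value across the bins it overlaps would be an equally reasonable design choice.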


Subject(s)
Behavior , Software , Computer Graphics , Genomics , Internet
11.
Neurobiol Dis ; 127: 210-222, 2019 07.
Article in English | MEDLINE | ID: mdl-30831192

ABSTRACT

Autism spectrum disorders are early-onset neurodevelopmental disorders characterized by deficits in social communication and restricted repetitive behaviors, yet they are quite heterogeneous in terms of their genetic basis and phenotypic manifestations. Recently, de novo pathogenic mutations in DYRK1A, a chromosome 21 gene associated with neuropathological traits of Down syndrome, have been identified in patients presenting a recognizable syndrome included in the autism spectrum. These mutations produce DYRK1A kinases with partial or complete absence of the catalytic domain, or they represent missense mutations located within this domain. Here, we undertook an extensive biochemical characterization of the DYRK1A missense mutations reported to date and show that most of them, but not all, result in enzymatically dead DYRK1A proteins. We also show that haploinsufficient Dyrk1a+/- mutant mice mirror the neurological traits associated with the human pathology, such as defective social interactions, stereotypic behaviors and epileptic activity. These mutant mice present altered proportions of excitatory and inhibitory neocortical neurons and synapses. Moreover, we provide evidence that alterations in the production of cortical excitatory neurons contribute to these defects. Indeed, by the end of the neurogenic period, the expression of developmentally regulated genes involved in neuron differentiation and/or activity is altered. Therefore, our data indicate that altered neocortical neurogenesis could critically affect the formation of cortical circuits, thereby contributing to the neuropathological changes in DYRK1A haploinsufficiency syndrome.


Subject(s)
Autistic Disorder/metabolism , Haploinsufficiency , Neocortex/metabolism , Nerve Net/metabolism , Protein Serine-Threonine Kinases/metabolism , Protein-Tyrosine Kinases/metabolism , Social Behavior , Animals , Autistic Disorder/genetics , Behavior, Animal/physiology , Male , Mice , Mutation, Missense , Protein Serine-Threonine Kinases/genetics , Protein-Tyrosine Kinases/genetics , Dyrk Kinases
12.
Bio Protoc ; 9(14): e3308, 2019 Jul 20.
Article in English | MEDLINE | ID: mdl-33654818

ABSTRACT

Obesity is an important health problem with a strong environmental component that is acquiring pandemic proportions. The high availability of calorie-dense foods promotes overeating, potentially causing obesity. Animal models are key to validating novel therapeutic strategies, but researchers must carefully select the appropriate model to draw the right conclusions. Obesity is defined by a body mass index (BMI) greater than 30 and is characterized by an excess of adipose tissue. However, the regulation of food intake involves a close interrelationship between homeostatic and non-homeostatic factors. Studies in animal models have shown that intermittent access to sweetened or calorie-dense foods induces changes in feeding behavior. However, these studies have focused mainly on the final outcome (obesity) rather than on the primary dysfunction underlying the overeating of palatable foods. We describe a protocol to study overeating in mice using diet-induced obesity (DIO). This method can be applied to a free choice between palatable food and standard rodent chow or to forced intake of calorie-dense and/or palatable diets. Exposure to such diets is sufficient to promote changes in meal pattern, which we record and analyze during the period of weight gain, allowing the longitudinal characterization of feeding behavior in mice. Abnormal eating behaviors such as binge eating or snacking, behavioral alterations commonly observed in obese humans, can be detected using our protocol. In the free-choice procedure, mice develop a preference for the rewarding palatable food, showing the reinforcing effect of this diet. Compulsive components of feeding are reflected by the maintenance of feeding despite an adverse bitter taste caused by adulteration with quinine, and by the neglect of standard chow when access to palatable food is withdrawn or restricted in time.
Our strategy also makes it possible to identify compulsive overeating in mice under a high-caloric regime by using limited food access, and we propose complementary behavioral tests to confirm the non-homeostatic food intake triggered by these foods. Finally, we describe how to computationally explore large longitudinal behavioral datasets.

13.
iScience ; 9: 244-257, 2018 Nov 30.
Article in English | MEDLINE | ID: mdl-30419504

ABSTRACT

The growing appetite of behavioral neuroscience for automated data production is prompting the need for new computational standards allowing improved interoperability, reproducibility, and shareability. We show here how these issues can be solved by repurposing existing genomic formats whose structure perfectly supports the handling of time series. This allows existing genomic analysis and visualization tools to be deployed onto behavioral data. As a proof of principle, we implemented the conversion procedure in Pergola, an open source software, and used genomics tools to reproduce results obtained in mouse, fly, and worm. We also show how common genomics techniques such as principal component analysis, hidden Markov modeling, and volcano plots can be deployed on the reformatted behavioral data. These analyses are easy to share because they depend on the scripting of public software. They are also easy to reproduce thanks to their integration within Nextflow, a workflow manager using containerized software.
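
The repurposing idea can be illustrated by writing behavioral events as bedGraph lines (chromosome, 0-based start, end, value), treating elapsed time as a position on a pseudo-chromosome so that genome browsers and interval tools can read them. This is not Pergola's converter: the track name (standing in for an individual animal), the seconds-per-base `resolution`, and the integer truncation are hypothetical choices.

```python
def to_bedgraph(events, track="chr1", resolution=1.0):
    """Render (start, end, value) events as bedGraph lines on a pseudo-chromosome.

    Time is mapped to genomic coordinates by dividing by `resolution`
    (seconds per base); `track` is a hypothetical per-individual name.
    """
    lines = []
    for start, end, value in sorted(events):
        s = int(start / resolution)
        e = max(int(end / resolution), s + 1)  # bedGraph intervals must be non-empty
        lines.append(f"{track}\t{s}\t{e}\t{value}")
    return lines


for line in to_bedgraph([(12.0, 15.0, 0.5), (0.0, 5.0, 0.2)]):
    print(line)
```

Once the data are in this shape, standard genomic tooling (interval intersection, browser tracks, coverage summaries) applies to behavioral time series without modification, which is the interoperability argument made above.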

14.
Bioinformatics ; 34(16): 2870-2878, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29608657

ABSTRACT

Motivation: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models. Results: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study. Supplementary information: Supplementary data are available at Bioinformatics online.
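
As an example of a compositionally valid alternative to a conventional distance measure, the Aitchison distance (the Euclidean distance between CLR-transformed samples) can be sketched as below. It is a standard CoDA measure; this sketch does not imply that the review recommends it for any particular analysis, and the toy samples are hypothetical. The contrast with plain Euclidean distance on counts shows the problem described above: changing one sample's library size changes the Euclidean distance but not the Aitchison distance. `math.dist` requires Python 3.8 or later.

```python
import math


def clr(x):
    """Centred log-ratio transform of strictly positive values."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between CLR-transformed compositions."""
    return math.dist(clr(x), clr(y))


a = [1, 2, 3]
b = [2, 2, 2]
a_deeper = [10, 20, 30]  # sample a sequenced 10x deeper

print(math.dist(a, b), math.dist(a_deeper, b))                    # Euclidean: changes
print(aitchison_distance(a, b), aitchison_distance(a_deeper, b))  # Aitchison: unchanged
```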


Subject(s)
Sequence Analysis , Gene Library , Humans , Models, Statistical , Sequence Analysis/statistics & numerical data
15.
Addict Biol ; 23(2): 531-543, 2018 03.
Article in English | MEDLINE | ID: mdl-29318700

ABSTRACT

Obesity represents an important risk factor contributing to the global burden of disease. The current obesogenic environment, with easy access to calorie-dense foods, is fueling this obesity epidemic. However, how these foods contribute to the progression of feeding behavior changes that lead to overeating is not well understood and needs systematic assessment. Using novel automated methods for the high-throughput screening of behavior, we examine the meal patterns of mice upon long-term exposure to a free-choice chocolate-mixture diet and a high-fat diet, with face validity for a rapid development of obesity induced by unhealthy food regularly consumed in our societies. We identified rapid diet-specific behavioral changes after exposure to these high-caloric diets. Mice fed high-fat chow showed long-lasting meal pattern disturbances, which began with a stable loss of circadian feeding rhythmicity. Mice receiving a chocolate mixture showed qualitatively similar but less marked changes, consisting of a transient disruption of feeding behavior and circadian feeding rhythmicity. Strikingly, compulsive-like eating behavior is triggered immediately after exposure to both the high-fat food and the chocolate-mixture diet, well before any changes in body weight could be observed. We propose these changes as behavioral biomarkers of prodromal states of obesity that could allow early intervention.


Subject(s)
Chocolate , Diet, High-Fat , Energy Intake , Feeding Behavior , Obesity , Animals , Circadian Rhythm , Compulsive Behavior , Food , Hyperphagia , Male , Mice
16.
Addict Biol ; 23(2): 544-555, 2018 03.
Article in English | MEDLINE | ID: mdl-29282813

ABSTRACT

A major problem in treating obesity is the high rate of relapse to abnormal food-taking habits after maintaining an energy-balanced diet. Alterations of eating behavior such as compulsive-like behavior and lack of self-control over food intake play a critical role in relapse. In this study, we used an operant paradigm of food-seeking behavior in two different diet-induced obesity models, a free-choice chocolate-mixture diet and a high-fat diet, with face validity for a rapid development of obesity or for unhealthy food regularly consumed in our societies. Reduced operant performance and motivation for the hedonic value of palatable chocolate pellets were revealed in both obesity mouse models. However, only mice exposed to the high-fat diet showed increased compulsive-like behavior in the absence of the reinforcer, further characterized by impaired operant learning, enhanced impulsivity and intensified inflexibility. We used principal component analysis to globally identify the specific behaviors responsible for the differences among diet groups. Learning impairment and inflexible behaviors contributed to a first principal component, explaining the largest proportion of the variance in the phenotype of high-fat diet mice. Reinforcement, impulsion and compulsion were the main contributors to the second principal component, explaining the differences in the behavioral phenotype of chocolate-mixture mice. These behaviors were not exclusive to the chocolate group, because some high-fat individuals showed similar values on this component. These data indicate that extended access to hypercaloric diets differentially modifies operant learning, behavioral flexibility, and impulsive-like and compulsive-like behavior, and that these effects depend on the specific diet.


Subject(s)
Conditioning, Operant , Feeding Behavior , Food , Obesity , Animals , Behavior, Animal , Chocolate , Compulsive Behavior , Diet, High-Fat , Eating , Extinction, Psychological , Impulsive Behavior , Learning , Male , Mice , Principal Component Analysis , Reinforcement, Psychology , Self-Control
17.
Genome Biol ; 17(1): 251, 2016 12 14.
Article in English | MEDLINE | ID: mdl-27964752

ABSTRACT

BACKGROUND: Genomic studies of endangered species provide insights into their evolution and demographic history, reveal patterns of genomic erosion that might limit their viability, and offer tools for their effective conservation. The Iberian lynx (Lynx pardinus) is the most endangered felid and a unique example of a species on the brink of extinction. RESULTS: We generate the first annotated draft of the Iberian lynx genome and carry out genome-based analyses of lynx demography, evolution, and population genetics. We identify a series of severe population bottlenecks in the history of the Iberian lynx that predate its known demographic decline during the 20th century and have greatly impacted its genome evolution. We observe drastically reduced rates of weak-to-strong substitutions associated with GC-biased gene conversion and increased rates of fixation of transposable elements. We also find multiple signatures of genetic erosion in the two remnant Iberian lynx populations, including a high frequency of potentially deleterious variants and substitutions, as well as the lowest genome-wide genetic diversity reported so far in any species. CONCLUSIONS: The genomic features observed in the Iberian lynx genome may hamper short- and long-term viability through reduced fitness and adaptive potential. The knowledge and resources developed in this study will boost the research on felid evolution and conservation genomics and will benefit the ongoing conservation and management of this emblematic species.


Subject(s)
Genetics, Population , Genome , Lynx/genetics , Animals , Endangered Species , Genetic Variation , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Sequence Analysis, DNA
18.
eNeuro ; 3(5)2016.
Article in English | MEDLINE | ID: mdl-27844057

ABSTRACT

Intellectual disability in Down syndrome (DS) is accompanied by altered neuro-architecture, deficient synaptic plasticity, and excitation-inhibition imbalance in critical brain regions for learning and memory. Recently, we have demonstrated beneficial effects of a combined treatment with green tea extract containing (-)-epigallocatechin-3-gallate (EGCG) and cognitive stimulation in young adult DS individuals. Although we could reproduce the cognitive-enhancing effects in mouse models, the underlying mechanisms of these beneficial effects are unknown. Here, we explored the effects of a combined therapy with environmental enrichment (EE) and EGCG in the Ts65Dn mouse model of DS at young age. Our results show that combined EE-EGCG treatment improved corticohippocampal-dependent learning and memory. Cognitive improvements were accompanied by a rescue of cornu ammonis 1 (CA1) dendritic spine density and a normalization of the proportion of excitatory and inhibitory synaptic markers in CA1 and dentate gyrus.


Subject(s)
CA1 Region, Hippocampal/pathology , Catechin/analogs & derivatives , Down Syndrome/therapy , Housing, Animal , Learning , Nootropic Agents/pharmacology , Animals , CA1 Region, Hippocampal/drug effects , CA1 Region, Hippocampal/metabolism , Catechin/pharmacology , Dendritic Spines/drug effects , Dendritic Spines/metabolism , Dendritic Spines/pathology , Disease Models, Animal , Down Syndrome/metabolism , Down Syndrome/pathology , Learning/drug effects , Mice, Transgenic , Plant Extracts/pharmacology , Random Allocation , Recognition, Psychology/drug effects , Synapses/drug effects , Synapses/metabolism , Synapses/pathology , Vesicular Glutamate Transport Protein 1/metabolism , Vesicular Inhibitory Amino Acid Transport Proteins/metabolism
19.
Genome Biol ; 17: 32, 2016 Feb 25.
Article in English | MEDLINE | ID: mdl-26911872

ABSTRACT

BACKGROUND: Legumes are the third largest family of angiosperms and the second most important crop class. Legume genomes have been shaped by extensive large-scale gene duplications, including an approximately 58-million-year-old whole genome duplication shared by most crop legumes. RESULTS: We report the genome and the transcription atlas of coding and non-coding genes of a Mesoamerican genotype of common bean (Phaseolus vulgaris L., BAT93). Using a comprehensive phylogenomics analysis, we assessed the past and recent evolution of common bean, and traced the diversification of patterns of gene expression following duplication. We find that successive rounds of gene duplications in legumes have shaped tissue and developmental expression, leading to increased levels of specialization in larger gene families. We also find that many long non-coding RNAs are preferentially expressed in germ-line-related tissues (pods and seeds), suggesting that they play a significant role in fruit development. Our results also suggest that most bean-specific gene family expansions, including resistance gene clusters, predate the split of the Mesoamerican and Andean gene pools. CONCLUSIONS: The genome and transcriptome data herein generated for a Mesoamerican genotype represent a counterpart to the genomic resources already available for the Andean gene pool. Altogether, this information will allow the genetic dissection of the characters involved in the domestication and adaptation of the crop, and their further implementation in breeding strategies for this important crop.


Subject(s)
Genome, Plant , Microsatellite Repeats/genetics , Phaseolus/genetics , Transcriptome/genetics , DNA, Plant/genetics , Gene Duplication , Gene Expression Profiling , Genotype , Humans , Phylogeny , Seeds/genetics , Sequence Analysis, DNA
20.
Theory Biosci ; 135(1-2): 21-36, 2016 Jun.
Article in English | MEDLINE | ID: mdl-26762323

ABSTRACT

Correlation is ubiquitously used in gene expression analysis although its validity as an objective criterion is often questionable. If no normalization reflecting the original mRNA counts in the cells is available, correlation between genes becomes spurious. Yet the need for normalization can be bypassed using a relative analysis approach called log-ratio analysis. This approach can be used to identify proportional gene pairs, i.e. a subset of pairs whose correlation can be inferred correctly from unnormalized data due to their vanishing log-ratio variance. To interpret the size of non-zero log-ratio variances, a proposal for a scaling with respect to the variance of one member of the gene pair was recently made by Lovell et al. Here we derive analytically how spurious proportionality is introduced when using a scaling. We base our analysis on a symmetric proportionality coefficient (briefly mentioned in Lovell et al.) that has a number of advantages over their statistic. We show in detail how the choice of reference needed for the scaling determines which gene pairs are identified as proportional. We demonstrate that using an unchanged gene as a reference has huge advantages in terms of sensitivity. We also explore the link between proportionality and partial correlation and derive expressions for a partial proportionality coefficient. A brief data-analysis part puts the discussed concepts into practice.
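
The symmetric proportionality coefficient discussed here is commonly written as rho = 1 - var(log(x/y)) / (var(log x) + var(log y)), computed on log-ratio-transformed values. The sketch below assumes a single reference gene as the denominator of the log-ratios, matching the unchanged-reference scenario the abstract mentions; the function name and toy data are hypothetical.

```python
import math
from statistics import pvariance


def rho(x, y, reference):
    """Proportionality coefficient of features x and y across samples.

    x, y, reference: positive abundances per sample (same sample order).
    Both features are first expressed as log-ratios to the reference feature.
    """
    lx = [math.log(a / r) for a, r in zip(x, reference)]
    ly = [math.log(b / r) for b, r in zip(y, reference)]
    diff = [a - b for a, b in zip(lx, ly)]
    return 1 - pvariance(diff) / (pvariance(lx) + pvariance(ly))


ref = [1, 1, 1, 1]  # hypothetical unchanged reference gene
x = [1, 2, 4, 8]
print(rho(x, [3, 6, 12, 24], ref))  # y = 3x: perfectly proportional, rho close to 1
print(rho(x, [8, 4, 2, 1], ref))    # reciprocal pattern, rho close to -1
```

The reference cancels in the numerator, since log(x/r) - log(y/r) = log(x/y), but not in the denominator; this is one way to see why the choice of reference changes which pairs appear proportional. The coefficient is undefined when both log-ratios are constant across samples.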


Subject(s)
Gene Expression Profiling , Gene Expression Regulation , Schizosaccharomyces/genetics , Gene Regulatory Networks , Genes, Fungal , Least-Squares Analysis , Models, Biological , Models, Statistical , RNA, Messenger/metabolism , Sequence Analysis, RNA , Stochastic Processes