Results 1 - 20 of 23
1.
BMC Bioinformatics ; 25(1): 323, 2024 Oct 05.
Article in English | MEDLINE | ID: mdl-39369208

ABSTRACT

In the past two decades, genomics has advanced significantly, with single-cell RNA-sequencing (scRNA-seq) marking a pivotal milestone. ScRNA-seq provides unparalleled insights into cellular diversity and has spurred diverse studies across multiple conditions and samples, resulting in an influx of complex multidimensional genomics data. This highlights the need for robust methodologies capable of handling the complexity and multidimensionality of such genomics data. Furthermore, single-cell data grapples with sparsity due to issues like low capture efficiency and dropout effects. Tensor factorizations (TF) have emerged as powerful tools to unravel the complex patterns from multi-dimensional genomics data. Classic TF methods, based on maximum likelihood estimation, struggle with zero-inflated count data, while the inherent stochasticity in TFs further complicates result interpretation and reproducibility. Our paper introduces Zero Inflated Poisson Tensor Factorization (ZIPTF), a novel method for high-dimensional zero-inflated count data factorization. We also present Consensus-ZIPTF (C-ZIPTF), merging ZIPTF with a consensus-based approach to address stochasticity. We evaluate our proposed methods on synthetic zero-inflated count data, simulated scRNA-seq data, and real multi-sample multi-condition scRNA-seq datasets. ZIPTF consistently outperforms baseline matrix and tensor factorization methods, displaying enhanced reconstruction accuracy for zero-inflated data. When dealing with high probabilities of excess zeros, ZIPTF achieves up to 2.4 × better accuracy. Moreover, C-ZIPTF notably enhances the factorization's consistency. When tested on synthetic and real scRNA-seq data, ZIPTF and C-ZIPTF consistently uncover known and biologically meaningful gene expression programs. Access our data and code at: https://github.com/klarman-cell-observatory/scBTF and https://github.com/klarman-cell-observatory/scbtf_experiments .
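The zero-inflated Poisson (ZIP) observation model at the center of ZIPTF can be sketched in a few lines. This is a minimal illustration under stated assumptions and is not code from the scBTF repository. Each tensor entry is assumed to have a Poisson rate given by a rank-R CP (PARAFAC) reconstruction from factor matrices A, B and C, and a single excess-zero probability `pi` is assumed to be shared by all entries. The function names are hypothetical. The sketch covers only the likelihood; it does not include ZIPTF's inference procedure or the consensus step of C-ZIPTF.

```python
import math


def zip_log_pmf(x, lam, pi):
    """Log-probability of count x under a zero-inflated Poisson.

    With probability pi the entry is an excess zero (e.g. a dropout);
    otherwise it is drawn from Poisson(lam). Assumes 0 <= pi < 1, lam > 0.
    """
    if x < 0:
        raise ValueError("counts must be non-negative")
    if x == 0:
        # A zero can come from the inflation component or from the Poisson itself.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    # A positive count can only come from the Poisson component.
    return math.log1p(-pi) + x * math.log(lam) - lam - math.lgamma(x + 1)


def cp_rate(A, B, C, i, j, k):
    """Rank-R CP (PARAFAC) reconstruction of tensor entry (i, j, k)."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))


def zip_tensor_log_likelihood(X, A, B, C, pi):
    """Sum of ZIP log-probabilities over a dense 3-way count tensor X[i][j][k]."""
    total = 0.0
    for i, plane in enumerate(X):
        for j, row in enumerate(plane):
            for k, x in enumerate(row):
                total += zip_log_pmf(x, cp_rate(A, B, C, i, j, k), pi)
    return total
```

Setting `pi = 0` recovers the plain Poisson log-likelihood. When zeros are more frequent than Poisson(λ) alone predicts, the ZIP model assigns them higher probability than the plain Poisson model does.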


Subjects
Genomics , Genomics/methods , Algorithms , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Humans , Software
2.
ArXiv ; 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39130195

ABSTRACT

Advances in spatially resolved transcriptomics (SRT) technologies have propelled the development of new computational analysis methods to unlock biological insights. As the cost of generating these data decreases, these technologies provide an exciting opportunity to create large-scale atlases that integrate SRT data across multiple tissues, individuals, species, or phenotypes to perform population-level analyses. Here, we describe the unique challenges posed by varying spatial resolutions in SRT data. We also highlight opportunities for standardized preprocessing methods and for computational algorithms amenable to atlas-scale datasets, which could improve sensitivity and reproducibility in the future.

3.
bioRxiv ; 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39091836

ABSTRACT

Low-pass genome sequencing is cost-effective and enables analysis of large cohorts. However, it introduces biases by reducing heterozygous genotypes and low-frequency alleles, impacting subsequent analyses such as demographic history inference. We developed a probabilistic model of low-pass biases from the Genome Analysis Toolkit (GATK) multi-sample calling pipeline, and we implemented it in the population genomic inference software dadi. We evaluated the model using simulated low-pass datasets and found that it alleviated low-pass biases in inferred demographic parameters. We further validated the model by downsampling 1000 Genomes Project data, demonstrating its effectiveness on real data. Our model is widely applicable and substantially improves model-based inferences from low-pass population genomic data.

4.
Front Public Health ; 12: 1342313, 2024.
Article in English | MEDLINE | ID: mdl-38962766

ABSTRACT

Background: Studies have shown that gut dysbiosis contributes to the pathophysiology of type 2 diabetes mellitus (T2DM). Identifying specific forms of gut microbiota dysbiosis may provide insight into the pathogenesis of T2DM. Purpose: This study investigated the causal relationship between gut microbiota and T2DM using meta-analysis and Mendelian randomization (MR). Methods: In the first part, we searched the literature on gut microbiota and T2DM and conducted a meta-analysis, examining differences in glycosylated hemoglobin and fasting blood glucose levels between the two groups. Second, we obtained GWAS data from genome-wide association study database 19 (GWAS). We used two-sample MR analysis to verify the forward and reverse causal associations between gut microbiota and T2DM. Additionally, we selected European GWAS data from the European Bioinformatics Institute (EBI) as a validation set for external validation of the MR analysis. In the third part, we used multivariate MR analysis and Bayesian model averaging (MR-BMA) to clarify which gut microbiota contribute most to the causal association between dysbiosis and T2DM. Results: 1. According to the meta-analysis, the glycated hemoglobin concentration in the gut probiotic intervention group was significantly lower than in the control group. After treatment, fasting blood glucose levels in the intervention group were also significantly lower than those in the control group. 2. The two-sample MR analysis revealed causal relationships between six gut microbiota and T2DM. Genus Haemophilus and order Pasteurellaceae were negatively correlated with T2DM, whereas genus Actinomycetes, class Melanobacteria and genus Lactobacillus were positively correlated with it. Reverse MR analysis showed no reverse causal relationship between T2DM and gut microbiota. The external validation dataset also showed a causal relationship between gut microbiota and T2DM. 3. The multivariate MR analysis and MR-BMA results showed that genus Haemophilus, as an independent factor, had the largest posterior probability (PP). Conclusion: Our results suggest that gut microbiota are closely related to T2DM pathogenesis. The further MR analyses and the prediction-model analysis indicate that a variety of gut microbiota disorders, including genus Haemophilus, are causally related to the development of T2DM. These findings may provide some insight into the diagnosis and treatment of T2DM. Systematic review registration: https://www.crd.york.ac.uk/PROSPERO.


Subjects
Diabetes Mellitus, Type 2 , Gastrointestinal Microbiome , Genome-Wide Association Study , Mendelian Randomization Analysis , Diabetes Mellitus, Type 2/microbiology , Humans , Dysbiosis , Blood Glucose/analysis , Glycated Hemoglobin/analysis , Probiotics
5.
Patterns (N Y) ; 5(5): 100986, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38800365

ABSTRACT

Spatially resolved transcriptomics has revolutionized genome-scale transcriptomic profiling by providing high-resolution characterization of transcriptional patterns. Here, we present our spatial transcriptomics analysis framework, MUSTANG (MUlti-sample Spatial Transcriptomics data ANalysis with cross-sample transcriptional similarity Guidance). MUSTANG performs multi-sample spatial transcriptomics spot cellular deconvolution and allows two kinds of information to be used: expression-based similarity shared across samples, and spatial correlation in gene expression patterns within samples. Experiments on a semi-synthetic spatial transcriptomics dataset and three real-world spatial transcriptomics datasets demonstrate the effectiveness of MUSTANG in revealing biological insights inherent in the cellular characterization of the tissue samples under study.

6.
Anal Biochem ; 681: 115329, 2023 11 15.
Article in English | MEDLINE | ID: mdl-37722523

ABSTRACT

The phenol-sulfuric acid (PSA) method is a widely used colorimetric method for determining total saccharides. Microplate-based PSA methods have been developed to handle large numbers of samples and to reduce the use of hazardous chemicals. However, the optimal procedures and measurement conditions for this method have not yet been fully established. To address this gap, we investigated the optimal procedure for microplate-based PSA. In addition to glucose (Glc), we evaluated two types of cellulose nanofibers (CNFs). CNFs are a new type of nanomaterial, and their safety assessment requires a technique to quantify CNF concentration. The results showed that performing the thermal reaction with sulfuric acid before adding phenol produced stronger coloration than performing it after phenol had been added. Furthermore, the longer the resting time after shaking with phenol, the greater the coloration and the smaller the variation, with a resting time of 60 min or longer being optimal. This research provides valuable insights into improving the reliability and efficiency of the PSA method, which can facilitate the analysis of saccharides and other substances in a range of applications.


Subjects
Nanofibers , Phenol , Cellulose/chemistry , Reproducibility of Results , Phenols , Carbohydrates/analysis
7.
Entropy (Basel) ; 25(2)2023 Jan 28.
Article in English | MEDLINE | ID: mdl-36832605

ABSTRACT

In this paper, we focus on the homogeneity test, which evaluates whether two multivariate samples come from the same distribution. This problem arises naturally in various applications, and many methods are available in the literature. Several tests based on data depth have been proposed for this problem, but they may not be very powerful. In light of the recent development of data depth as an important measure in quality assurance, we propose two new test statistics for the multivariate two-sample homogeneity test. Both proposed test statistics have a χ²(1) asymptotic null distribution. We also discuss the generalization of the proposed tests to the multivariate multisample setting. Simulation studies demonstrate the superior performance of the proposed tests. The test procedure is illustrated through two real data examples.

8.
Genome Biol ; 23(1): 168, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35927760

ABSTRACT

Spatial transcriptomic studies are reaching single-cell spatial resolution, and the data are often collected from multiple tissue sections. Here, we present a computational method, BASS, that enables multi-scale and multi-sample analysis for single-cell resolution spatial transcriptomics. BASS performs cell type clustering at the single-cell scale and spatial domain detection at the tissue regional scale, carrying out both tasks simultaneously within a Bayesian hierarchical modeling framework. We illustrate the benefits of BASS through comprehensive simulations and applications to three datasets. The substantial power gain brought by BASS allows us to reveal an accurate transcriptomic and cellular landscape in both the cortex and the hypothalamus.


Subjects
Transcriptome , Bayes Theorem , Cluster Analysis
9.
Acta investigación psicol. (en línea) ; 12(1): 39-48, Jan.-Apr. 2022. tab, graph
Article in Spanish | LILACS-Express | LILACS | ID: biblio-1429544

ABSTRACT

The present study analyzes the predictors of personal agency in university students and differentiates between Colombian and Mexican students. To this end, we explored the factor structures of Personal Agency in both groups of students and the interrelationships among the factors in each sample. We worked with a non-probabilistic sample of 243 students (127 Mexican and 116 Colombian) who answered the IASE Personal Agency Scale. Judging by the chi-square values, a multi-sample factor analysis (AF-MM) suggests universality; however, the predictors of each factor differ between the two samples. We discuss how studies of this kind support using the Personal Agency Scale as a predictor of this factor for students in both samples, since it is a culturally sensitive instrument relevant to the Latin American cultural reality.

10.
J Atten Disord ; 26(4): 573-586, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33998322

ABSTRACT

OBJECTIVE: To identify common and shared predictors of academic achievement across samples of children with ADHD. METHOD: Two clinically referred samples from New Zealand (Sample 1: n = 88, 82% boys; Sample 2: n = 121, 79% boys) and two community samples from the United States (Sample 3: n = 111, 65% boys; Sample 4: n = 114, 69% boys) completed similar diagnostic, cognitive, and academic assessments. Hierarchical multiple regression analyses identified significant predictors of word reading, spelling, and math computation performance in each sample. RESULTS: With IQ entered first, semantic language, age at testing, and verbal working memory emerged as consistent predictors of achievement across academic subjects and samples. Visual-spatial working memory contributed to variance in math performance only. Symptom severity explained limited variance. CONCLUSIONS: We recommend that evaluations of children with ADHD incorporate assessments of working memory and language skills. Classroom and academic interventions should accommodate reduced working memory and address any identified language weaknesses.


Subjects
Academic Success , Attention Deficit Disorder with Hyperactivity , Attention Deficit Disorder with Hyperactivity/diagnosis , Attention Deficit Disorder with Hyperactivity/psychology , Child , Educational Status , Female , Humans , Male , Mathematics , Memory, Short-Term
11.
ACS Sens ; 6(8): 2868-2874, 2021 08 27.
Article in English | MEDLINE | ID: mdl-34156242

ABSTRACT

Droplet digital loop-mediated isothermal amplification (ddLAMP) is an important assay for pathogen detection owing to its high accuracy, its specificity, and its ability to quantify nucleic acids. However, performing ddLAMP requires expensive instrumentation and highly trained personnel with expertise in microfluidics. To make ddLAMP more accessible, we develop a ddLAMP assay with significantly lower operational difficulty and instrumentation requirements. The proposed assay consists of three simplified steps. (1) Droplet generation: a LAMP mixture is emulsified simply by manually pulling a syringe connected to a microfluidic device. In this step, we verify for the first time that highly monodisperse droplets can be generated under unstable flow rates or pressures, so untrained personnel can operate the microfluidic device and perform the ddLAMP assay. (2) Heating: the droplets are heated isothermally in a water bath, which is available in most laboratories. (3) Result analysis: the ddLAMP result is determined using only a fluorescence microscope and open-source analysis software. No droplet microfluidics expertise or equipment is required at any point in the process. Moreover, the proposed system enables multiple samples to be processed simultaneously, with a detection limit of 10 copies/µL. The test is simple and intuitive to operate for multi-sample detection in most laboratories, and it significantly enhances the accessibility and detection throughput of the ddLAMP technique.


Subjects
Microfluidics , Nucleic Acid Amplification Techniques , Lab-On-A-Chip Devices , Molecular Diagnostic Techniques
12.
Micromachines (Basel) ; 11(9)2020 Aug 30.
Article in English | MEDLINE | ID: mdl-32872601

ABSTRACT

The spin-column system for isolating nucleic acids (NAs) from multiple samples is inconvenient for three reasons: it requires repeated handling, it is time-consuming, and it carries a risk of contamination during spin-column exchange. Herein, we propose a convenient and universal assay that uses multi-sample preparation to diagnose multiple pathogens. The multi-sample preparation assay enriches pathogens and extracts their NAs from multiple samples simultaneously. It combines three components: a 96-well filter/membrane plate, amine-functionalized diatomaceous earth (D-APDMS, a lattice-like bio-micromaterial), and homobifunctional imidoesters (HI). The NAs extracted with the proposed assay from pathogens (E. coli and Brucella) are superior in purity and quantity to those obtained with a commercial spin-column kit. The assay also does not require replacing several collection tubes during processing. For multi-sample testing, we processed as many as six samples simultaneously with the proposed assay. The assay can separate up to 96 NAs from one plate simultaneously, and the use of multichannel pipettes allows faster and simpler experimentation. We therefore believe it is a convenient and easy process that can be readily integrated with other detection methods for clinical diagnostics.

13.
Genomics ; 112(6): 4288-4296, 2020 11.
Article in English | MEDLINE | ID: mdl-32702417

ABSTRACT

We posit that the likely architecture of complex diseases is one in which subgroups of patients share variants in genes within specific networks, and these shared variants are sufficient to give rise to a shared phenotype. We developed Proteinarium, a multi-sample protein-protein interaction (PPI) tool, to identify clusters of patients with shared gene networks. Proteinarium converts user-defined seed genes to protein symbols and maps them onto the STRING interactome. A PPI network is built for each sample using Dijkstra's algorithm. Pairwise similarity scores are calculated to compare the networks and cluster the samples. A layered graph of the PPI networks for the samples in any cluster can be visualized. To test this newly developed analysis pipeline, we reanalyzed publicly available datasets whose earlier analyses had yielded only modest outcomes. We found significant clusters of patients with unique genes, which enhanced the findings of the original study.
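The per-sample network construction described above can be illustrated with a toy sketch. All of the following are hypothetical assumptions for illustration:

- the interactome is a small weighted undirected graph stored as a dictionary rather than STRING;
- each sample's network is the union of Dijkstra shortest paths between every pair of that sample's seed genes;
- the pairwise similarity score is the Jaccard index of the two edge sets.

Proteinarium's actual construction and scoring may differ.

```python
import heapq
from itertools import combinations


def dijkstra_path(graph, source, target):
    """Shortest path in a weighted undirected graph {node: {neighbor: weight}}.

    Returns the list of nodes from source to target, or None if unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in visited:
        return None
    # Walk predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def sample_network(graph, seeds):
    """Hypothetical per-sample network: union of shortest paths between seed pairs."""
    edges = set()
    for a, b in combinations(sorted(seeds), 2):
        path = dijkstra_path(graph, a, b)
        if path is None:
            continue
        for u, v in zip(path, path[1:]):
            edges.add(frozenset((u, v)))  # undirected edge
    return edges


def jaccard(e1, e2):
    """Jaccard similarity of two edge sets (an assumed similarity score)."""
    if not e1 and not e2:
        return 1.0
    return len(e1 & e2) / len(e1 | e2)
```

Clustering would then run on the matrix of pairwise similarities between samples, for example with hierarchical clustering. That step is omitted here.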


Subjects
Protein Interaction Mapping/methods , Software , Cluster Analysis , Computer Graphics , Female , Humans , Male , Pregnancy , Premature Birth , Prostatic Hyperplasia/genetics , Prostatic Hyperplasia/metabolism , Protein Interaction Maps , Transcriptome
14.
Struct Equ Modeling ; 27(6): 931-941, 2020.
Article in English | MEDLINE | ID: mdl-35046631

ABSTRACT

Integrative data analysis (IDA) involves obtaining multiple datasets, scaling the data to a common metric, and jointly analyzing the data. The first step in IDA is to scale the multisample item-level data to a common metric, which is often done with multiple group item response models (MGM). Once invariance constraints have been tested and imposed, the estimated latent variable scores from the MGM serve as an observed variable in subsequent analyses. When this approach was applied to empirical multiple group data, individuals with the same response pattern but from different studies received different latent variable estimates. A Monte Carlo simulation study was then conducted to compare the accuracy of latent variable estimates from three models: the MGM, a single-group item response model, and an MGM that ignores group differences. Results suggest that these alternative approaches led to consistent and equally accurate latent variable estimates. Implications for IDA are discussed.

15.
Development ; 146(6)2019 03 21.
Article in English | MEDLINE | ID: mdl-30824551

ABSTRACT

To quantitatively understand biological processes that occur over many hours or days, it is desirable to image multiple samples simultaneously, and automatically process and analyse the resulting datasets. Here, we present a complete multi-sample preparation, imaging, processing and analysis workflow to determine the development of the vascular volume in zebrafish. Up to five live embryos were mounted and imaged simultaneously over several days using selective plane illumination microscopy (SPIM). The resulting large imagery dataset of several terabytes was processed in an automated manner on a high-performance computer cluster and segmented using a novel segmentation approach that uses images of red blood cells as training data. This analysis yielded a precise quantification of growth characteristics of the whole vascular network, head vasculature and tail vasculature over development. Our multi-sample platform demonstrates effective upgrades to conventional single-sample imaging platforms and paves the way for diverse quantitative long-term imaging studies.


Subjects
Cardiovascular System/embryology , Image Processing, Computer-Assisted/methods , Microscopy, Fluorescence/methods , Animals , Biological Phenomena , Cluster Analysis , Embryo, Nonmammalian , Green Fluorescent Proteins/metabolism , Software , Zebrafish
16.
Trends Psychol ; 26(2): 669-702, Apr.-Jun. 2018. tab, graph
Article in Portuguese | LILACS | ID: biblio-963049

ABSTRACT

In a partial replication of a study by Gomes (2011), we assessed the effect of presenting pairs of identical stimuli after correct matches in Typical and Multi-sample matching-to-sample tasks. Twenty-four individuals with Autism Spectrum Disorder (ASD) were assigned to two conditions, each with three blocks of trials: Typical, Multi-sample, and a mix of both trial types. The order of exposure was counterbalanced across participants within a condition. The first two blocks consisted of both training and test trials; the third block consisted only of test trials. In Condition 1, the sample stimulus was removed from the screen after each correct match. In Condition 2, each correct match was followed by the presentation of a compound stimulus with two identical elements. Regardless of condition and order of exposure to the tasks, the percentage of correct responses was equal or higher in Typical Matching test trials. Scores in Condition 2 were higher than in Condition 1. For individuals with ASD, control by the identity relation may be affected by the visual organization of the tasks. Other procedural variables may also play a role, such as the consequences for matching responses, response topography (clicking or dragging), and the training criterion.

17.
Stat Methods Med Res ; 27(10): 3092-3103, 2018 10.
Article in English | MEDLINE | ID: mdl-28178877

ABSTRACT

The polytomous discrimination index is a novel and important measure of diagnostic accuracy for multi-category classification. After reconstructing its probabilistic definition, we propose a nonparametric approach to estimating the polytomous discrimination index from an empirical sample of biomarker values. We provide the finite-sample and asymptotic properties of the proposed estimators, and these analytic results may facilitate statistical inference. Simulation studies are performed to examine the performance of the nonparametric estimators. Two real data examples are analysed to illustrate our methodology.


Subjects
Data Interpretation, Statistical , Statistics, Nonparametric , Biomarkers/analysis , Databases, Factual/statistics & numerical data , Egypt , Humans , Liver Neoplasms/diagnosis , Logistic Models , Mass Spectrometry , ROC Curve
18.
Genome Biol ; 18(1): 163, 2017 08 31.
Article in English | MEDLINE | ID: mdl-28859663

ABSTRACT

BACKGROUND: Whole-genome bisulfite sequencing (WGBS) is the gold standard for profiling the DNA methylation landscape. Current computational methods for WGBS are mainly designed for gene regulatory regions with multiple under-methylated CpGs (UMCs), such as promoters and enhancers. RESULTS: Traditional methods usually cannot reliably predict the functional importance of single isolated UMCs across the genome, so we developed a multi-sample-based method for this task. We identified 9421 sparse conserved under-methylated CpGs (scUMCs) from 31 high-quality methylomes. These scUMCs are enriched in distal interacting anchor regions co-occupied by multiple chromatin-loop factors and are flanked by highly methylated CpGs. Moreover, cell lineage-specific scUMCs are associated with essential developmental genes, regulators of cell differentiation, and chromatin remodeling enzymes. Dynamic methylation levels of scUMCs correlate with the intensity of chromatin interactions, the binding of looping factors, and patterns of gene expression. CONCLUSIONS: We introduce an innovative computational method for identifying scUMCs, which are novel epigenetic features associated with high-order chromatin structure. This opens new directions in the study of the inter-relationships between DNA methylation and chromatin structure.


Subjects
Chromatin , CpG Islands , DNA Methylation , Genomics/methods , Conserved Sequence , Datasets as Topic , Epigenesis, Genetic , Gene Expression , Genome, Human , Humans , Whole Genome Sequencing
19.
Stat Biosci ; 9(1): 13-27, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28959368

ABSTRACT

The human microbiome, which comprises the collective microbes residing in or on the human body, has a profound influence on human health. DNA sequencing technology, through shotgun metagenomic sequencing, has made large-scale human microbiome studies possible. One important aspect of analyzing such metagenomic data is quantifying bacterial abundances from the sequencing data. Existing methods almost always quantify these abundances one sample at a time. This ignores systematic differences in read coverage along the genomes that arise from GC content, copy number variation, and the bacterial origin of replication. To account for such differences in read counts, we propose a multi-sample Poisson model that quantifies microbial abundances from read counts assigned to species-specific taxonomic markers. Our model accounts for marker-specific effects when normalizing the sequencing count data, which yields more accurate quantification of species abundances. Compared with currently available methods on simulated and real datasets, our method showed improved accuracy in bacterial abundance quantification, leading to more biologically interesting results in downstream data analysis.

20.
BMC Bioinformatics ; 18(1): 259, 2017 May 12.
Article in English | MEDLINE | ID: mdl-28499349

ABSTRACT

BACKGROUND: The exponentially increasing number of NGS-based epigenomic datasets in public repositories such as GEO constitutes an enormous source of information. This information is invaluable for integrative and comparative studies of gene regulatory mechanisms. One of today's challenges for such studies is to identify functionally informative local and global patterns of chromatin states. Such patterns are needed to describe the regulatory impact of the epigenome in normal cell physiology and in pathological aberrations. Critically, the most widely preferred assay, chromatin immunoprecipitation sequencing (ChIP-Seq), is inherently prone to substantial variability between assays, which poses a significant challenge for comparative studies. One such challenge concerns data normalization to adjust for variation in sequencing depth. RESULTS: Existing tools apply linear scaling corrections, are restricted to specific genomic regions, or both, which can make them prone to biases. To overcome these restrictions without introducing external biases, we developed Epimetheus, a genome-wide quantile-based multi-profile normalization tool for histone modification data and related datasets. CONCLUSIONS: Epimetheus has been used successfully to normalize epigenomics data in previous studies on X inactivation in breast cancer and in integrative studies of neuronal cell fate acquisition and tumorigenic transformation. Epimetheus is freely available to the scientific community.
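The abstract describes Epimetheus only as quantile-based. The textbook quantile normalization that such tools build on can be sketched as follows, assuming each profile is a list of read counts over the same genomic bins and that ties are broken by bin order. The function name is hypothetical, and this is the generic technique, not Epimetheus's implementation.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length profiles (lists of numbers).

    Each value is replaced by the mean, across profiles, of the values that
    share its rank, so every profile ends up with the same distribution.
    Ties are broken by position (a simplification).
    """
    if not samples:
        return []
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all profiles must cover the same bins")
    # Reference distribution: mean of the k-th smallest value across profiles.
    sorted_profiles = [sorted(s) for s in samples]
    reference = [sum(p[k] for p in sorted_profiles) / len(samples) for k in range(n)]
    normalized = []
    for s in samples:
        # Bin indices ordered from lowest to highest value in this profile.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization, every profile shares the same value distribution (the rank-wise mean), so global differences such as sequencing depth no longer dominate comparisons. Each bin keeps its rank within its own profile.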


Subjects
Epigenomics/methods , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Cell Differentiation/drug effects , Cell Line , Chromatin/metabolism , Chromatin Immunoprecipitation , Hep G2 Cells , Histones/genetics , Histones/metabolism , Humans , Tretinoin/pharmacology