Results 1 - 20 of 32
1.
Nature ; 598(7879): 103-110, 2021 10.
Article in English | MEDLINE | ID: mdl-34616066

ABSTRACT

Single-cell transcriptomics can provide quantitative molecular signatures for large, unbiased samples of the diverse cell types in the brain (refs. 1-3). With the proliferation of multi-omics datasets, a major challenge is to validate and integrate results into a biological understanding of cell-type organization. Here we generated transcriptomes and epigenomes from more than 500,000 individual cells in the mouse primary motor cortex, a structure that has an evolutionarily conserved role in locomotion. We developed computational and statistical methods to integrate multimodal data and quantitatively validate cell-type reproducibility. The resulting reference atlas, containing over 56 neuronal cell types that are highly replicable across analysis methods, sequencing technologies and modalities, is a comprehensive molecular and genomic account of the diverse neuronal and non-neuronal cell types in the mouse primary motor cortex. The atlas includes a population of excitatory neurons that resemble pyramidal cells in layer 4 in other cortical regions (ref. 4). We further discovered thousands of concordant marker genes and gene regulatory elements for these cell types. Our results highlight the complex molecular regulation of cell types in the brain and will directly enable the design of reagents to target specific cell types in the mouse primary motor cortex for functional analysis.


Subjects
Epigenomics , Gene Expression Profiling , Motor Cortex/cytology , Neurons/classification , Single-Cell Analysis , Transcriptome , Animals , Atlases as Topic , Datasets as Topic , Epigenesis, Genetic , Female , Male , Mice , Motor Cortex/anatomy & histology , Neurons/cytology , Neurons/metabolism , Organ Specificity , Reproducibility of Results
2.
Biostatistics ; 24(1): 1-16, 2022 12 12.
Article in English | MEDLINE | ID: mdl-34467372

ABSTRACT

High-dimensional biological data collection across heterogeneous groups of samples has become increasingly common, creating high demand for dimensionality reduction techniques that capture the underlying structure of the data. Discovering low-dimensional embeddings that describe the separation of any underlying discrete latent structure in data is an important motivation for applying these techniques, since these latent classes can represent important sources of unwanted variability, such as batch effects, or interesting sources of signal, such as unknown cell types. The features that define this discrete latent structure are often hard to identify in high-dimensional data. Principal component analysis (PCA) is one of the most widely used methods as an unsupervised step for dimensionality reduction. This reduction technique finds linear transformations of the data which explain total variance. When the goal is detecting discrete structure, PCA is applied with the assumption that classes will be separated in directions of maximum variance. However, PCA will fail to accurately find discrete latent structure if this assumption does not hold. Visualization techniques, such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), attempt to mitigate these problems with PCA by creating a low-dimensional space where similar objects are modeled by nearby points in the low-dimensional embedding and dissimilar objects are modeled by distant points with high probability. However, since t-SNE and UMAP are computationally expensive, a PCA reduction is often done before applying them, which makes them sensitive to PCA's shortcomings. Also, t-SNE is limited to only two or three dimensions as a visualization tool, which may not be adequate for retaining discriminatory information. When interpretable feature weights are needed, the linear transformations of PCA are preferable to the non-linear transformations provided by methods like t-SNE and UMAP.
Here, we propose iterative discriminant analysis (iDA), a dimensionality reduction technique designed to mitigate these limitations. iDA produces an embedding that carries discriminatory information which optimally separates latent clusters using linear transformations that permit post hoc analysis to determine features that define these latent structures.
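The failure mode described in this abstract (classes that are not separated along the direction of maximum variance) can be illustrated with a small sketch. This is not the iDA algorithm; it only contrasts the first principal component with a classical two-class Fisher discriminant direction on hypothetical simulated 2-D data. All names, class means and variances below are assumptions chosen for illustration.

```python
import math
import random
import statistics

def covariance(points):
    """Sample covariance of 2-D points, returned as (sxx, sxy, syy)."""
    mx = statistics.mean(p[0] for p in points)
    my = statistics.mean(p[1] for p in points)
    n = len(points) - 1
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    return sxx, sxy, syy

def unit(v):
    """Scale a 2-D vector to unit length."""
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)

def pca_direction(points):
    """First principal component: leading eigenvector of the covariance."""
    sxx, sxy, syy = covariance(points)
    if abs(sxy) < 1e-12:  # covariance is already axis-aligned
        return (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return unit((sxy, lam - sxx))

def fisher_direction(class_a, class_b):
    """Fisher discriminant: inverse within-class scatter times mean difference."""
    axx, axy, ayy = covariance(class_a)
    bxx, bxy, byy = covariance(class_b)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    dx = statistics.mean(p[0] for p in class_b) - statistics.mean(p[0] for p in class_a)
    dy = statistics.mean(p[1] for p in class_b) - statistics.mean(p[1] for p in class_a)
    det = sxx * syy - sxy ** 2
    return unit(((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det))

def separation(direction, class_a, class_b):
    """Gap between projected class means, in pooled standard deviations."""
    pa = [x * direction[0] + y * direction[1] for x, y in class_a]
    pb = [x * direction[0] + y * direction[1] for x, y in class_b]
    pooled = math.sqrt((statistics.variance(pa) + statistics.variance(pb)) / 2)
    return abs(statistics.mean(pb) - statistics.mean(pa)) / pooled

# Hypothetical data: large shared variance along x, class difference along y.
random.seed(0)
class_a = [(random.gauss(0, 5), random.gauss(0, 0.5)) for _ in range(200)]
class_b = [(random.gauss(0, 5), random.gauss(3, 0.5)) for _ in range(200)]

pca_dir = pca_direction(class_a + class_b)
lda_dir = fisher_direction(class_a, class_b)
pca_sep = separation(pca_dir, class_a, class_b)
lda_sep = separation(lda_dir, class_a, class_b)
print("PCA direction:", pca_dir, "separation:", round(pca_sep, 2))
print("Fisher direction:", lda_dir, "separation:", round(lda_sep, 2))
```

In this toy setting the first principal component follows the high-variance x axis and barely separates the classes, while the discriminant direction follows y. Note that the Fisher direction needs class labels; the abstract describes iDA as working with latent clusters, and how it obtains them is not reproduced here.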


Subjects
Algorithms , Humans , Principal Component Analysis
3.
PLoS Comput Biol ; 17(11): e1009442, 2021 11.
Article in English | MEDLINE | ID: mdl-34784344

ABSTRACT

It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata to microbial community measurements, due in part to their quantitative properties. Microbiome multi-omics are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses generalized linear and mixed models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2's linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome (HMP2) project which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel diseases (IBD) across multiple time points and omics profiles.
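Testing many microbial features against metadata, as described above, requires a multiple-testing correction to control false discovery. The abstract does not name the procedure, so the sketch below assumes the widely used Benjamini-Hochberg adjustment; the function name and the p-values are hypothetical and this is not MaAsLin 2 code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-feature p-values from separate linear models.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
qvalues = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(qvalues) if q < 0.05]
print(qvalues, significant)
```

With these made-up inputs, only the first two features stay below a 0.05 threshold after adjustment, even though four raw p-values were below 0.05.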


Subjects
Computational Biology , Gastrointestinal Microbiome , Multivariate Analysis , Computer Simulation , Humans , Inflammatory Bowel Diseases/genetics , Inflammatory Bowel Diseases/metabolism , Inflammatory Bowel Diseases/pathology
4.
Bioinformatics ; 36(Suppl_1): i102-i110, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657377

ABSTRACT

MOTIVATION: Advances in sequencing technology, inference algorithms and differential testing methodology have enabled transcript-level analysis of RNA-seq data. Yet, the inherent inferential uncertainty in transcript-level abundance estimation, even among the most accurate approaches, means that robust transcript-level analysis often remains a challenge. Conversely, gene-level analysis remains a common and robust approach for understanding RNA-seq data, but it coarsens the resulting analysis to the level of genes, even if the data strongly support specific transcript-level effects. RESULTS: We introduce a new data-driven approach for grouping together transcripts in an experiment based on their inferential uncertainty. Transcripts that share large numbers of ambiguously-mapping fragments with other transcripts, in complex patterns, often cannot have their abundances confidently estimated. Yet, the total transcriptional output of that group of transcripts will have greatly reduced inferential uncertainty, thus allowing more robust and confident downstream analysis. Our approach, implemented in the tool terminus, groups together transcripts in a data-driven manner allowing transcript-level analysis where it can be confidently supported, and deriving transcriptional groups where the inferential uncertainty is too high to support a transcript-level result. AVAILABILITY AND IMPLEMENTATION: Terminus is implemented in Rust, and is freely available and open source. It can be obtained from https://github.com/COMBINE-lab/Terminus. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
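The claim that a group's total output has much lower inferential uncertainty than its individual transcripts can be seen in a toy simulation. The sketch below is not the Terminus algorithm. It assumes two hypothetical transcripts that share ambiguously mapping fragments, so each inferential replicate splits roughly the same total differently between them; all numbers are made up.

```python
import random
import statistics

random.seed(1)

# Hypothetical inferential replicates for two similar transcripts.
replicates = []
for _ in range(50):
    total = 1000 + random.gauss(0, 10)   # total signal is well determined
    share = random.uniform(0.2, 0.8)     # the split between transcripts is not
    replicates.append((total * share, total * (1 - share)))

def coefficient_of_variation(values):
    """Standard deviation relative to the mean across replicates."""
    return statistics.stdev(values) / statistics.mean(values)

cv_t1 = coefficient_of_variation([t1 for t1, _ in replicates])
cv_t2 = coefficient_of_variation([t2 for _, t2 in replicates])
cv_group = coefficient_of_variation([t1 + t2 for t1, t2 in replicates])
print(round(cv_t1, 3), round(cv_t2, 3), round(cv_group, 3))
```

Each transcript's estimate varies widely across replicates, while the summed group estimate is nearly constant, which is the situation in which reporting a group rather than individual transcripts is the more defensible choice.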


Subjects
Gene Expression Profiling , Software , Algorithms , RNA-Seq , Sequence Analysis, RNA
5.
Genome Res ; 26(8): 1110-23, 2016 08.
Article in English | MEDLINE | ID: mdl-27311443

ABSTRACT

Complex gene expression patterns are mediated by the binding of transcription factors (TFs) to specific genomic loci. The in vivo occupancy of a TF is, in large part, determined by the TF's DNA binding interaction partners, motivating genomic context-based models of TF occupancy. However, approaches thus far have assumed a uniform TF binding model to explain genome-wide cell-type-specific binding sites. Therefore, the cell type heterogeneity of TF occupancy models, as well as the extent to which binding rules underlying a TF's occupancy are shared across cell types, has not been investigated. Here, we develop an ensemble-based approach (TRISECT) to identify the heterogeneous binding rules for cell-type-specific TF occupancy and analyze the inter-cell-type sharing of such rules. Comprehensive analysis of 23 TFs, each with ChIP-seq data in four to 12 different cell types, shows that by explicitly capturing the heterogeneity of binding rules, TRISECT accurately identifies in vivo TF occupancy. Importantly, many of the binding rules derived from individual cell types are shared across cell types and reveal distinct yet functionally coherent putative target genes in different cell types. Closer inspection of the predicted cell-type-specific interaction partners provides insights into the context-specific functional landscape of a TF. Together, our novel ensemble-based approach reveals, for the first time, a widespread heterogeneity of binding rules, comprising the interaction partners within a cell type, many of which nevertheless transcend cell types. Notably, the putative targets of shared binding rules in different cell types, while distinct, exhibit significant functional coherence.


Subjects
DNA-Binding Proteins/genetics , Genetic Heterogeneity , Protein Binding/genetics , Transcription Factors/genetics , Binding Sites/genetics , Cell Lineage/genetics , Computational Biology , Gene Expression Regulation , Genomics , Humans , Sensitivity and Specificity
6.
Biostatistics ; 19(2): 185-198, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29036413

ABSTRACT

Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and is unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, but allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
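The contrast between forcing one distribution on all samples and allowing distributions to differ between groups can be sketched as follows. This is not the qsmooth estimator, whose details the abstract does not give; it shows ordinary quantile normalization next to a simple within-group variant. The function names and the toy data are hypothetical.

```python
def reference_quantiles(samples):
    """Average the sorted values across samples, rank by rank."""
    sorted_samples = [sorted(s) for s in samples]
    return [sum(column) / len(samples) for column in zip(*sorted_samples)]

def map_to_reference(sample, reference):
    """Replace each value with the reference value of the same rank."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    out = [0.0] * len(sample)
    for rank, i in enumerate(order):
        out[i] = reference[rank]
    return out

def quantile_normalize(samples):
    """Standard quantile normalization: one shared distribution for all samples."""
    reference = reference_quantiles(samples)
    return [map_to_reference(s, reference) for s in samples]

def groupwise_quantile_normalize(samples, groups):
    """Quantile-normalize within each group, so groups may keep different distributions."""
    out = [None] * len(samples)
    for group in set(groups):
        idx = [i for i, label in enumerate(groups) if label == group]
        normalized = quantile_normalize([samples[i] for i in idx])
        for i, values in zip(idx, normalized):
            out[i] = values
    return out

# Toy data: group "b" has globally higher values than group "a".
samples = [[1, 2, 3], [2, 3, 4], [10, 20, 30], [12, 22, 32]]
groups = ["a", "a", "b", "b"]

global_qn = quantile_normalize(samples)
group_qn = groupwise_quantile_normalize(samples, groups)
print(global_qn)
print(group_qn)
```

Global quantile normalization gives every sample the same set of values and erases the group difference, whereas the within-group version preserves it. The abstract describes qsmooth as a generalization of quantile normalization that allows distributions to differ between groups; the mechanism it uses is not shown here.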


Subjects
Biostatistics/methods , Data Interpretation, Statistical , Genomics/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Models, Statistical , Humans
8.
Nat Methods ; 12(2): 115-21, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25633503

ABSTRACT

Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.


Subjects
Computational Biology , Gene Expression Profiling , Genomics/methods , High-Throughput Screening Assays/methods , Software , Programming Languages , User-Computer Interface
9.
Nat Methods ; 11(9): 938-40, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25086505

ABSTRACT

Visualization is an integral aspect of genomics data analysis. Algorithmic-statistical analysis and interactive visualization are most effective when used iteratively. Epiviz (http://epiviz.cbcb.umd.edu/), a web-based genome browser, and the Epivizr Bioconductor package allow interactive, extensible and reproducible visualization within a state-of-the-art data-analysis platform.


Subjects
Chromosome Mapping/methods , Data Mining/methods , Databases, Genetic , Genomics/methods , Internet , Software , User-Computer Interface , Algorithms , Database Management Systems
10.
Bioinformatics ; 32(24): 3836-3838, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27540268

ABSTRACT

Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. There are several existing batch adjustment tools for '-omics' data, but they do not indicate a priori whether adjustment needs to be conducted or how correction should be applied. We present a software pipeline, BatchQC, which addresses these issues using interactive visualizations and statistics that evaluate the impact of batch effects in a genomic dataset. BatchQC can also apply existing adjustment tools and allow users to evaluate their benefits interactively. We used the BatchQC pipeline on both simulated and real data to demonstrate the effectiveness of this software toolkit. AVAILABILITY AND IMPLEMENTATION: BatchQC is available through Bioconductor: http://bioconductor.org/packages/BatchQC and GitHub: https://github.com/mani2012/BatchQC. CONTACT: wej@bu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Genomics/methods , Software , Genome , Humans , User-Computer Interface
11.
BMC Genomics ; 17: 440, 2016 06 08.
Article in English | MEDLINE | ID: mdl-27277524

ABSTRACT

BACKGROUND: Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrhea in inhabitants from low-income countries and in visitors to these countries. The impact of the human intestinal microbiota on the initiation and progression of ETEC diarrhea is not yet well understood. RESULTS: We used 16S rRNA (ribosomal RNA) gene sequencing to study changes in the fecal microbiota of 12 volunteers during a human challenge study with ETEC (H10407) and subsequent treatment with ciprofloxacin. Five subjects developed severe diarrhea and seven experienced few or no symptoms. Diarrheal symptoms were associated with high concentrations of fecal E. coli as measured by quantitative culture, quantitative PCR, and normalized number of 16S rRNA gene sequences. Large changes in other members of the microbiota varied greatly from individual to individual, whether or not diarrhea occurred. Nonetheless the variation within an individual was small compared to variation between individuals. Ciprofloxacin treatment reorganized microbiota populations; however, the original structure was largely restored at one and three month follow-up visits. CONCLUSION: Symptomatic ETEC infections, but not asymptomatic infections, were associated with high fecal concentrations of E. coli. Both infection and ciprofloxacin treatment caused variable changes in other bacteria that generally reverted to baseline levels after three months.


Subjects
Ciprofloxacin/therapeutic use , Enterotoxigenic Escherichia coli/drug effects , Enterotoxigenic Escherichia coli/physiology , Escherichia coli Infections/drug therapy , Escherichia coli Infections/microbiology , Gastrointestinal Microbiome/drug effects , Adult , Ciprofloxacin/pharmacology , Diarrhea/drug therapy , Diarrhea/microbiology , Feces/microbiology , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Metagenome , Metagenomics/methods , Middle Aged , RNA, Ribosomal, 16S , ROC Curve , Treatment Outcome , Young Adult
12.
Nat Methods ; 10(12): 1200-2, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24076764

ABSTRACT

We introduce a methodology to assess differential abundance in sparse high-throughput microbial marker-gene survey data. Our approach, implemented in the metagenomeSeq Bioconductor package, relies on a novel normalization technique and a statistical model that accounts for undersampling, a common feature of large-scale marker-gene studies. Using simulated data and several published microbiota data sets, we show that metagenomeSeq outperforms the tools currently used in this field.


Subjects
Genetic Markers , Metagenomics/methods , Microbiota , RNA, Ribosomal, 16S/genetics , Algorithms , Animals , Area Under Curve , Cluster Analysis , Computer Simulation , Databases, Genetic , Gene Expression Profiling/methods , Genetic Variation , Humans , Intestines/microbiology , Mice , Models, Genetic , Models, Statistical , Normal Distribution , Phenotype , Sequence Analysis, DNA , Software
13.
BMC Cancer ; 16: 88, 2016 Feb 11.
Article in English | MEDLINE | ID: mdl-26868017

ABSTRACT

BACKGROUND: Large megabase-pair genomic regions show robust alterations in DNA methylation levels in multiple cancers. A vast majority of these regions are hypomethylated in cancers. These regions are generally enriched for CpG islands, Lamin Associated Domains and Large organized chromatin lysine modification domains, and are associated with stochastic variability in gene expression. Given the size and consistency of hypomethylated blocks (HMB) across cancer types, we hypothesized that the immediate causes of methylation instability are likely to be encoded in the genomic region near HMB boundaries, in terms of specific genomic or epigenomic signatures. However, a detailed characterization of the HMB boundaries has not been reported. METHOD: Here, we focused on ~13 k HMBs, encompassing approximately half of the genome, identified in colon cancer. We modeled the genomic features of HMB boundaries by Random Forest to identify their salient features, in terms of transcription factor (TF) binding motifs. Additionally, we analyzed various epigenomic marks and chromatin structural features of HMB boundaries relative to the non-HMB genomic regions. RESULT: We found that the classical promoter epigenomic mark, H3K4me3, is highly enriched at HMB boundaries, as are CTCF bound sites. HMB boundaries harbor distinct combinations of TF motifs. Our Random Forest model based on TF motifs can accurately distinguish boundaries not only from regions inside and outside HMBs, but surprisingly, from active promoters as well. Interestingly, the distinguishing TFs and their interacting proteins are involved in chromatin modification. Finally, HMB boundaries significantly coincide with the boundaries of Topologically Associating Domains of the chromatin. CONCLUSION: Our analyses suggest that the overall architecture of HMBs is guided by pre-existing chromatin architecture and is associated with aberrant activity of promoter-like sequences at the boundary.


Subjects
Colonic Neoplasms/genetics , DNA Methylation/genetics , Epigenomics , Genome, Human , Cell Line, Tumor , Chromatin/genetics , Colonic Neoplasms/pathology , CpG Islands/genetics , Histones/genetics , Humans , Promoter Regions, Genetic
14.
Nat Rev Genet ; 11(10): 733-9, 2010 10.
Article in English | MEDLINE | ID: mdl-20838408

ABSTRACT

High-throughput technologies are widely used, for example to assay genetic variants, gene and protein expression, and epigenetic modifications. One often overlooked complication with such studies is batch effects, which occur because measurements are affected by laboratory conditions, reagent lots and personnel differences. This becomes a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using both published studies and our own analyses, we argue that batch effects (as well as other technical and biological artefacts) are widespread and critical to address. We review experimental and computational approaches for doing so.


Subjects
Biotechnology/methods , Genomics/methods , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Biotechnology/standards , Biotechnology/statistics & numerical data , Computational Biology/methods , Genomics/standards , Genomics/statistics & numerical data , Oligonucleotide Array Sequence Analysis/standards , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Periodicals as Topic/standards , Research Design/standards , Research Design/statistics & numerical data , Sequence Analysis, DNA/standards , Sequence Analysis, DNA/statistics & numerical data
15.
BMC Bioinformatics ; 13: 272, 2012 Oct 22.
Article in English | MEDLINE | ID: mdl-23088656

ABSTRACT

BACKGROUND: Early screening for cancer is arguably one of the greatest public health advances over the last fifty years. However, many cancer screening tests are invasive (digital rectal exams), expensive (mammograms, imaging) or both (colonoscopies). This has spurred growing interest in developing genomic signatures that can be used for cancer diagnosis and prognosis. However, progress has been slowed by heterogeneity in cancer profiles and the lack of effective computational prediction tools for this type of data. RESULTS: We developed anti-profiles as a first step towards translating experimental findings suggesting that stochastic across-sample hyper-variability in the expression of specific genes is a stable and general property of cancer into predictive and diagnostic signatures. Using single-chip microarray normalization and quality assessment methods, we developed an anti-profile for colon cancer in tissue biopsy samples. To demonstrate the translational potential of our findings, we applied the signature developed in the tissue samples, without any further retraining or normalization, to screen patients for colon cancer based on genomic measurements from peripheral blood in an independent study (AUC of 0.89). This method achieved higher accuracy than the signature underlying commercially available peripheral blood screening tests for colon cancer (AUC of 0.81). We also confirmed the existence of hyper-variable genes across a range of cancer types and found that a significant proportion of tissue-specific genes are hyper-variable in cancer. Based on these observations, we developed a universal cancer anti-profile that accurately distinguishes cancer from normal regardless of tissue type (ten-fold cross-validation AUC > 0.92). CONCLUSIONS: We have introduced anti-profiles as a new approach for developing cancer genomic signatures that specifically takes advantage of gene expression heterogeneity. 
We have demonstrated that anti-profiles can be successfully applied to develop peripheral-blood based diagnostics for cancer and used anti-profiles to develop a highly accurate universal cancer signature. By using single-chip normalization and quality assessment methods, no further retraining of signatures developed by the anti-profile approach would be required before their application in clinical settings. Our results suggest that anti-profiles may be used to develop inexpensive and non-invasive universal cancer screening tests.


Subjects
Colonic Neoplasms/genetics , Gene Expression Profiling/methods , Area Under Curve , Biomarkers, Tumor/blood , Colonic Neoplasms/diagnosis , Genetic Variation , Genomics , Humans , Oligonucleotide Array Sequence Analysis/methods , Prognosis , Transcriptome
16.
BMC Bioinformatics ; 13: 98, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22587526

ABSTRACT

BACKGROUND: In systems biology, the task of reverse engineering gene pathways from data has been limited not just by the curse of dimensionality (the interaction space is huge) but also by systematic error in the data. The gene expression barcode reduces spurious association driven by batch effects and probe effects. The binary nature of the resulting expression calls lends itself perfectly to modern regularization approaches that thrive in high-dimensional settings. RESULTS: The Partitioned LASSO-Patternsearch algorithm is proposed to identify patterns of multiple dichotomous risk factors for outcomes of interest in genomic studies. A partitioning scheme is used to identify promising patterns by solving many LASSO-Patternsearch subproblems in parallel. All variables that survive this stage proceed to an aggregation stage where the most significant patterns are identified by solving a reduced LASSO-Patternsearch problem in just these variables. This approach was applied to genetic data sets with expression levels dichotomized by the gene expression barcode. Most of the genes and second-order interactions thus selected are known to be related to the outcomes. CONCLUSIONS: We demonstrate with simulations and data analyses that the proposed method not only selects variables and patterns more accurately, but also provides smaller models with better prediction accuracy, in comparison to several alternative methodologies.


Subjects
Algorithms , Computer Simulation , Gene Expression Profiling/statistics & numerical data , Gene Expression , Models, Genetic , Breast Neoplasms/genetics , Breast Neoplasms/mortality , Female , Genomics , Humans
18.
Proc Natl Acad Sci U S A ; 106(20): 8128-33, 2009 May 19.
Article in English | MEDLINE | ID: mdl-19420224

ABSTRACT

We present a method for examining the relative influence of familial, genetic, and environmental covariate information in flexible nonparametric risk models. Our goal is investigating the relative importance of these three sources of information as they are associated with a particular outcome. To that end, we developed a method for incorporating arbitrary pedigree information in a smoothing spline ANOVA (SS-ANOVA) model. By expressing pedigree data as a positive semidefinite kernel matrix, the SS-ANOVA model is able to estimate a log-odds ratio as a multicomponent function of several variables: one or more functional components representing information from environmental covariates and/or genetic marker data and another representing pedigree relationships. We report a case study on models for retinal pigmentary abnormalities in the Beaver Dam Eye Study. Our model verifies known facts about the epidemiology of this eye lesion (found in eyes with early age-related macular degeneration) and shows significantly increased predictive ability in models that include all three of the genetic, environmental, and familial data sources. The case study also shows that models that contain only two of these data sources, that is, pedigree-environmental covariates, or pedigree-genetic markers, or environmental covariates-genetic markers, have comparable predictive ability, but less than the model with all three. This result is consistent with the notions that genetic marker data encode, at least in part, pedigree data, and that familial correlations encode shared environment data as well.


Subjects
Disease Susceptibility/etiology , Models, Theoretical , Risk , Adult , Aged , Aged, 80 and over , Analysis of Variance , Computer Simulation , Environment , Family Health , Genetic Markers , Humans , Macular Degeneration/etiology , Middle Aged , Pedigree , Polymorphism, Single Nucleotide , ROC Curve
19.
Nat Med ; 27(11): 1885-1892, 2021 11.
Article in English | MEDLINE | ID: mdl-34789871

ABSTRACT

The particularly interdisciplinary nature of human microbiome research makes the organization and reporting of results spanning epidemiology, biology, bioinformatics, translational medicine and statistics a challenge. Commonly used reporting guidelines for observational or genetic epidemiology studies lack key features specific to microbiome studies. Therefore, a multidisciplinary group of microbiome epidemiology researchers adapted guidelines for observational and genetic studies to culture-independent human microbiome studies, and also developed new reporting elements for laboratory, bioinformatics and statistical analyses tailored to microbiome studies. The resulting tool, called 'Strengthening The Organization and Reporting of Microbiome Studies' (STORMS), is composed of a 17-item checklist organized into six sections that correspond to the typical sections of a scientific publication, presented as an editable table for inclusion in supplementary materials. The STORMS checklist provides guidance for concise and complete reporting of microbiome studies that will facilitate manuscript preparation, peer review, and reader comprehension of publications and comparative analysis of published results.


Subjects
Computational Biology/methods , Dysbiosis/microbiology , Microbiota/physiology , Observational Studies as Topic/methods , Research Design , Humans , Translational Science, Biomedical
20.
Mol Biol Evol ; 26(10): 2363-72, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19602540

ABSTRACT

Microarray platforms are used increasingly to make comparative inferences through genome-wide surveys of gene expression. Although recent studies focus on describing the evidence for natural selection using estimates of the within- and between-taxa mutational variances, these methods do not explicitly or flexibly account for predicted nonindependence due to phylogenetic associations between measurements. In the interest of parsing the effects of selection, we introduce a mixture model for the comparative analysis of variation in gene expression across multiple taxa. This class of models isolates the phylogenetic signal from the nonphylogenetic and the heritable signal from the nonheritable while measuring the proper amount of correction. As a result, the mixture model resolves outstanding differences between existing models, relates different ways to estimate the across-taxa variance, and induces a likelihood ratio test for selection. We investigate by simulation and application the feasibility and utility of estimation of the required parameters and the power of the proposed test. We illustrate analysis under this mixture model with a gene duplication family data set.


Subjects
Evolution, Molecular , Gene Expression Regulation, Fungal , Models, Genetic , Phylogeny , Saccharomyces cerevisiae/genetics , Analysis of Variance , Calibration , Computer Simulation , Likelihood Functions , Multigene Family/genetics