Results 1 - 13 of 13
1.
Methods Mol Biol ; 1883: 283-302, 2019.
Article in English | MEDLINE | ID: mdl-30547405

ABSTRACT

Inferring gene regulatory networks from expression data is a very challenging problem that has raised the interest of the scientific community. Different algorithms have been proposed to try to solve this issue, but it has been shown that different methods have some particular biases and strengths, and none of them is the best across all types of data and datasets. As a result, the idea of aggregating various network inferences through a consensus mechanism naturally arises. In this chapter, a common framework to standardize already proposed consensus methods is presented, and based on this framework different proposals are introduced and analyzed in two different scenarios: Homogeneous and Heterogeneous. The first scenario reflects situations where the networks to be aggregated are rather similar because they are obtained with inference algorithms working on the same data, whereas the second scenario deals with very diverse networks because various sources of data are used to generate the individual networks. A procedure for combining multiple network inference algorithms is analyzed in a systematic way. The results show that there is a very significant difference between these two scenarios, and that the best way to combine networks in the Heterogeneous scenario is not the most commonly used. We show in particular that aggregation in the Heterogeneous scenario can be very beneficial if the individual networks are combined with our new proposed method ScaleLSum.
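As a rough illustration of what a consensus mechanism over several inferred networks can look like, the sketch below averages per-network edge ranks (a Borda-style rule). It is not ScaleLSum, whose definition is not given in this abstract; the function names, the choice of giving missing edges the worst rank, and the toy scores are all assumptions made for the example.

```python
from collections import defaultdict


def rank_edges(scores):
    """Map each edge to its rank (1 = strongest) within one network."""
    ordered = sorted(scores, key=lambda e: (-scores[e], e))
    return {edge: pos + 1 for pos, edge in enumerate(ordered)}


def rank_average_consensus(networks):
    """Average edge ranks across networks; missing edges get the worst rank."""
    all_edges = set()
    for net in networks:
        all_edges.update(net)
    worst = len(all_edges)  # assumption: an absent edge is ranked last
    totals = defaultdict(float)
    for net in networks:
        ranks = rank_edges(net)
        for edge in all_edges:
            totals[edge] += ranks.get(edge, worst)
    mean_rank = {e: totals[e] / len(networks) for e in all_edges}
    # Best consensus edge first; ties broken by edge name for determinism.
    return sorted(all_edges, key=lambda e: (mean_rank[e], e))


# Toy edge scores from three hypothetical inference runs.
net_a = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.5}
net_b = {("g1", "g2"): 0.7, ("g2", "g3"): 0.8}
net_c = {("g1", "g2"): 0.6, ("g1", "g3"): 0.4, ("g2", "g3"): 0.1}

consensus = rank_average_consensus([net_a, net_b, net_c])
```

Working on ranks rather than raw scores sidesteps the fact that different inference algorithms produce scores on different scales, which is one plausible reason rank-based rules are common; whether that holds in the Heterogeneous scenario is exactly what the chapter questions.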


Subjects
Gene Regulatory Networks; Models, Genetic; Systems Biology/methods; Unsupervised Machine Learning; Datasets as Topic; Systems Biology/instrumentation
3.
Gigascience ; 6(10): 1-7, 2017 10 01.
Article in English | MEDLINE | ID: mdl-29020748

ABSTRACT

Genetic analyses of plant root systems require large datasets of extracted architectural traits. To quantify such traits from images of root systems, researchers often have to choose between automated tools (that are prone to error and extract only a limited number of architectural traits) or semi-automated ones (that are highly time consuming). We trained a Random Forest algorithm to infer architectural traits from automatically extracted image descriptors. The training was performed on a subset of the dataset, then applied to its entirety. This strategy allowed us to (i) decrease the image analysis time by 73% and (ii) extract meaningful architectural traits based on image descriptors. We also show that these traits are sufficient to identify the quantitative trait loci that had previously been discovered using a semi-automated method. We have shown that combining semi-automated image analysis with machine learning algorithms has the power to increase the throughput of large-scale root studies. We expect that such an approach will enable the quantification of more complex root systems for genetic studies. We also believe that our approach could be extended to other areas of plant phenotyping.


Subjects
Algorithms; Image Processing, Computer-Assisted; Machine Learning; Plant Roots/genetics; Quantitative Trait Loci; Seedlings/genetics; Triticum/genetics
4.
BioData Min ; 10: 15, 2017.
Article in English | MEDLINE | ID: mdl-28484519

ABSTRACT

BACKGROUND: Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene-expression data has been accumulated in the public repositories. Modelling GRNs from multiple experiments (also called integrative analysis) has therefore naturally become a standard procedure in modern computational biology. Indeed, such analysis is usually more robust than the traditional approaches, which analyse individual datasets and therefore suffer from experimental biases and low sample numbers. To date, there are mainly two strategies for the problem of interest: the first one ("data merging") merges all datasets together and then infers a GRN, whereas the other ("networks ensemble") infers GRNs from every dataset separately and then aggregates them using some ensemble rules (such as ranksum or weightsum). Unfortunately, a thorough comparison of these two approaches is lacking. RESULTS: In this work, we present another meta-analysis approach for inferring GRNs from multiple studies. Our proposed meta-analysis approach, adapted to methods based on pairwise measures such as correlation or mutual information, consists of two steps: aggregating matrices of the pairwise measures from every dataset, followed by extracting the network from the meta-matrix. Afterwards, we evaluate the performance of the two commonly used approaches mentioned above and our presented approach with a systematic set of experiments based on in silico benchmarks. CONCLUSIONS: We proposed a first systematic evaluation of different strategies for reverse engineering GRNs from multiple datasets. Experimental results strongly suggest that assembling matrices of pairwise dependencies is a better strategy for network inference than the two commonly used ones.
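The two-step idea described in RESULTS (aggregate the pairwise-measure matrices, then extract a network from the meta-matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: absolute Pearson correlation stands in for the pairwise measure, a plain mean is used for aggregation, a fixed threshold stands in for the network-extraction step, and every dataset is assumed to cover the same genes. All names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_matrix(dataset):
    """dataset: dict gene -> expression values. Returns |r| for each gene pair."""
    genes = sorted(dataset)
    return {
        (g, h): abs(pearson(dataset[g], dataset[h]))
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }


def meta_network(datasets, threshold):
    """Step 1: average the pairwise matrices. Step 2: keep edges >= threshold."""
    matrices = [pairwise_matrix(d) for d in datasets]
    pairs = matrices[0].keys()  # assumption: identical gene sets everywhere
    meta = {p: sum(m[p] for m in matrices) / len(matrices) for p in pairs}
    edges = {p for p, value in meta.items() if value >= threshold}
    return meta, edges


# Two hypothetical studies measuring the same three genes.
study_1 = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
study_2 = {"g1": [5, 3, 4, 1], "g2": [10, 6, 8, 2], "g3": [1, 2, 1, 2]}

meta, edges = meta_network([study_1, study_2], threshold=0.8)
```

In contrast with "data merging", the raw samples are never pooled, and in contrast with "networks ensemble", no per-dataset network is built; only the dependency matrices are combined.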

5.
Front Plant Sci ; 8: 447, 2017.
Article in English | MEDLINE | ID: mdl-28421089

ABSTRACT

Root system analysis is a complex task, often performed with fully automated image analysis pipelines. However, the outcome is rarely verified by ground-truth data, which might lead to underestimated biases. We have used a root model, ArchiSimple, to create a large and diverse library of ground-truth root system images (10,000). For each image, three levels of noise were created. This library was used to evaluate the accuracy and usefulness of several image descriptors classically used in root image analysis software. Our analysis highlighted that the accuracy of the different traits is strongly dependent on the quality of the images and the type, size, and complexity of the root systems analyzed. Our study also demonstrated that machine learning algorithms can be trained on a synthetic library to improve the estimation of several root system traits. Overall, our analysis is a call to caution when using automatic root image analysis tools. If a thorough calibration is not performed on the dataset of interest, unexpected errors might arise, especially for large and complex root images. To facilitate such calibration, both the image library and the different codes used in the study have been made available to the community.

6.
BMC Bioinformatics ; 16: 312, 2015 Sep 29.
Article in English | MEDLINE | ID: mdl-26415849

ABSTRACT

BACKGROUND: In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow those methods to be evaluated accurately and reproducibly. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. RESULTS: Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. CONCLUSIONS: The benchmarking framework, which uses various datasets, highlights the specialization of some methods toward network types and data. As a result, it is possible to identify the techniques that have broad overall performance.


Subjects
Gene Regulatory Networks; Software; Algorithms; Area Under Curve; Benchmarking; Humans; ROC Curve
7.
Genome Res ; 22(7): 1334-49, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22456606

ABSTRACT

Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein-protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level.


Subjects
Computational Biology/methods; Drosophila melanogaster/genetics; Gene Expression Regulation, Developmental; Gene Regulatory Networks; Genome, Insect; Animals; Base Sequence; Chromatin Assembly and Disassembly; Chromatin Immunoprecipitation; Chromosome Mapping/methods; Chromosomes/genetics; Chromosomes/metabolism; Conserved Sequence; Drosophila melanogaster/embryology; Drosophila melanogaster/metabolism; Gene Expression Profiling/methods; Gene Expression Regulation; Linear Models; Models, Genetic; Molecular Sequence Annotation; Nervous System/cytology; Nervous System/embryology; Nervous System/metabolism; Nucleotide Motifs; Organ Specificity; Protein Binding; Protein Interaction Mapping; Regulatory Elements, Transcriptional; Transcription Factors/genetics; Transcription Factors/metabolism
8.
J Air Waste Manag Assoc ; 61(3): 285-94, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21416755

ABSTRACT

Heavy-duty vehicles (HDVs) present a growing energy and environmental concern worldwide. These vehicles rely almost entirely on diesel fuel for propulsion and create problems associated with local pollution, climate change, and energy security. Given these problems and the expected global expansion of HDVs in transportation sectors, industry and governments are pursuing biofuels and natural gas as potential alternative fuels for HDVs. Using recent lifecycle datasets, this paper evaluates the energy and emissions impacts of these fuels in the HDV sector by conducting a total fuel-cycle (TFC) analysis for Class 8 HDVs for six fuel pathways: (1) petroleum to ultra low sulfur diesel; (2) petroleum and soyoil to biodiesel (methyl soy ester); (3) petroleum, ethanol, and oxygenate to e-diesel; (4) petroleum and natural gas to Fischer-Tropsch diesel; (5) natural gas to compressed natural gas; and (6) natural gas to liquefied natural gas. TFC emissions are evaluated for three greenhouse gases (GHGs) (carbon dioxide, nitrous oxide, and methane) and five other pollutants (volatile organic compounds, carbon monoxide, nitrogen oxides, particulate matter, and sulfur oxides), along with estimates of total energy and petroleum consumption associated with each of the six fuel pathways. Results show definite advantages with biodiesel and compressed natural gas for most pollutants, negligible benefits for e-diesel, and increased GHG emissions for liquefied natural gas and Fischer-Tropsch diesel (from natural gas).


Subjects
Biofuels; Fossil Fuels; Motor Vehicles; Vehicle Emissions
9.
Science ; 330(6012): 1787-97, 2010 Dec 24.
Article in English | MEDLINE | ID: mdl-21177974

ABSTRACT

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.


Subjects
Chromatin; Drosophila melanogaster/genetics; Gene Regulatory Networks; Genome, Insect; Molecular Sequence Annotation; Animals; Binding Sites; Chromatin/genetics; Chromatin/metabolism; Computational Biology/methods; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Drosophila melanogaster/growth & development; Drosophila melanogaster/metabolism; Epigenesis, Genetic; Gene Expression Regulation; Genes, Insect; Genomics/methods; Histones/metabolism; Nucleosomes/genetics; Nucleosomes/metabolism; Promoter Regions, Genetic; RNA, Small Untranslated/genetics; RNA, Small Untranslated/metabolism; Transcription Factors/metabolism; Transcription, Genetic
10.
Article in English | MEDLINE | ID: mdl-19148299

ABSTRACT

The reverse engineering of transcription regulatory networks from expression data is attracting considerable interest in the bioinformatics community. An important family of inference techniques is represented by algorithms based on information-theoretic measures, which rely on the computation of pairwise mutual information. This paper aims to study the impact of the entropy estimator on the quality of the inferred networks. This is done by means of a comprehensive study which takes into consideration three state-of-the-art mutual information algorithms: ARACNE, CLR, and MRNET. Two different setups are considered in this work. The first one considers a set of 12 synthetically generated datasets to compare 8 different entropy estimators and three network inference algorithms. The two methods emerging as the most accurate ones from the first set of experiments are the MRNET method combined with the newly applied Spearman correlation and the CLR method combined with the Pearson correlation. The validation of these two techniques is then carried out on a set of 10 public domain microarray datasets measuring the transcriptional regulatory activity in the yeast organism.
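To make concrete why the choice of entropy estimator can change pairwise mutual information, the sketch below compares the plug-in (empirical) estimator with the Miller-Madow bias correction on already discretized data. The abstract does not list the eight estimators studied, so picking these two is an assumption, as are the function names and the toy gene vectors.

```python
import math
from collections import Counter


def entropy_empirical(symbols):
    """Plug-in (maximum-likelihood) entropy in nats."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def entropy_miller_madow(symbols):
    """Empirical entropy plus the Miller-Madow correction (m - 1) / (2n),
    where m is the number of distinct observed symbols."""
    m = len(set(symbols))
    return entropy_empirical(symbols) + (m - 1) / (2 * len(symbols))


def mutual_information(x, y, entropy=entropy_empirical):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discretized variables."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


# Hypothetical binarized expression levels of two genes over 8 samples.
gene_a = [0, 0, 1, 1, 0, 1, 0, 1]
gene_b = [0, 0, 1, 1, 0, 1, 1, 0]

mi_emp = mutual_information(gene_a, gene_b)
mi_mm = mutual_information(gene_a, gene_b, entropy_miller_madow)
```

With so few samples the two estimates differ noticeably, which is the kind of effect that can propagate into the edge weights used by ARACNE, CLR, or MRNET.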

11.
BMC Bioinformatics ; 9: 461, 2008 Oct 29.
Article in English | MEDLINE | ID: mdl-18959772

ABSTRACT

RESULTS: This paper presents the R/Bioconductor package minet (version 1.1.6), which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes, and the weight of an edge quantifies the statistical evidence of a specific (e.g., transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. Also, the package integrates accuracy assessment tools, like F-scores, PR-curves and ROC-curves, in order to compare the inferred network with a reference one. CONCLUSION: The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.
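The kind of accuracy assessment mentioned above, comparing an inferred network with a reference one, can be illustrated with a precision/recall/F-score computation over edge sets. This is a standalone Python sketch, not the minet API (minet is an R package); the function name, the treatment of edges as undirected, and the toy networks are assumptions.

```python
def f_score(inferred, reference, beta=1.0):
    """Return (precision, recall, F-beta) of an inferred undirected edge set."""
    inferred = {frozenset(e) for e in inferred}    # ignore edge direction
    reference = {frozenset(e) for e in reference}
    true_pos = len(inferred & reference)
    precision = true_pos / len(inferred) if inferred else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical inferred and reference networks.
inferred_edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g4")]
reference_edges = [("g2", "g1"), ("g3", "g4"), ("g2", "g3"), ("g1", "g3")]

precision, recall, f1 = f_score(inferred_edges, reference_edges)
```

Sweeping a threshold over the edge weights and recomputing precision and recall at each step would yield the points of a PR curve.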


Subjects
Computational Biology/methods; Algorithms; Data Interpretation, Statistical; False Positive Reactions; Gene Expression Profiling/methods; Internet; Models, Statistical; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated/methods; Programming Languages; ROC Curve; Reproducibility of Results; Software; Transcription, Genetic
12.
J Air Waste Manag Assoc ; 57(1): 102-10, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17269235

ABSTRACT

Regional and global air pollution from marine transportation is a growing concern. In discerning the sources of such pollution, researchers have become interested in tracking where along the total fuel life cycle these emissions occur. In addition, new efforts to introduce alternative fuels in marine vessels have raised questions about the energy use and environmental impacts of such fuels. To address these issues, this paper presents the Total Energy and Emissions Analysis for Marine Systems (TEAMS) model. TEAMS can be used to analyze total fuel life cycle emissions and energy use from marine vessels. TEAMS captures "well-to-hull" emissions, that is, emissions along the entire fuel pathway, including extraction, processing, distribution, and use in vessels. TEAMS conducts analyses for six fuel pathways: (1) petroleum to residual oil, (2) petroleum to conventional diesel, (3) petroleum to low-sulfur diesel, (4) natural gas to compressed natural gas, (5) natural gas to Fischer-Tropsch diesel, and (6) soybeans to biodiesel. TEAMS calculates total fuel-cycle emissions of three greenhouse gases (carbon dioxide, nitrous oxide, and methane) and five criteria pollutants (volatile organic compounds, carbon monoxide, nitrogen oxides, particulate matter with aerodynamic diameters of 10 µm or less, and sulfur oxides). TEAMS also calculates total energy consumption, fossil fuel consumption, and petroleum consumption associated with each of its six fuel cycles. TEAMS can be used to study emissions from a variety of user-defined vessels. This paper presents TEAMS and provides example modeling results for three case studies using alternative fuels: a passenger ferry, a tanker vessel, and a container ship.


Subjects
Air Pollutants, Occupational/analysis; Air Pollution/analysis; Fuel Oils/statistics & numerical data; Ships; Algorithms; Greenhouse Effect
13.
Article in English | MEDLINE | ID: mdl-18354736

ABSTRACT

The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists of selecting, among the least redundant variables, the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.
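The MRMR principle described above can be sketched as a greedy forward selection in which each candidate is scored by its relevance to the target minus its mean redundancy with the variables already chosen. The sketch covers only the feature-selection step for a single target, not the full MRNET network construction; the function names and the mutual-information values in the lookup table are hypothetical.

```python
def mrmr_select(target, candidates, mi, k):
    """Greedy MRMR: choose up to k variables maximizing
    relevance mi(var, target) minus mean redundancy with selected variables.

    mi(a, b) must return a mutual-information estimate between a and b.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def score(var):
            relevance = mi(var, target)
            if not selected:
                return relevance
            redundancy = sum(mi(var, s) for s in selected) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical symmetric mutual-information values for a toy example.
MI_TABLE = {
    frozenset(("g1", "y")): 0.9,
    frozenset(("g2", "y")): 0.8,
    frozenset(("g3", "y")): 0.5,
    frozenset(("g1", "g2")): 0.7,
    frozenset(("g1", "g3")): 0.1,
    frozenset(("g2", "g3")): 0.1,
}


def toy_mi(a, b):
    """Look up the toy mutual information between two variables."""
    return MI_TABLE[frozenset((a, b))]


picked = mrmr_select("y", ["g1", "g2", "g3"], toy_mi, k=2)
```

In this toy case g2 is more relevant to the target than g3 but is largely redundant with the already selected g1, so the redundancy penalty favors g3 as the second pick.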
