Results 1 - 20 of 298
1.
BMC Bioinformatics ; 25(1): 245, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39030497

ABSTRACT

BACKGROUND: Inference of Gene Regulatory Networks (GRNs) is a difficult and long-standing question in Systems Biology. Numerous approaches have been proposed with the latest methods exploring the richness of single-cell data. One of the current difficulties lies in the fact that many methods of GRN inference do not result in one proposed GRN but in a collection of plausible networks that need to be further refined. In this work, we present a Design of Experiment strategy to use as a second stage after the inference process. It is specifically fitted for identifying the next most informative experiment to perform for deciding between multiple network topologies, in the case where proposed GRNs are executable models. This strategy first performs a topological analysis to reduce the number of perturbations that need to be tested, then predicts the outcome of the retained perturbations by simulation of the GRNs and finally compares predictions with novel experimental data. RESULTS: We apply this method to the results of our divide-and-conquer algorithm called WASABI, adapt its gene expression model to produce perturbations and compare our predictions with experimental results. We show that our networks were able to produce in silico predictions on the outcome of a gene knock-out, which were qualitatively validated for 48 out of 49 genes. Finally, we eliminate as many as two thirds of the candidate networks for which we could identify an incorrect topology, thus greatly improving the accuracy of our predictions. CONCLUSION: These results both confirm the inference accuracy of WASABI and show how executable gene expression models can be leveraged to further refine the topology of inferred GRNs. We hope this strategy will help systems biologists further explore their data and encourage the development of more executable GRN models.


Subjects
Algorithms , Gene Regulatory Networks , Gene Regulatory Networks/genetics , Systems Biology/methods , Computational Biology/methods , Computer Simulation , Models, Genetic
2.
Sci Total Environ ; 946: 174528, 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-38971243

ABSTRACT

Soil aggregates are crucial for soil organic carbon (OC) accumulation. This study, utilizing a 32-year fertilization experiment, investigates whether the core microbiome can elucidate variations in carbon content and decomposition across different aggregate sizes more effectively than broader bacterial and fungal community analyses. Employing ensemble learning algorithms that integrate machine learning with network inference, we found that the core microbiome accounts for an average increase of 26 % and 20 % in the explained variance of PCoA and Adonis analyses, respectively, in response to fertilization. Compared to the control, inorganic and organic fertilizers decreased the decomposition index (DDI) by 31 % and 38 %, respectively. The fungal core microbiome predominantly influenced OC content and DDI in larger macroaggregates (>2000 µm), explaining over 35 % of the variance, while the bacterial core microbiome had a lesser impact, explaining <30 %. Conversely, in smaller aggregates (<2000 µm), the bacterial core microbiome significantly influenced DDI (R² > 0.2), and the fungal core microbiome more strongly affected OC content (R² > 0.3). Mantel tests showed that pH is the most significant environmental factor affecting core microbiome composition across all aggregate sizes (Mantel's r > 0.8, P < 0.01). Linear correlation analysis further confirmed that the core microbiome's community structure could accurately predict OC content and DDI in aggregates (R² > 0.8, P < 0.05). Overall, our findings suggest that the core microbiome provides deeper insights into the variability of aggregate organic carbon content and decomposition, with the bacterial core microbiome playing a particularly pivotal role within the soil aggregates.


Subjects
Carbon , Machine Learning , Microbiota , Soil Microbiology , Soil , Carbon/metabolism , Carbon/analysis , Soil/chemistry , Algorithms , Fungi/metabolism , Bacteria/metabolism , Fertilizers
3.
Annu Rev Stat Appl ; 11(1): 483-504, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38962089

ABSTRACT

The microbiome represents a hidden world of tiny organisms populating not only our surroundings but also our own bodies. By enabling comprehensive profiling of these invisible creatures, modern genomic sequencing tools have given us an unprecedented ability to characterize these populations and uncover their outsize impact on our environment and health. Statistical analysis of microbiome data is critical to infer patterns from the observed abundances. The application and development of analytical methods in this area require careful consideration of the unique aspects of microbiome profiles. We begin this review with a brief overview of microbiome data collection and processing and describe the resulting data structure. We then provide an overview of statistical methods for key tasks in microbiome data analysis, including data visualization, comparison of microbial abundance across groups, regression modeling, and network inference. We conclude with a discussion and highlight interesting future directions.

4.
Gut Pathog ; 16(1): 37, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38987816

ABSTRACT

BACKGROUND: In gut ecosystems, a complex interplay of biotic and abiotic interactions determines the overall fitness of an individual. Uncovering the microbe-microbe and microbe-host interactions may lead to better strategies in disease management, as microbes rarely act in isolation. Network inference for microbial communities is often a challenging task, limited by both analytical assumptions and experimental approaches. Even after the network topologies are obtained, identifying important nodes within the context of the underlying disease aetiology remains a convoluted task. We therefore present a network perspective on complex interactions in the gut microbial profiles of individuals who have multiple sclerosis (MS) with and without Mycobacterium avium subspecies paratuberculosis (MAP) infection. Our exposé is guided by recent advancements in network-wide statistical measures that identify keystone nodes. We have utilised several centrality measures, including a recently published metric, Integrated View of Influence (IVI), that is robust against biases. RESULTS: The ecological networks were generated from microbial abundance data (n = 69 samples) obtained by 16S rRNA amplification. Using SPIEC-EASI, a sparse inverse covariance estimation approach, we obtained separate networks for the MAP-positive (MAP+), MAP-negative (MAP-) and healthy control (baseline) cohorts. Using the IVI metric, we identified the top 20 keystone nodes and regressed them against covariates of interest using a generalised linear latent variable model. Our analyses suggest that Eisenbergiella is of pivotal importance in MS irrespective of MAP infection. For the MAP+ cohort, Pyramidobacter and Peptoclostridium were predominantly the most influential genera, also hinting at an infection model similar to those observed in Inflammatory Bowel Diseases (IBDs). In the MAP- cohort, on the other hand, the Coprostanoligenes group, which reduces cholesterol and supports the intestinal barrier, was the most influential genus. CONCLUSIONS: The identification of keystone nodes, their co-occurrences, and their associations with the exposome (metadata) advances our understanding of the biological interactions through which MAP infection shapes the microbiome in individuals with MS, suggesting a link to the inflammatory process of IBDs. The associations presented in this study may lead to the development of improved diagnostics and effective vaccines for the management of the disease.

5.
Entropy (Basel) ; 26(6), 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38920504

ABSTRACT

Brain-computer interfaces have seen extraordinary surges in developments in recent years, and a significant discrepancy now exists between the abundance of available data and the limited headway made in achieving a unified theoretical framework. This discrepancy becomes particularly pronounced when examining the collective neural activity at the micro and meso scale, where a coherent formalization that adequately describes neural interactions is still lacking. Here, we introduce a mathematical framework to analyze systems of natural neurons and interpret the related empirical observations in terms of lattice field theory, an established paradigm from theoretical particle physics and statistical mechanics. Our methods are tailored to interpret data from chronic neural interfaces, especially spike rasters from measurements of single neuron activity, and generalize the maximum entropy model for neural networks so that the time evolution of the system is also taken into account. This is obtained by bridging particle physics and neuroscience, paving the way for particle physics-inspired models of the neocortex.

6.
bioRxiv ; 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38915499

ABSTRACT

Cell type-specific alternative splicing (AS) enables differential gene isoform expression between diverse neuron types with distinct identities and functions. Current studies linking individual RNA-binding proteins (RBPs) to AS in a few neuron types underscore the need for holistic modeling. Here, we use network reverse engineering to derive a map of the neuron type-specific AS regulatory landscape from 133 mouse neocortical cell types defined by single-cell transcriptomes. This approach reliably inferred the regulons of 350 RBPs and their cell type-specific activities. Our analysis revealed driving factors delineating neuronal identities, among which we validated Elavl2 as a key RBP for MGE-specific splicing in GABAergic interneurons using an in vitro ESC differentiation system. We also identified a module of exons and candidate regulators specific for long- and short-projection neurons across multiple neuronal classes. This study provides a resource for elucidating splicing regulatory programs that drive neuronal molecular diversity, including those that do not align with gene expression-based classifications.

7.
Genes (Basel) ; 15(6), 2024 May 25.
Article in English | MEDLINE | ID: mdl-38927622

ABSTRACT

BACKGROUND: Malaria results in more than 550,000 deaths each year due to drug resistance in the most lethal Plasmodium (P.) species P. falciparum. A full P. falciparum genome was published in 2002, yet 44.6% of its genes have unknown functions. Improving the functional annotation of genes is important for identifying drug targets and understanding the evolution of drug resistance. RESULTS: Genes function by interacting with one another, so analyzing gene co-expression networks can enhance functional annotations and prioritize genes for wet lab validation. Earlier efforts to build gene co-expression networks in P. falciparum have been limited to a single network inference method or to gaining biological understanding of only a single gene and its interacting partners. Here, we explore multiple inference methods and aim to systematically predict functional annotations for all P. falciparum genes. We evaluate each inferred network based on how well it predicts existing gene-Gene Ontology (GO) term annotations using network clustering and leave-one-out cross-validation. We assess overlaps of the different networks' edges (gene co-expression relationships), as well as of the predicted functional knowledge. The networks' edges are overall complementary: 47-85% of all edges are unique to each network. In terms of the accuracy of predicting gene functional annotations, all networks yielded relatively high precision (as high as 87% for the network inferred using mutual information), but the highest recall reached was below 15%. That all networks have low recall means that none of them captures a large share of the existing gene-GO term annotations. In fact, their annotation predictions are highly complementary, with the largest pairwise overlap being only 27%. We provide ranked lists of inferred gene-gene interactions and predicted gene-GO term annotations for future use and wet lab validation by the malaria community. CONCLUSIONS: The different networks seem to capture different aspects of P. falciparum biology in terms of both inferred interactions and predicted gene functional annotations. Thus, relying on a single network inference method should be avoided when possible. SUPPLEMENTARY DATA: Attached.


Subjects
Gene Regulatory Networks , Plasmodium falciparum , Plasmodium falciparum/genetics , Malaria, Falciparum/parasitology , Malaria, Falciparum/genetics , Humans , Gene Ontology , Molecular Sequence Annotation/methods , Protozoan Proteins/genetics
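The precision and recall figures quoted in the abstract of entry 7 can be illustrated with a minimal sketch. The function name, the gene names, and the placeholder GO labels below are assumptions made up for illustration; they are not taken from the paper, and the paper's actual evaluation (network clustering with leave-one-out cross-validation) is more involved.

```python
def precision_recall(predicted, known):
    """Precision and recall of predicted (gene, GO term) pairs against known pairs."""
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Toy annotations, invented for illustration only (placeholder GO labels).
known = {
    ("g1", "GO:A"), ("g2", "GO:A"), ("g3", "GO:B"), ("g4", "GO:B"),
    ("g5", "GO:C"), ("g6", "GO:C"), ("g7", "GO:C"), ("g8", "GO:D"),
}
predicted = {("g1", "GO:A"), ("g9", "GO:A")}

precision, recall = precision_recall(predicted, known)
print(precision, recall)
```

A network that makes few but mostly correct predictions scores high on precision and low on recall, which is the pattern the abstract reports for every inferred network.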
8.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38886006

ABSTRACT

Reconstructing the topology of gene regulatory networks from gene expression data has been extensively studied. With the abundance of functional transcriptomic data available, it is now feasible to systematically decipher regulatory interaction dynamics in a logic form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators aggregate to affect a common target gene. However, inferring both the network topology and gene interaction dynamics simultaneously is still a challenging problem, since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression, and solves it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overly emphasizing dynamic characteristics at the expense of topological structure, as traditional methods often do, a set of promising Boolean formulas for each target gene is evolved first, and a feed-forward neural network trained with continuous expression data is subsequently employed to pick out the final solution. We validated the efficacy of LogicGep using multiple datasets including both synthetic and real-world experimental data. The results show that LogicGep infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, LogicGep runs hundreds of times faster than other methods, especially for large network inference.


Subjects
Algorithms , Gene Expression Profiling , Gene Regulatory Networks , Gene Expression Profiling/methods , Humans , Transcriptome , Software , Computational Biology/methods , Neural Networks, Computer
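For readers unfamiliar with the Boolean network (BN) framework mentioned in entry 8, the sketch below simulates a three-gene network with synchronous updates. The genes, the rules, and the function names are invented for illustration; this shows what a BN model is, not how LogicGep infers one.

```python
def step(state, rules):
    """Synchronously update every gene from the current state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def trajectory(state, rules, n_steps):
    """Return the initial state followed by n_steps successive states."""
    states = [state]
    for _ in range(n_steps):
        state = step(state, rules)
        states.append(state)
    return states


# Hypothetical rules: each target gene is a Boolean function of its regulators.
rules = {
    "A": lambda s: s["A"],                 # A is an input and keeps its value
    "B": lambda s: s["A"] and not s["C"],  # B is activated by A and repressed by C
    "C": lambda s: s["B"],                 # C follows B with a one-step delay
}

traj = trajectory({"A": True, "B": False, "C": False}, rules, 4)
for state in traj:
    print(state)
```

Inferring a BN from time-series data amounts to finding, for each target gene, a rule like the ones above that reproduces the observed state transitions.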
9.
ISME Commun ; 4(1): ycae057, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38812718

ABSTRACT

Microbial communities are diverse biological systems that include taxa from across multiple kingdoms of life. Notably, interactions between bacteria and fungi play a significant role in determining community structure. However, these statistical associations across kingdoms are more difficult to infer than intra-kingdom associations due to the nature of the data involved using standard network inference techniques. We quantify the challenges of cross-kingdom network inference from both theoretical and practical points of view using synthetic and real-world microbiome data. We detail the theoretical issue presented by combining compositional data sets drawn from the same environment, e.g. 16S and ITS sequencing of a single set of samples, and we survey common network inference techniques for their ability to handle this error. We then test these techniques for the accuracy and usefulness of their intra- and inter-kingdom associations by inferring networks from a set of simulated samples for which a ground-truth set of associations is known. We show that while the two methods mitigate the error of cross-kingdom inference, there is little difference between techniques for key practical applications including identification of strong correlations and identification of possible keystone taxa (i.e. hub nodes in the network). Furthermore, we identify a signature of the error caused by transkingdom network inference and demonstrate that it appears in networks constructed using real-world environmental microbiome data.

10.
Interdiscip Sci ; 2024 May 23.
Article in English | MEDLINE | ID: mdl-38778003

ABSTRACT

Gene regulatory network (GRN) inference based on single-cell RNA sequencing (scRNA-seq) data plays a crucial role in understanding the regulatory mechanisms between genes. Various computational methods have been employed for GRN inference, but their network accuracy and model generalization are unsatisfactory, largely because of high-dimensional data and network sparsity. In this paper, we propose CVGAE, a self-supervised method for gene regulatory network inference using single-cell RNA sequencing data. CVGAE uses a graph neural network for inductive representation learning, which merges gene expression data and observed topology into a low-dimensional vector space. The trained vectors are used to calculate the distance between genes and then to predict interactions between genes. In the overall framework, FastICA is used to relieve the computational complexity caused by high-dimensional data, and CVGAE adopts multi-stacked GraphSAGE layers as an encoder and an improved decoder to overcome network sparsity. CVGAE is evaluated on several single-cell datasets containing four related ground-truth networks, and the results show that CVGAE achieves better performance than the comparative methods. To validate its learning and generalization capabilities, CVGAE is applied in a few-shot setting by changing the ratio of the training set to the test set. In the few-shot setting, CVGAE obtains comparable or superior performance.

11.
PNAS Nexus ; 3(4): pgae063, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38560526

ABSTRACT

Network structures underlie the dynamics of many complex phenomena, from gene regulation and foodwebs to power grids and social media. Yet, as they often cannot be observed directly, their connectivities must be inferred from observations of the dynamics to which they give rise. In this work, we present a powerful computational method to infer large network adjacency matrices from time series data using a neural network, in order to provide uncertainty quantification on the prediction in a manner that reflects both the degree to which the inference problem is underdetermined as well as the noise on the data. This is a feature that other approaches have hitherto been lacking. We demonstrate our method's capabilities by inferring line failure locations in the British power grid from its response to a power cut, providing probability densities on each edge and allowing the use of hypothesis testing to make meaningful probabilistic statements about the location of the cut. Our method is significantly more accurate than both Markov-chain Monte Carlo sampling and least squares regression on noisy data and when the problem is underdetermined, while naturally extending to the case of nonlinear dynamics, which we demonstrate by learning an entire cost matrix for a nonlinear model of economic activity in Greater London. Not having been specifically engineered for network inference, this method in fact represents a general parameter estimation scheme that is applicable to any high-dimensional parameter space.

12.
Microb Ecol ; 87(1): 56, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38587642

ABSTRACT

Microbial interactions function as a fundamental unit in complex ecosystems. By characterizing the type of interaction (positive, negative, neutral) occurring in these dynamic systems, one can begin to unravel the role played by the microbial species. To this end, various methods have been developed to decipher the function of microbial communities. The current review focuses on the qualitative and quantitative methods that currently exist to study microbial interactions. Qualitative methods such as co-culturing experiments are visualized using microscopy-based techniques and are combined with data obtained from multi-omics technologies (metagenomics, metabolomics, metatranscriptomics). Quantitative methods include the construction of networks and network inference, computational models, and the development of synthetic microbial consortia. These methods provide valuable clues about the various roles played by interacting partners, as well as possible solutions to overcome pathogenic microbes that can cause life-threatening infections in susceptible hosts. Studying microbial interactions will further our understanding of complex, less-studied ecosystems and enable the design of effective frameworks for the treatment of infectious diseases.


Subjects
Microbial Interactions , Microbiota , Humans , Microbial Consortia , Coculture Techniques , Community Networks
13.
Genome Biol ; 25(1): 88, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589899

ABSTRACT

Inferring gene regulatory networks (GRNs) from single-cell data is challenging due to heuristic limitations. Existing methods also lack estimates of uncertainty. Here we present Probabilistic Matrix Factorization for Gene Regulatory Network Inference (PMF-GRN). Using single-cell expression data, PMF-GRN infers latent factors capturing transcription factor activity and regulatory relationships. Using variational inference allows hyperparameter search for principled model selection and direct comparison to other generative models. We extensively test and benchmark our method using real single-cell datasets and synthetic data. We show that PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods, offering well-calibrated uncertainty estimates.


Subjects
Algorithms , Gene Regulatory Networks
14.
Biophys Rev ; 16(1): 57-67, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38495440

ABSTRACT

Learning how multicellular organs are developed from single cells to different cell types is a fundamental problem in biology. With the high-throughput scRNA-seq technology, computational methods have been developed to reveal the temporal dynamics of single cells from transcriptomic data, from phenomena on cell trajectories to the underlying mechanism that formed the trajectory. There are several distinct families of computational methods including Trajectory Inference (TI), Lineage Tracing (LT), and Gene Regulatory Network (GRN) Inference which are involved in such studies. This review summarizes these computational approaches which use scRNA-seq data to study cell differentiation and cell fate specification as well as the advantages and limitations of different methods. We further discuss how GRNs can potentially affect cell fate decisions and trajectory structures. Supplementary Information: The online version contains supplementary material available at 10.1007/s12551-023-01090-5.

15.
Comput Struct Biotechnol J ; 23: 1036-1050, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38464935

ABSTRACT

Melanoma, the deadliest form of skin cancer, can metastasize to different organs. Molecular differences between brain and extracranial melanoma metastases are poorly understood. Here, promoter methylation and gene expression of 11 heterogeneous patient-matched pairs of brain and extracranial metastases were analyzed using melanoma-specific gene regulatory networks learned from public transcriptome and methylome data, followed by network-based impact propagation of patient-specific alterations. This innovative data analysis strategy allowed the prediction of potential impacts of patient-specific driver candidate genes on other genes and pathways. The patient-matched metastasis pairs clustered into three robust subgroups with specific downstream targets with known roles in cancer, including melanoma (SG1: RBM38, BCL11B; SG2: GATA3, FES; SG3: SLAMF6, PYCARD). Patient subgroups and the ranking of target gene candidates were confirmed in a validation cohort. In summary, computational network-based impact analyses of heterogeneous metastasis pairs predicted individual regulatory differences in melanoma brain metastases, culminating in three consistent subgroups with specific downstream target genes.

16.
Neural Netw ; 172: 106135, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38271920

ABSTRACT

Pre-trained models such as BERT have achieved great success in natural language processing tasks in recent years. In this paper, we investigate privacy-preserving inference for pre-trained neural networks in a two-server framework based on the additive secret sharing technique. Our protocol allows a resource-constrained client to request two powerful servers to cooperatively process natural language processing tasks without revealing any useful information about its data. We first design a series of secure sub-protocols for the non-linear functions used in the BERT model. These sub-protocols are expected to have broad applications and to be of independent interest. Building on these sub-protocols, we propose SecBERT, a privacy-preserving pre-training based neural network inference protocol. SecBERT is the first cryptographically secure privacy-preserving pre-training based neural network inference protocol. We show the security, efficiency and accuracy of the SecBERT protocol through comprehensive theoretical analysis and experiments.


Subjects
Computer Security , Privacy , Humans , Neural Networks, Computer
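The additive secret sharing technique named in entry 16 can be sketched in a few lines. The ring size `MODULUS`, the function names, and the toy values are assumptions chosen for illustration; the sketch covers only sharing, reconstruction, and local addition, not the SecBERT sub-protocols for non-linear functions.

```python
import secrets

MODULUS = 2**32  # ring size; an assumption made for this illustration


def share(x):
    """Split x into two additive shares that sum to x modulo MODULUS."""
    r = secrets.randbelow(MODULUS)
    return r, (x - r) % MODULUS


def reconstruct(s0, s1):
    """Recombine two shares into the original value."""
    return (s0 + s1) % MODULUS


def add_shares(a, b):
    """A server adds the two shares it holds, without talking to the other server."""
    return (a + b) % MODULUS


# The client splits its private inputs and sends one share of each to each server.
x0, x1 = share(20)
y0, y1 = share(22)

z0 = add_shares(x0, y0)  # computed by server 0
z1 = add_shares(x1, y1)  # computed by server 1

result = reconstruct(z0, z1)
print(result)
```

Each share on its own is a uniformly random ring element, so a single server learns nothing about the input. Addition is free in this scheme, while non-linear functions need interactive protocols, which is why the abstract singles them out.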
17.
J Theor Biol ; 577: 111671, 2024 Jan 21.
Article in English | MEDLINE | ID: mdl-37979612

ABSTRACT

After the new Coronavirus disease (COVID-19) emerged at the end of January 2020 in Germany, a large number of individuals suffered from severe symptoms and eventually needed intensive care in hospitals. Due to the rapid spread of the disease, the number of deceased individuals increased as well, which is a motivation to prevent as many new infections as possible. Therefore, knowledge about the current evolution of the virus spread is crucial to predict its future behavior and to react with suitable interventions. In this paper, the evolution of the COVID-19 pandemic in Germany is forecasted by a network-based inference method, in which the interactions of individuals are taken into account using a contact matrix. The results are then compared to the predictions made without a contact matrix as well as to logistic regression, which shows the advantage of incorporating the contact matrix. Furthermore, the basic reproduction number of the pandemic in Germany is estimated using a neural network approach and used for further predictions of the evolution of COVID-19 in Germany. In order to mathematically model the different compartments of the population in the considered regions, the classical SIR model is employed. In this work, we deploy the LASSO (Least Absolute Shrinkage and Selection Operator) for the unknown parameter estimation. Furthermore, we calculate and illustrate the MAPE (Mean Absolute Percentage Error) of the estimations to show the accuracy of the predictions. The results include model parameter estimation and model validation, as well as outbreak forecasting using network-informed algorithms. Our findings show that the network-inference based approach outperforms logistic regression as well as the neural network approach and the SIR model calibration without a contact network. Furthermore, according to the results, the network-inference based approach is particularly suitable for short- to mid-term predictions, even when there is not much information about the new disease. Moreover, the predictions based on the estimation of the reproduction number in Germany can yield more reliable results as more data become available, but could not outperform the network-inference based algorithm.


Subjects
COVID-19 , Humans , COVID-19/epidemiology , SARS-CoV-2 , Pandemics/prevention & control , Uncertainty , Models, Theoretical
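Two ingredients named in the abstract of entry 17, the classical SIR model and the MAPE, can be sketched as follows. The forward-Euler discretization, the parameter values, and the toy series are assumptions made for illustration; the paper's contact-matrix method and LASSO estimation are not reproduced here.

```python
def sir_step(s, i, r, beta, gamma, dt=1.0):
    """One forward-Euler step of the classical SIR model (total population conserved)."""
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return s - new_infections, i + new_infections - new_recoveries, r + new_recoveries


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Toy values, invented for illustration only.
s1, i1, r1 = sir_step(990.0, 10.0, 0.0, beta=0.3, gamma=0.1)
error = mape([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
print(s1, i1, r1, error)
```

A lower MAPE means the forecast tracks the observed counts more closely, which is the sense in which the abstract compares the competing approaches.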
19.
Mol Biotechnol ; 2023 Nov 11.
Article in English | MEDLINE | ID: mdl-37950851

ABSTRACT

Gene networks allow researchers to understand the underlying mechanisms between diseases and genes while reducing the need for wet lab experiments. Numerous gene network inference (GNI) algorithms have been presented in the literature to infer accurate gene networks. We propose a hybrid GNI algorithm, the k-Strong Inference Algorithm (ksia), to infer more reliable and robust gene networks from omics datasets. To increase reliability, ksia integrates Pearson correlation coefficient (PCC) and Spearman rank correlation coefficient (SCC) scores to determine mutual information scores between molecules, which increases the diversity of relation predictions. To infer a more robust gene network, ksia applies three different elimination steps to remove redundant and spurious relations between genes. The performance of ksia was evaluated on a microbe microarray database in an overlap analysis with other GNI algorithms, namely ARACNE, C3NET, CLR, and MRNET. ksia inferred fewer relations because of its strict elimination steps. However, ksia generally performed better on Escherichia coli (E. coli) and Saccharomyces cerevisiae (yeast) gene expression datasets in terms of F-measure and precision values. The integration of association estimator scores and three elimination stages slightly increases the performance of ksia-based gene networks. Users can access the ksia R package and its user manual via https://github.com/ozgurcingiz/ksia .
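The abstract of entry 19 says that ksia integrates Pearson (PCC) and Spearman (SCC) scores but does not state the combination rule. The sketch below computes both coefficients in plain Python and combines them by taking the larger absolute value; that rule, the function names, and the toy expression profiles are assumptions made for illustration and should not be read as the ksia algorithm.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = mean_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def combined_score(x, y):
    """Hypothetical combination rule: the larger absolute value of PCC and SCC."""
    return max(abs(pearson(x, y)), abs(spearman(x, y)))


# Toy expression profiles: gene_b is a monotone but non-linear function of gene_a.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 4.0, 9.0, 16.0, 25.0]
print(pearson(gene_a, gene_b), spearman(gene_a, gene_b), combined_score(gene_a, gene_b))
```

On this toy pair the rank-based score is higher than the linear one, which hints at why using both estimators can recover relations that either one alone would rank lower.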

20.
bioRxiv ; 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-38014297

ABSTRACT

Reconstruction of gene regulatory networks (GRNs) from expression data is a significant open problem. Common approaches train a machine learning (ML) model to predict a gene's expression using transcription factors' (TFs') expression as features and designate important features/TFs as regulators of the gene. Here, we present an entirely different paradigm, where GRN edges are directly predicted by the ML model. The new approach, named "SPREd" is a simulation-supervised neural network for GRN inference. Its inputs comprise expression relationships (e.g., correlation, mutual information) between the target gene and each TF and between pairs of TFs. The output includes binary labels indicating whether each TF regulates the target gene. We train the neural network model using synthetic expression data generated by a biophysics-inspired simulation model that incorporates linear as well as non-linear TF-gene relationships and diverse GRN configurations. We show SPREd to outperform state-of-the-art GRN reconstruction tools GENIE3, ENNET, PORTIA and TIGRESS on synthetic datasets with high co-expression among TFs, similar to that seen in real data. A key advantage of the new approach is its robustness to relatively small numbers of conditions (columns) in the expression matrix, which is a common problem faced by existing methods. Finally, we evaluate SPREd on real data sets in yeast that represent gold standard benchmarks of GRN reconstruction and show it to perform significantly better than or comparably to existing methods. In addition to its high accuracy and speed, SPREd marks a first step towards incorporating biophysics principles of gene regulation into ML-based approaches to GRN reconstruction.
