ABSTRACT
Hybrid potato breeding will transform the crop from a clonally propagated tetraploid to a seed-reproducing diploid. Historical accumulation of deleterious mutations in potato genomes has hindered the development of elite inbred lines and hybrids. Utilizing a whole-genome phylogeny of 92 Solanaceae and its sister clade species, we employ an evolutionary strategy to identify deleterious mutations. The deep phylogeny reveals the genome-wide landscape of highly constrained sites, comprising ~2.4% of the genome. Based on a diploid potato diversity panel, we infer 367,499 deleterious variants, of which 50% occur at non-coding and 15% at synonymous sites. Counterintuitively, diploid lines with relatively high homozygous deleterious burden can be better starting material for inbred-line development, despite showing less vigorous growth. Inclusion of inferred deleterious mutations increases genomic-prediction accuracy for yield by 24.7%. Our study generates insights into the genome-wide incidence and properties of deleterious mutations and their far-reaching consequences for breeding.
Subjects
Plant Breeding , Solanum tuberosum , Diploidy , Mutation , Phylogeny , Solanum tuberosum/genetics
ABSTRACT
The projected changes in the hydrological cycle under global warming remain highly uncertain across current climate models. Here, we demonstrate that the observed past warming trend can be utilized to effectively constrain future projections in mean and extreme precipitation on both global and regional scales. The physical basis for such constraints relies on the relatively constant climate sensitivity in individual models and the reasonable consistency of regional hydrological sensitivity among the models, which is dominated and regulated by the increases in atmospheric moisture. For the high-emission scenario, on the global average, the projected changes in mean precipitation are lowered from 6.9 to 5.2% and those in extreme precipitation from 24.5 to 18.1%, with the inter-model variances reduced by 31.0 and 22.7%, respectively. Moreover, the constraint can be applied to regions in middle-to-high latitudes, particularly over land. These constraints result in spatially resolved corrections that deviate substantially and inhomogeneously from the global mean corrections. This study provides regionally constrained hydrological responses over the globe, with direct implications for climate adaptation in specific areas.
ABSTRACT
While pathogenic variants can significantly increase disease risk, it is still challenging to estimate the clinical impact of rare missense variants more generally. Even in genes such as BRCA2 or PALB2, large cohort studies find no significant association between breast cancer and rare missense variants collectively. Here, we introduce REGatta, a method to estimate clinical risk from variants in smaller segments of individual genes. We first define these regions by using the density of pathogenic diagnostic reports and then calculate the relative risk in each region by using over 200,000 exome sequences in the UK Biobank. We apply this method in 13 genes with established roles across several monogenic disorders. In genes with no significant difference at the gene level, this approach significantly separates disease risk for individuals with rare missense variants at higher or lower risk (BRCA2 regional model OR = 1.46 [1.12, 1.79], p = 0.0036 vs. BRCA2 gene model OR = 0.96 [0.85, 1.07], p = 0.4171). We find high concordance between these regional risk estimates and high-throughput functional assays of variant impact. We compare our method with existing methods and with the use of protein domains (Pfam) as regions and find that REGatta better identifies individuals at elevated or reduced risk. These regions provide useful priors and may improve risk assessment for genes associated with monogenic diseases.
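As a rough illustration of the kind of per-region statistic reported above, the following sketch computes an odds ratio with a Wald-type 95% confidence interval from a 2x2 table of carriers and non-carriers among cases and controls. The abstract does not state how REGatta derives its estimates or intervals, so the function name `odds_ratio_ci`, the interval method, and the counts in the comment are assumptions for illustration only.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Odds ratio and Wald confidence interval for one gene region.

    Hypothetical helper: a generic 2x2-table calculation, not REGatta's
    published procedure. z=1.96 corresponds to a 95% interval.
    """
    a, b, c, d = (case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts for a single region (not data from the study):
# odds_ratio_ci(10, 90, 5, 95)
```

Running the same function once per region and once for the whole gene would give the regional-versus-gene-level comparison the abstract describes, under the stated assumptions.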
Subjects
Breast Neoplasms , Genetic Predisposition to Disease , Humans , Female , BRCA2 Protein/genetics , Mutation, Missense , Sequence Analysis, DNA , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Cohort Studies
ABSTRACT
With the exponential growth of digital data, there is a pressing need for innovative storage media and techniques. DNA molecules, due to their stability, storage capacity, and density, offer a promising solution for information storage. However, DNA storage also faces numerous challenges, such as complex biochemical constraints and encoding efficiency. This paper presents Explorer, a high-efficiency DNA coding algorithm based on the De Bruijn graph, which leverages its capability to characterize local sequences. Explorer enables coding under various biochemical constraints, such as homopolymers, GC content, and undesired motifs. This paper also introduces Codeformer, a fast decoding algorithm based on the transformer architecture, to further enhance decoding efficiency. Numerical experiments indicate that, compared with other advanced algorithms, Explorer not only achieves stable encoding and decoding under various biochemical constraints but also increases the encoding efficiency and bit rate by ≥10%. Additionally, Codeformer demonstrates the ability to efficiently decode large quantities of DNA sequences. Under different parameter settings, its decoding efficiency exceeds that of traditional algorithms by more than two-fold. When Codeformer is combined with Reed-Solomon code, its decoding accuracy exceeds 99%, making it a good choice for high-speed decoding applications. These advancements are expected to contribute to the development of DNA-based data storage systems and the broader exploration of DNA as a novel information storage medium.
Subjects
Algorithms , DNA , DNA/genetics , DNA/chemistry , Software , Sequence Analysis, DNA/methods , Computational Biology/methods
ABSTRACT
Protein annotation has long been a challenging task in computational biology. Gene Ontology (GO) has become one of the most popular frameworks to describe protein functions and their relationships. Prediction of a protein annotation with proper GO terms demands high-quality GO term representation learning, which aims to learn a low-dimensional dense vector representation with accompanying semantic meaning for each functional label, also known as an embedding. However, existing GO term embedding methods, which mainly take into account ancestral co-occurrence information, have yet to capture the full topological information in the GO directed acyclic graph (DAG). In this study, we propose a novel GO term representation learning method, PO2Vec, which utilizes the partial order relationships to improve the GO term representations. Extensive evaluations show that PO2Vec achieves better outcomes than existing embedding methods in a variety of downstream biological tasks. Based on PO2Vec, we further developed a new protein function prediction method, PO2GO, which demonstrates superior performance measured in multiple metrics and annotation specificity as well as few-shot prediction capability in the benchmarks. These results suggest that the high-quality representation of GO structure is critical for diverse biological tasks including computational protein annotation.
Subjects
Benchmarking , Computational Biology , Gene Ontology , Learning , Molecular Sequence Annotation
ABSTRACT
Cancer metabolism is a marvellously complex topic, in part due to the reprogramming of its pathways to self-sustain the malignant phenotype in the disease, to the detriment of its healthy counterpart. Understanding these adjustments can provide novel targeted therapies that could disrupt and impair proliferation of cancerous cells. For this very purpose, genome-scale metabolic models (GEMs) have been developed, with Human1 being the most recent reconstruction of human metabolism. Based on GEMs, we introduced the genetic Minimal Cut Set (gMCS) approach, an uncontextualized methodology that exploits the concepts of synthetic lethality to predict metabolic vulnerabilities in cancer. gMCSs define a set of genes whose knockout would render the cell unviable by disrupting an essential metabolic task in GEMs, thus making cellular proliferation impossible. Here, we summarize the gMCS approach and review the current state of the methodology by performing a systematic meta-analysis based on two datasets of gene essentiality in cancer. First, we assess several thresholds and distinct methodologies for discerning highly and lowly expressed genes. Then, we address the premise that gMCSs of distinct length should have the same predictive power. Finally, we question the importance of a gene partaking in multiple gMCSs and analyze the importance of all the essential metabolic tasks defined in Human1. Our meta-analysis yielded a parameter evaluation that increases the predictive power of the gMCS approach and significantly reduces computation times by selecting only the crucial gMCS lengths, and it identifies the parameters most pertinent to efficient gMCS processing.
Subjects
Neoplasms , Humans , Neoplasms/genetics , Cell Proliferation , Gene Expression , Health Status , Phenotype
ABSTRACT
The human microbiome plays an important role in human health and disease. Meta-omics analyses provide indispensable data for linking changes in microbiome composition and function to disease etiology. Yet, the lack of a mechanistic understanding of, e.g., microbiome-metabolome links hampers the translation of these findings into effective, novel therapeutics. Here, we propose metabolic modeling of microbial communities through constraint-based reconstruction and analysis (COBRA) as a complementary approach to meta-omics analyses. First, we highlight the importance of microbial metabolism in cardiometabolic diseases, inflammatory bowel disease, colorectal cancer, Alzheimer disease, and Parkinson disease. Next, we demonstrate that microbial community modeling can stratify patients and controls, mechanistically link microbes with fecal metabolites altered in disease, and identify host pathways affected by the microbiome. Finally, we outline our vision for COBRA modeling combined with meta-omics analyses and multivariate statistical analyses to inform and guide clinical trials, yield testable hypotheses, and ultimately propose novel dietary and therapeutic interventions.
Subjects
Gastrointestinal Microbiome , Microbiota , Humans , Precision Medicine
ABSTRACT
Current knowledge of cancer genomics remains biased against noncoding mutations. To systematically search for regulatory noncoding mutations, we assessed mutations in conserved positions in the genome under the assumption that these are more likely to be functional than mutations in positions with low conservation. To this end, we used whole-genome sequencing data from the International Cancer Genome Consortium and combined them with evolutionary constraint inferred from 240 mammals to identify genes enriched in noncoding constraint mutations (NCCMs), mutations likely to be regulatory in nature. We compare medulloblastoma (MB), which is malignant, to pilocytic astrocytoma (PA), a primarily benign tumor, and find highly different NCCM frequencies between the two, in agreement with the fact that malignant cancers tend to have more mutations. In PA, a high NCCM frequency only affects the BRAF locus, which is the most commonly mutated gene in PA. In contrast, in MB, >500 genes have high levels of NCCMs. Intriguingly, several loci with NCCMs in MB are associated with different ages of onset, such as the HOXB cluster in young MB patients. In adult patients, NCCMs occurred in, e.g., the WASF-2/AHDC1/FGR locus. One of these NCCMs led to increased expression of the SRC kinase FGR and augmented responsiveness of MB cells to dasatinib, a SRC kinase inhibitor. Our analysis thus points to different molecular pathways in different patient groups. These newly identified putative candidate driver mutations may aid in patient stratification in MB and could be valuable for future selection of personalized treatment options.
Subjects
Cerebellar Neoplasms , Medulloblastoma , Adult , Animals , Humans , Medulloblastoma/pathology , Mutation , Genome , Cerebellar Neoplasms/genetics , src-Family Kinases/genetics , Mammals/genetics , DNA-Binding Proteins/genetics
ABSTRACT
Variability in expression levels in response to random genomic mutations varies among genes, influencing both the facilitation and constraint of phenotypic evolution in organisms. Despite its importance, both the underlying mechanisms and evolutionary origins of this variability remain largely unknown due to the mixed contributions of cis- and trans-acting elements. To address this issue, we focused on the mutational variability of cis-acting elements, that is, promoter regions, in Escherichia coli. Random mutations were introduced into the natural and synthetic promoters to generate mutant promoter libraries. By comparing the variance in promoter activity of these mutant libraries, we found no significant difference in mutational variability in promoter activity between promoter groups, suggesting the absence of a signature of natural selection for mutational robustness. In contrast, the promoters controlling essential genes exhibited a remarkable bias in mutational variability, with mutants displaying higher activities than the wild types being relatively rare compared to those with lower activities. Our evolutionary simulation on a rugged fitness landscape provided a rationale for this vulnerability. These findings suggest that past selection created nonuniform mutational variability in promoters biased toward lower activities of random mutants, which now constrains the future evolution of downstream essential genes toward higher expression levels.
Subjects
Escherichia coli , Evolution, Molecular , Genes, Essential , Mutation , Promoter Regions, Genetic , Escherichia coli/genetics , Selection, Genetic , Gene Expression Regulation, Bacterial , Genetic Fitness
ABSTRACT
Large-scale comparative genomics studies offer valuable resources for understanding both functional and evolutionary rate constraints. It is suggested that constraint aligns with the topology of genomic networks, increasing toward the center, with intermediate nodes combining relaxed constraint with higher contributions to the phenotype due to pleiotropy. However, this pattern has yet to be demonstrated in vertebrates. This study shows that constraint intensifies toward the network's center in placental mammals. Genes with rate changes associated with emergence of hibernation cluster mostly toward intermediate positions, with higher constraint in faster-evolving genes, which is indicative of a "sweet spot" for adaptation. If this trend holds universally, network node metrics could predict high-constraint regions even in clades lacking empirical constraint data.
Subjects
Biological Evolution , Placenta , Pregnancy , Female , Animals , Genome , Genomics , Phenotype , Mammals
ABSTRACT
Evolutionary analyses have estimated that ~60% of nucleotides in intergenic regions of the Drosophila melanogaster genome are functionally relevant, suggesting that regulatory information may be encoded more densely in intergenic regions than has been revealed by most functional dissections of regulatory DNA. Here, we approached this issue through a functional dissection of the regulatory region of the gene shavenbaby (svb). Most of the ~90 kb of this large regulatory region is highly conserved in the genus Drosophila, though characterized enhancers occupy a small fraction of this region. By analyzing the regulation of svb in different contexts of Drosophila development, we found that the regulatory information that drives svb expression in the abdominal pupal epidermis is organized in a different way than the elements that drive svb expression in the embryonic epidermis. While in the embryonic epidermis svb is activated by compact enhancers separated by large inactive DNA regions, svb expression in the pupal epidermis is driven by regulatory information distributed over broader regions of svb cis-regulatory DNA. In the same vein, we observed that other developmental genes also display a dense distribution of putative regulatory elements in their regulatory regions. Furthermore, we found that a large percentage of conserved noncoding DNA of the Drosophila genome is contained within regions of open chromatin. These results suggest that part of the evolutionary constraint on noncoding DNA of Drosophila is explained by the density of regulatory information, which may be greater than previously appreciated.
Subjects
Drosophila Proteins , Drosophila , Animals , Drosophila/metabolism , Transcription Factors/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , DNA-Binding Proteins/genetics , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Gene Expression Regulation, Developmental , DNA , DNA, Intergenic/genetics , DNA, Intergenic/metabolism , Enhancer Elements, Genetic
ABSTRACT
Do long noncoding RNAs (lncRNAs) contribute little or substantively to human biology? To address how lncRNA loci and their transcripts, structures, interactions, and functions contribute to human traits and disease, we adopt a genome-wide perspective. We intend to provoke alternative interpretation of questionable evidence and thorough inquiry into unsubstantiated claims. We discuss pitfalls of lncRNA experimental and computational methods as well as opposing interpretations of their results. The majority of evidence, we argue, indicates that most lncRNA transcript models reflect transcriptional noise or provide minor regulatory roles, leaving relatively few human lncRNAs that contribute centrally to human development, physiology, or behavior. These important few tend to be spliced and better conserved but lack a simple syntax relating sequence to structure and mechanism, and so resist simple categorization. This genome-wide view should help investigators prioritize individual lncRNAs based on their likely contribution to human biology.
Subjects
RNA, Long Noncoding , Genome , Humans , RNA, Long Noncoding/genetics
ABSTRACT
The identification of genes that evolve under recessive natural selection is a long-standing goal of population genetics research that has important applications to the discovery of genes associated with disease. We found that commonly used methods to evaluate selective constraint at the gene level are highly sensitive to genes under heterozygous selection but ubiquitously fail to detect recessively evolving genes. Additionally, more sophisticated likelihood-based methods designed to detect recessivity similarly lack power for a human gene of realistic length from current population sample sizes. However, extensive simulations suggested that recessive genes may be detectable in aggregate. Here, we offer a method informed by population genetics simulations designed to detect recessive purifying selection in gene sets. Applying this to empirical gene sets produced significant enrichments for strong recessive selection in genes previously inferred to be under recessive selection in a consanguineous cohort and in genes involved in autosomal recessive monogenic disorders.
Subjects
Gene Frequency , Genes, Recessive , Genetics, Population , Selection, Genetic , Algorithms , Alleles , Genes, Dominant , Genetic Predisposition to Disease , Genetic Variation , Genetics, Population/methods , Genomics/methods , Genotype , Humans , Inheritance Patterns , Likelihood Functions , Models, Genetic , Mutation , United Kingdom
ABSTRACT
In multicellular systems, cells communicate with adjacent cells to determine their positions and fates, an arrangement important for cellular development. Orientation of cell division, cell-cell interactions (i.e. attraction and repulsion) and geometric constraints are three major factors that define cell arrangement. In particular, geometric constraints are difficult to reveal in experiments, and the contribution of the local contour of the boundary has remained elusive. In this study, we developed a multicellular morphology model based on the phase-field method so that precise geometric constraints can be incorporated. Our application of the model to nematode embryos predicted that the amount of extra-embryonic space, the empty space within the eggshell that is not occupied by embryonic cells, affects cell arrangement in a manner dependent on the local contour and other factors. The prediction was validated experimentally by increasing the extra-embryonic space in the Caenorhabditis elegans embryo. Overall, our analyses characterized the roles of geometrical contributors, specifically the amount of extra-embryonic space and the local contour, on cell arrangements. These factors should be considered for multicellular systems.
Subjects
Caenorhabditis elegans Proteins , Nematoda , Animals , Caenorhabditis elegans , Caenorhabditis elegans Proteins/genetics , Cell Division , Embryo, Nonmammalian , Models, Biological
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) measures transcriptome-wide gene expression at single-cell resolution. Clustering analysis of scRNA-seq data enables researchers to characterize cell types and states, shedding new light on cell-to-cell heterogeneity in complex tissues. Recently, self-supervised contrastive learning has become a prominent technique for underlying feature representation learning. However, for noisy, high-dimensional and sparse scRNA-seq data, existing methods still encounter difficulties in capturing the intrinsic patterns and structures of cells, and seldom utilize prior knowledge, resulting in clusters that do not match the real situation. To this end, we propose scDECL, a novel deep enhanced constraint clustering algorithm for scRNA-seq data analysis based on contrastive learning and pairwise constraints. Specifically, based on interpolated contrastive learning, a pre-training model is trained to learn the feature embedding, and clustering is then performed according to the constructed enhanced pairwise constraints. In the pre-training stage, a mixup data augmentation strategy and an interpolation loss are introduced to improve the diversity of the dataset and the robustness of the model. In the clustering stage, the prior information is converted into enhanced pairwise constraints to guide the clustering. To validate the performance of scDECL, we compare it with six state-of-the-art algorithms on six real scRNA-seq datasets. The experimental results demonstrate that the proposed algorithm outperforms the six competing methods. In addition, the ablation studies on each module of the algorithm indicate that these modules are complementary to each other and effective in improving the performance of the proposed algorithm. Our method scDECL is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/DBLABDHU/scDECL.
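The mixup augmentation mentioned for the pre-training stage can be sketched generically as follows. This is not taken from the scDECL repository: the function name `mixup`, the Beta(alpha, alpha) mixing weight, and the toy expression vectors are assumptions that follow the commonly used mixup recipe, and the interpolation loss itself is not shown because the abstract does not define it.

```python
import random


def mixup(cells, alpha=1.0, seed=None):
    """Blend each expression vector with a randomly chosen partner.

    Hypothetical sketch of generic mixup, not scDECL's actual code.
    cells: list of equal-length lists of floats (one list per cell).
    Returns (mixed, partners, lam), where each mixed vector equals
    lam * cells[i] + (1 - lam) * cells[partners[i]].
    """
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)   # mixing weight in [0, 1]
    partners = list(range(len(cells)))
    rng.shuffle(partners)                 # random partner for every cell
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(cells[i], cells[j])]
        for i, j in enumerate(partners)
    ]
    return mixed, partners, lam


# Toy input: three cells, two genes each (made-up values).
toy_cells = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
mixed_cells, partner_index, weight = mixup(toy_cells, alpha=1.0, seed=0)
```

Returning the partner indices and the mixing weight alongside the blended vectors is a design choice that lets a downstream loss compare the embedding of a mixed sample with the same interpolation of its two source embeddings.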
Subjects
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
ABSTRACT
Significance: Metabolism relies on a small class of molecules (coenzymes) that serve as universal donors and acceptors of key chemical groups and electrons. Although metabolic networks crucially depend on structurally redundant coenzymes [e.g., NAD(H) and NADP(H)] associated with different enzymes, the criteria that led to the emergence of this redundancy remain poorly understood. Our combination of modeling and structural and sequence analysis indicates that coenzyme redundancy may not be essential for metabolism but could rather constitute an evolved strategy promoting efficient usage of enzymes when biochemical reactions are near equilibrium. Our work suggests that early metabolism may have operated with fewer coenzymes and that adaptation for metabolic efficiency may have driven the rise of coenzyme diversity in living systems.
Subjects
Coenzymes , NAD , Coenzymes/metabolism , NAD/metabolism , NADP/metabolism
ABSTRACT
Proteins, as essential biomolecules, account for a large fraction of cell mass, and thus the synthesis of the complete set of proteins (i.e., the proteome) represents a substantial part of the cellular resource budget. Therefore, cells might be under selective pressures to optimize the resource costs for protein synthesis, particularly the biosynthesis of the 20 proteinogenic amino acids. Previous studies showed that less energetically costly amino acids are more abundant in the proteomes of bacteria that survive under energy-limited conditions, but the energy cost of synthesizing amino acids was reported to be weakly associated with the amino acid usage in Saccharomyces cerevisiae. Here we present a modeling framework to estimate the protein cost of synthesizing each amino acid (i.e., the protein mass required for supporting one unit of amino acid biosynthetic flux) and the glucose cost (i.e., the glucose consumed per amino acid synthesized). We show that the logarithms of the relative abundances of amino acids in S. cerevisiae's proteome correlate well with the protein costs of synthesizing amino acids (Pearson's r = -0.89), which is better than the correlation with the glucose costs (Pearson's r = -0.5). Therefore, we demonstrate that S. cerevisiae tends to minimize protein resource, rather than glucose or energy, for synthesizing amino acids.
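The reported statistic, a Pearson correlation between log relative abundance and synthesis cost, can be illustrated with a minimal sketch. The helper name `pearson_r` and the three abundance and cost values are hypothetical placeholders chosen only to show the calculation; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical values for three amino acids (illustration only).
relative_abundance = [0.10, 0.05, 0.02]   # fraction of proteome residues
protein_cost = [1.0, 2.0, 4.0]            # arbitrary cost units

# Correlate the logarithm of abundance with the cost, as in the abstract.
log_abundance = [math.log(a) for a in relative_abundance]
r = pearson_r(log_abundance, protein_cost)
```

A negative `r` here would mean that costlier amino acids are rarer, which is the direction of the relationship the abstract reports for both protein cost and glucose cost.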
Subjects
Amino Acids/biosynthesis , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Amino Acids/chemistry , Amino Acids/metabolism , Biological Evolution , Energy Metabolism/physiology , Evolution, Molecular , Metabolic Engineering/methods , Protein Biosynthesis/genetics , Protein Biosynthesis/physiology , Proteome/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics
ABSTRACT
A large stream of literature has found that individuals who experience financial strain are particularly concerned about their present needs; that is, they are more likely to choose smaller immediate payoffs over larger future payoffs. In contrast, some recent findings suggest that financially constrained individuals may be more concerned about future needs instead (e.g., they are relatively more likely to invest in long-lived durables than in short-lived experiences). We propose that the use of traditional intertemporal choice tasks has made prior studies overly sensitive to the myopia-inducing effects of financial constraint. These tasks typically offer a choice between receiving a smaller payoff in the present versus a larger payoff in the future. Across three studies, we observe that, as long as some immediate payout is guaranteed, financially constrained individuals are as likely as nonconstrained individuals to accept a delay for a larger payoff. These findings qualify prior demonstrations of the myopic effects of financial constraint and suggest that the traditionally used choice paradigm might not accurately capture time preferences, particularly for financially constrained individuals. Furthermore, they provide possible interventions for those interested in reducing the myopia of financially constrained individuals who are facing all-now versus all-later decisions.
ABSTRACT
BACKGROUND: Efficient DNA-based storage systems offer substantial capacity and longevity at reduced costs, addressing anticipated data growth. However, encoding data into DNA sequences is limited by two key constraints: 1) a maximum of h consecutive identical bases (homopolymer constraint h), and 2) a GC ratio within [0.5 - c_GC, 0.5 + c_GC] (GC content constraint c_GC). Sequencing or synthesis errors tend to increase when these constraints are violated. RESULTS: In this research, we address a pure source coding problem in the context of DNA storage, considering both homopolymer and GC content constraints. We introduce a novel coding technique that adheres to these constraints while maintaining linear complexity for increased block lengths and achieving near-optimal rates. We demonstrate the effectiveness of the proposed method through experiments on both randomly generated data and existing files. For example, when h = 4 and c_GC = 0.05, the rate reached 1.988, close to the theoretical limit of 1.990. The associated code can be accessed at GitHub. CONCLUSION: We propose a variable-to-variable-length encoding method that does not rely on concatenating short predefined sequences and that achieves near-optimal rates.
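The two constraints stated above can be checked on a candidate sequence with a short sketch. It only verifies whether a sequence satisfies the homopolymer limit h and the GC window c_GC; it does not implement the paper's encoder. The function names and the small floating-point tolerance `tol` are assumptions introduced for illustration.

```python
def max_homopolymer_run(seq):
    """Length of the longest run of identical consecutive bases (0 if empty)."""
    longest = current = 0
    previous = None
    for base in seq:
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq, h, c_gc, tol=1e-12):
    """True if no run exceeds h and the GC ratio lies in [0.5 - c_gc, 0.5 + c_gc].

    tol is a hypothetical guard against floating-point rounding at the
    window edges; it is not part of the constraint definition.
    """
    within_run_limit = max_homopolymer_run(seq) <= h
    within_gc_window = abs(gc_fraction(seq) - 0.5) <= c_gc + tol
    return within_run_limit and within_gc_window


# Example with the parameter values quoted in the abstract (h = 4, c_GC = 0.05):
# satisfies_constraints("ACGTACGT", h=4, c_gc=0.05)
```

A checker like this is useful mainly for validating encoder output; a constrained encoder would build sequences that pass it by construction rather than filtering afterwards.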
Subjects
Base Composition , DNA , DNA/chemistry , Sequence Analysis, DNA/methods , Algorithms , Information Storage and Retrieval/methods
ABSTRACT
BACKGROUND: The growing abundance of in vitro omics data, coupled with the necessity to reduce animal testing in the safety assessment of chemical compounds and even eliminate it in the evaluation of cosmetics, highlights the need for adequate computational methodologies. Data from omics technologies allow the exploration of a wide range of biological processes, therefore providing a better understanding of mechanisms of action (MoA) related to chemical exposure in biological systems. However, the analysis of these large datasets remains difficult due to the complexity of modulations spanning multiple biological processes. RESULTS: To address this, we propose a strategy to reduce information overload by computing, based on transcriptomics data, a comprehensive metabolic sub-network reflecting the metabolic impact of a chemical. The proposed strategy integrates transcriptomic data into a genome-scale metabolic network through enumeration of condition-specific metabolic models, hence translating transcriptomics data into reaction activity probabilities. Based on these results, a graph algorithm is applied to retrieve user-readable sub-networks reflecting the possible metabolic MoA (mMoA) of chemicals. This strategy has been implemented as a three-step workflow. The first step consists in building cell condition-specific models reflecting the metabolic impact of each exposure condition while taking into account the diversity of possible optimal solutions with a partial enumeration algorithm. In the second step, we address the challenge of analyzing thousands of enumerated condition-specific networks by computing differentially activated reactions (DARs) between the two sets of enumerated possible condition-specific models. Finally, in the third step, DARs are grouped into clusters of functionally interconnected metabolic reactions, representing possible mMoA, using a distance-based clustering and subnetwork extraction method.
The first part of the workflow was exemplified on eight molecules selected for their known human hepatotoxic outcomes associated with specific MoAs well described in the literature and for which we retrieved primary human hepatocyte transcriptomic data from Open TG-GATEs. Then, we further applied this strategy to more precisely model and visualize the associated mMoA for two of these eight molecules (amiodarone and valproic acid). The approach proved to go beyond gene-based analysis by identifying mMoA when few genes are significantly differentially expressed (2 differentially expressed genes (DEGs) for amiodarone), bringing additional information from the network topology, or when a very large number of genes were differentially expressed (5709 DEGs for valproic acid). In both cases, the results of our strategy fit well with evidence from the literature regarding known MoA. Beyond these confirmations, the workflow highlighted other potential unexplored mMoA. CONCLUSION: The proposed strategy allows toxicology experts to decipher which part of cellular metabolism is expected to be affected by exposure to a given chemical. The originality of the approach resides in the combination of different metabolic modelling approaches (constraint-based and graph modelling). The application to two model molecules shows the strong potential of the approach for interpretation and visual mining of complex in vitro omics data. The presented strategy is freely available as a Python module (https://pypi.org/project/manamodeller/) and Jupyter notebooks (https://github.com/LouisonF/MANA).