ABSTRACT
We present new models and methods for the posterior drift problem where the regression function in the target domain is modelled as a linear adjustment, on an appropriate scale, of that in the source domain, and study the theoretical properties of our proposed estimators in the binary classification problem. The core idea of our model inherits the simplicity and the usefulness of generalized linear models and accelerated failure time models from the classical statistics literature. Our approach is shown to be flexible and applicable in a variety of statistical settings, and can be adopted for transfer learning problems in various domains including epidemiology, genetics and biomedicine. As concrete applications, we illustrate the power of our approach (i) through mortality prediction for British Asians by borrowing strength from similar data from the larger pool of British Caucasians, using the UK Biobank data, and (ii) in overcoming a spurious correlation present in the source domain of the Waterbirds dataset.
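The "linear adjustment, on an appropriate scale" can be illustrated with a minimal sketch for binary classification. It assumes the scale is the logit, so that the target probability satisfies logit(p_target) = alpha + beta * logit(p_source). The function names, the choice of the logit link, and the Newton fitting routine are illustrative assumptions, not the authors' estimator.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def logit(p, eps=1e-12):
    # Clip to avoid log(0) for probabilities at the boundary.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def adjust(p_source, alpha, beta):
    """Target probability as a linear adjustment of the source logit (assumed link)."""
    return sigmoid(alpha + beta * logit(p_source))

def fit_adjustment(p_source, y_target, n_iter=25):
    """Fit (alpha, beta) by Newton's method on the logistic log-likelihood.

    p_source: source-model probabilities evaluated at the target covariates.
    y_target: target labels (0/1; values in [0, 1] also work).
    """
    z = [logit(p) for p in p_source]
    alpha, beta = 0.0, 1.0  # start from "no adjustment"
    for _ in range(n_iter):
        mu = [sigmoid(alpha + beta * zi) for zi in z]
        # Gradient of the log-likelihood with respect to (alpha, beta).
        g0 = sum(yi - mi for yi, mi in zip(y_target, mu))
        g1 = sum((yi - mi) * zi for yi, mi, zi in zip(y_target, mu, z))
        # Entries of the 2x2 information matrix.
        w = [mi * (1.0 - mi) for mi in mu]
        h00 = sum(w)
        h01 = sum(wi * zi for wi, zi in zip(w, z))
        h11 = sum(wi * zi * zi for wi, zi in zip(w, z))
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break  # information matrix is numerically singular
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta
```

Only two parameters are estimated from the target sample, which is why a small target cohort can borrow strength from a much larger source cohort under this kind of model.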
ABSTRACT
Deep eutectic solvents (DESs) with different acidity and alkalinity were applied for biomass pretreatment, and the conditions were optimized by response surface methodology. The results showed that lactic acid/betaine hydrochloride had the best pretreatment efficiency: the removal rates of hemicellulose and lignin reached 89% and 73%, respectively, and the enzymolysis efficiency was as high as 92%. Furthermore, eight types of chloride salts with different valence states were introduced into the DESs as a third component. The chloride salts improved the pretreatment efficiency, and the improvement correlated positively with the metal valence state. Specifically, AlCl3 was significantly superior in improving the pretreatment efficiency, with the enzymolysis efficiency reaching 96% owing to the destruction of the crystalline region and the esterification of part of the cellulose. Therefore, it is proposed that adding high-valence metal salts to acidic DESs yields higher pretreatment and enzymatic hydrolysis efficiency.
Subjects
Deep Eutectic Solvents, Lignin, Lignin/chemistry, Chlorides, Salts, Solvents/chemistry, Hydrolysis, Biomass
ABSTRACT
Constraint-based reconstruction and analysis (COBRA) provides a molecular mechanistic framework for integrative analysis of experimental molecular systems biology data and quantitative prediction of physicochemically and biochemically feasible phenotypic states. The COBRA Toolbox is a comprehensive desktop software suite of interoperable COBRA methods. It has found widespread application in biology, biomedicine, and biotechnology because its functions can be flexibly combined to implement tailored COBRA protocols for any biochemical network. This protocol is an update to the COBRA Toolbox v.1.0 and v.2.0. Version 3.0 includes new methods for quality-controlled reconstruction, modeling, topological analysis, strain and experimental design, and network visualization, as well as network integration of chemoinformatic, metabolomic, transcriptomic, proteomic, and thermochemical data. New multi-lingual code integration also enables an expansion in COBRA application scope via high-precision, high-performance, and nonlinear numerical optimization solvers for multi-scale, multi-cellular, and reaction kinetic modeling, respectively. This protocol provides an overview of all these new features and can be adapted to generate and analyze constraint-based models in a wide variety of scenarios. The COBRA Toolbox v.3.0 provides an unparalleled depth of COBRA methods.
Subjects
Biological Models, Software, Genome, Metabolic Networks and Pathways, Systems Biology
ABSTRACT
Archetypal analysis and non-negative matrix factorization (NMF) are staples in a statistician's toolbox for dimension reduction and exploratory data analysis. We describe a geometric approach to both NMF and archetypal analysis by interpreting both problems as finding extreme points of the data cloud. We also develop and analyze an efficient approach to finding extreme points in high dimensions. For modern massive datasets that are too large to fit on a single machine and must be stored in a distributed setting, our approach makes only a small number of passes over the data. In fact, it is possible to obtain the NMF or perform archetypal analysis with just two passes over the data.
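One simple way to find extreme points of a data cloud in a single pass is to project every point onto a set of random directions and record which point attains the maximum or minimum along each direction. Such a point is always a vertex of the convex hull when the extremum is unique. The sketch below illustrates that idea only. The function name, the Gaussian directions, and the default parameters are assumptions, and the paper's actual algorithm may differ.

```python
import random

def extreme_points(points, n_directions=200, seed=0):
    """Return sorted indices of points that maximize or minimize
    at least one random linear projection of the data."""
    rng = random.Random(seed)
    dim = len(points[0])
    directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(n_directions)]
    best_max = [(-float("inf"), -1)] * n_directions
    best_min = [(float("inf"), -1)] * n_directions
    # Single pass over the data: each point is read once and scored
    # against all directions, so the data never needs to fit in memory.
    for idx, x in enumerate(points):
        for j, d in enumerate(directions):
            score = sum(di * xi for di, xi in zip(d, x))
            if score > best_max[j][0]:
                best_max[j] = (score, idx)
            if score < best_min[j][0]:
                best_min[j] = (score, idx)
    found = {i for _, i in best_max} | {i for _, i in best_min}
    return sorted(found)
```

In a distributed setting, each machine could run the same loop on its shard with shared directions, and the per-direction maxima and minima could then be merged. That merge step is not shown here.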
ABSTRACT
Finding the minimal set of gene functions needed to sustain life is of both fundamental and practical importance. Minimal gene lists have been proposed by using comparative genomics-based core proteome definitions. A definition of a core proteome that is supported by empirical data, is understood at the systems-level, and provides a basis for computing essential cell functions is lacking. Here, we use a systems biology-based genome-scale model of metabolism and expression to define a functional core proteome consisting of 356 gene products, accounting for 44% of the Escherichia coli proteome by mass based on proteomics data. This systems biology core proteome includes 212 genes not found in previous comparative genomics-based core proteome definitions, accounts for 65% of known essential genes in E. coli, and has 78% gene function overlap with minimal genomes (Buchnera aphidicola and Mycoplasma genitalium). Based on transcriptomics data across environmental and genetic backgrounds, the systems biology core proteome is significantly enriched in nondifferentially expressed genes and depleted in differentially expressed genes. Compared with the noncore, core gene expression levels are also similar across genetic backgrounds (two times higher Spearman rank correlation) and exhibit significantly more complex transcriptional and posttranscriptional regulatory features (40% more transcription start sites per gene, 22% longer 5'UTR). Thus, genome-scale systems biology approaches rigorously identify a functional core proteome needed to support growth. This framework, validated by using high-throughput datasets, facilitates a mechanistic understanding of systems-level core proteome function through in silico models; it de facto defines a paleome.
Subjects
Bacterial Gene Expression Regulation, Bacterial Genes, High-Throughput Screening Assays, Metabolome, Proteome, Systems Biology, Buchnera/genetics, Buchnera/metabolism, Computer Simulation, Datasets as Topic, Escherichia coli/genetics, Escherichia coli/metabolism, Escherichia coli Proteins/genetics, Biological Models, Multigene Family, Mycoplasma genitalium/genetics, Mycoplasma genitalium/metabolism, Transcriptome
ABSTRACT
BACKGROUND: Biological processes such as metabolism, signaling, and macromolecular synthesis can be modeled as large networks of biochemical reactions. Large and comprehensive networks, like integrated networks that represent metabolism and macromolecular synthesis, are inherently multiscale because reaction rates can vary over many orders of magnitude. They require special methods for accurate analysis because naive use of standard optimization systems can produce inaccurate or erroneously infeasible results. RESULTS: We describe techniques enabling off-the-shelf optimization software to compute accurate solutions to the poorly scaled optimization problems arising from flux balance analysis of multiscale biochemical reaction networks. We implement lifting techniques for flux balance analysis within the openCOBRA toolbox and demonstrate our techniques using the first integrated reconstruction of metabolism and macromolecular synthesis for E. coli. CONCLUSION: Our techniques enable accurate flux balance analysis of multiscale networks using off-the-shelf optimization software. Although we describe lifting techniques in the context of flux balance analysis, our methods can be used to handle a variety of optimization problems arising from analysis of multiscale network reconstructions.
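The general idea of lifting can be sketched on a single poorly scaled term. A term `c * v` with a very large coefficient `c` is replaced by a chain of auxiliary variables, each linked to the previous one by a coefficient no larger than a threshold `tau`, so that no single matrix entry is extreme. The helper names, the variable-naming scheme, and the equal-factor splitting below are illustrative assumptions and are not the openCOBRA implementation.

```python
def split_coefficient(coef, tau=1000.0):
    """Split coef into factors of magnitude <= tau whose product is coef."""
    if tau <= 1.0:
        raise ValueError("tau must be greater than 1")
    sign = -1.0 if coef < 0 else 1.0
    mag = abs(coef)
    k = 1
    # Increase the number of factors until each one fits under tau.
    while mag ** (1.0 / k) > tau:
        k += 1
    factors = [mag ** (1.0 / k)] * k
    factors[0] *= sign
    return factors

def lift_term(coef, var, tau=1000.0):
    """Replace the term coef*var by new_coef*new_var plus linking rows.

    Each linking row is a dict {aux: 1.0, prev: -f}, encoding the
    equality constraint aux - f * prev = 0.
    """
    factors = split_coefficient(coef, tau)
    rows = []
    prev = var
    for i, f in enumerate(factors[1:], start=1):
        aux = f"{var}_lift{i}"  # hypothetical name for the auxiliary variable
        rows.append({aux: 1.0, prev: -f})
        prev = aux
    return factors[0], prev, rows
```

For example, `lift_term(1e6, "v2")` returns a coefficient of magnitude at most 1000 on a new variable, plus linking rows that tie that variable back to `v2`. The problem gains variables and rows, but its coefficients span a much narrower range.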
Subjects
Biochemical Phenomena, Metabolic Networks and Pathways, Software, Escherichia coli/metabolism
ABSTRACT
The antifouling and self-cleaning properties of plants such as Nelumbo nucifera (lotus) and Colocasia esculenta (taro) have been attributed to the superhydrophobicity resulting from the hierarchical surface structure of the leaf and the air trapped between the nanosized epicuticular wax crystals. The reported study showed that the nanostructures on the taro leaf surfaces were also highly resistant to particle and bacterial adhesion under completely wetted conditions. Adhesion force measurements using atomic force microscopy revealed that the adhesion force on top of the papilla as well as the area around it was markedly lower than that on the edge of an epidermal cell. The decreased adhesion force and the resistance to particle and bacterial adhesion were attributed to the dense nanostructures found on the epidermal papilla and the area surrounding it. These results suggest that engineered surfaces with properly designed nanoscale topographic structures could potentially reduce or prevent particle/bacterial fouling under submerged conditions.