Results 1 - 20 of 64
1.
J Chem Phys ; 160(6)2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38353308

ABSTRACT

Stochastic differential equations (SDEs) are a powerful tool to model fluctuations and uncertainty in complex systems. Although numerical methods have been designed to simulate SDEs effectively, it is still problematic when numerical solutions may be negative, but application problems require positive simulations. To address this issue, we propose balanced implicit Patankar-Euler methods to ensure positive simulations of SDEs. Instead of considering the addition of balanced terms to explicit methods in existing balanced methods, we attempt the deletion of possible negative terms from the explicit methods to maintain positivity of numerical simulations. The designed balanced terms include negative-valued drift terms and potential negative diffusion terms. The proposed method successfully addresses the issue of divisions with very small denominators in our recently designed stochastic Patankar method. Stability analysis shows that the balanced implicit Patankar-Euler method has much better stability properties than our recently designed composite Patankar-Euler method. Four SDE systems are used to examine the effectiveness, accuracy, and convergence properties of balanced implicit Patankar-Euler methods. Numerical results suggest that the proposed balanced implicit Patankar-Euler method is an effective and efficient approach to ensure positive simulations when any appropriate stepsize is used in simulating SDEs of biological regulatory systems.

2.
J Chem Phys ; 159(2)2023 Jul 14.
Article in English | MEDLINE | ID: mdl-37428041

ABSTRACT

Stochastic differential equations (SDEs) are a powerful tool to model biological regulatory processes with intrinsic and extrinsic noise. However, numerical simulations of SDE models may be problematic if the values of noise terms are negative and large, which is not realistic for biological systems since the molecular copy numbers or protein concentrations should be non-negative. To address this issue, we propose the composite Patankar-Euler methods to obtain positive simulations of SDE models. An SDE model is separated into three parts, namely, the positive-valued drift terms, negative-valued drift terms, and diffusion terms. We first propose the deterministic Patankar-Euler method to avoid negative solutions generated from the negative-valued drift terms. The stochastic Patankar-Euler method is designed to avoid negative solutions generated from both the negative-valued drift terms and diffusion terms. These Patankar-Euler methods have the strong convergence order of a half. The composite Patankar-Euler methods are the combinations of the explicit Euler method, deterministic Patankar-Euler method, and stochastic Patankar-Euler method. Three SDE system models are used to examine the effectiveness, accuracy, and convergence properties of the composite Patankar-Euler methods. Numerical results suggest that the composite Patankar-Euler methods are effective methods to ensure positive simulations when any appropriate stepsize is used.
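
Illustrative note: the positivity idea behind Patankar-type schemes can be sketched on a scalar deterministic equation, dx/dt = production - decay_rate * x. This is a minimal sketch of the classical Patankar weighting only, not the composite or stochastic methods of the abstract above; the function names and parameter values are hypothetical and chosen for illustration.

```python
def explicit_euler_step(x, h, production, decay_rate):
    """One explicit Euler step for dx/dt = production - decay_rate * x."""
    return x + h * (production - decay_rate * x)


def patankar_euler_step(x, h, production, decay_rate):
    """One Patankar-Euler step: the loss term is weighted by x_next / x.

    Solving x_next = x + h*production - h*decay_rate*x*(x_next/x) for x_next
    gives the closed form below, which stays positive whenever x > 0,
    production >= 0 and decay_rate >= 0, for any step size h > 0.
    """
    return (x + h * production) / (1.0 + h * decay_rate)


def simulate(step, x0, h, n_steps, production, decay_rate):
    """Apply the given one-step rule n_steps times, starting from x0."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step(xs[-1], h, production, decay_rate))
    return xs


if __name__ == "__main__":
    # Deliberately large step size (h * decay_rate > 1), assumed for illustration.
    euler = simulate(explicit_euler_step, 1.0, 0.5, 5, 0.1, 4.0)
    patankar = simulate(patankar_euler_step, 1.0, 0.5, 5, 0.1, 4.0)
    print("explicit Euler:", euler)
    print("Patankar-Euler:", patankar)
```

With these assumed values the explicit Euler iterate becomes negative on the first step, while the Patankar-weighted iterate remains positive and approaches the equilibrium production / decay_rate.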


Subject(s)
Models, Biological , Stochastic Processes , Diffusion
3.
Entropy (Basel) ; 24(5)2022 May 13.
Article in English | MEDLINE | ID: mdl-35626576

ABSTRACT

One of the key challenges in systems biology and molecular sciences is how to infer regulatory relationships between genes and proteins using high-throughput omics datasets. Although a wide range of methods have been designed to reverse engineer the regulatory networks, recent studies show that the inferred network may depend on the variable order in the dataset. In this work, we develop a new algorithm, called the statistical path-consistency algorithm (SPCA), to solve the problem of the dependence of variable order. This method generates a number of different variable orders using random samples, and then infers a network by using the path-consistent algorithm based on each variable order. We propose measures to determine the edge weights using the corresponding edge weights in the inferred networks, and choose the edges with the largest weights as the putative regulations between genes or proteins. The developed method is rigorously assessed by the six benchmark networks in DREAM challenges, the mitogen-activated protein (MAP) kinase pathway, and a cancer-specific gene regulatory network. The inferred networks are compared with those obtained by using two up-to-date inference methods. The accuracy of the inferred networks shows that the developed method is effective for discovering molecular regulatory systems.

4.
Entropy (Basel) ; 24(6)2022 Jun 09.
Article in English | MEDLINE | ID: mdl-35741529

ABSTRACT

The correlation-based network is a powerful tool to reveal the influential mechanisms and relations in stock markets. However, current methods for developing network models are dominantly based on the pairwise relationship of positive correlations. This work proposes a new approach for developing stock relationship networks by using the linear relationship model with LASSO to explore negative correlations under a systemic framework. The developed model not only preserves positive links with statistical significance but also includes link directions and negative correlations. We also introduce blends cliques with the balance theory to investigate the consistency properties of the developed networks. The ASX 200 stock data with 194 stocks are applied to evaluate the effectiveness of our proposed method. Results suggest that the developed networks not only are highly consistent with the correlation coefficient in terms of positive or negative correlations but also provide influence directions in stock markets.

5.
Entropy (Basel) ; 23(12)2021 Nov 25.
Article in English | MEDLINE | ID: mdl-34945881

ABSTRACT

The rapid development of the digital economy is a powerful driving force to promote high-quality economic growth all over the world. Although a number of studies have been conducted to investigate the development of the digital economy in China, these studies pay little attention to the spatial linkages between the 30 provinces in China and the developmental differences between northern and southern China. Using Chinese digital economic data from 2004 to 2019, we propose an index system to measure the developmental levels of the digital economy and obtain the annual developmental levels of these provinces by using the factor analysis method. We analyze the regional differences of developmental levels by using the Theil index and kernel density estimation method. More importantly, the network method is used to analyze the correlations between the developmental levels of the digital economy in all provinces of China. By decomposing regional differences, our study shows that polarized and uncoordinated development is prominent. The development level of the digital economy in the southern region is higher than that in the northern region. In terms of regional correlations, the network study suggests that there are beneficial and spillover effects of the digital economy development between provinces. Based on the analysis results, we propose policies for improving the development of the digital economy in China.
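
Illustrative note: the abstract above does not state which form of the Theil index is used. As a minimal sketch, assuming the common Theil T form, T = (1/n) * sum((x_i / mu) * ln(x_i / mu)), the index for a list of positive development scores could be computed as follows. The function name and the sample values are hypothetical.

```python
import math


def theil_index(values):
    """Theil T index: mean of (x / mu) * ln(x / mu) over strictly positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    mu = sum(values) / len(values)
    return sum((v / mu) * math.log(v / mu) for v in values) / len(values)


if __name__ == "__main__":
    # Hypothetical development scores for two groups of regions.
    equal_scores = [2.0, 2.0, 2.0]
    unequal_scores = [1.0, 3.0]
    print(theil_index(equal_scores))    # identical scores give zero inequality
    print(theil_index(unequal_scores))  # unequal scores give a positive value
```

The index is zero when all regions share the same score and grows as the scores diverge.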

6.
Entropy (Basel) ; 22(7)2020 Jul 15.
Article in English | MEDLINE | ID: mdl-33286545

ABSTRACT

Complex network is a powerful tool to discover important information from various types of big data. Although substantial studies have been conducted for the development of stock relation networks, correlation coefficient is dominantly used to measure the relationship between stock pairs. Information theory is much less discussed for this important topic, though mutual information is able to measure nonlinear pairwise relationship. In this work we propose to use part mutual information for developing stock networks. The path-consistency algorithm is used to filter out redundant relationships. Using the Australian stock market data, we develop four stock relation networks using different orders of part mutual information. Compared with the widely used planar maximally filtered graph (PMFG), we can generate networks with cliques of large size. In addition, the large cliques show consistency with the structure of industrial sectors. We also analyze the connectivity and degree distributions of the generated networks. Analysis results suggest that the proposed method is an effective approach to develop stock relation networks using information theory.

7.
J Theor Biol ; 479: 81-89, 2019 10 21.
Article in English | MEDLINE | ID: mdl-31299333

ABSTRACT

In this paper, we propose a stochastic multistage model that incorporates clonal expansion of premalignant cells and mutational events. Using the age-specific lung cancer as the test system, the proposed model is used to fit the incidence data in the Surveillance, Epidemiology, and End Results (SEER) registry. We first use the model with different numbers of mutations to fit the data of all lung cancer patients. Our results demonstrate that, although from two to six driver mutations in the genome of lung stem cells are reasonable for normal lung stem cells to become a malignant cell, three driver mutations are most likely to occur in the development of lung cancer. In addition, the models are employed to fit the data of female and male patients separately. The interesting result is that, for female patient data the best fit model contains four mutations while that for male patient data is the three-stage model. Finally, robustness analysis suggests that the decrease of cell net proliferation rates is more effective than the decrease of mutation rates in reducing the lung cancer risk.


Subject(s)
Carcinogenesis/pathology , Disease Progression , Lung Neoplasms/pathology , Models, Biological , Stochastic Processes , Carcinogenesis/genetics , Cell Proliferation , Female , Humans , Male , Mutation , Neoplastic Stem Cells/pathology , Registries , SEER Program , Sex Factors
8.
BMC Genomics ; 18(Suppl 2): 196, 2017 03 14.
Article in English | MEDLINE | ID: mdl-28361703

ABSTRACT

BACKGROUND: Inbreeding mating has been widely accepted as the key mechanism to enhance homozygosity which normally will decrease the fitness of the population. Although this result has been validated by a large amount of biological data from the natural populations, a mathematical proof of these experimental discoveries is still not complete. A related question is whether we can extend the well-established result regarding the mean fitness from a randomly mating population to inbreeding populations. A confirmative answer may provide insights into the frequent occurrence of self-fertilization populations. RESULTS: This work presents a theoretic proof of the result that, for a large inbreeding population with directional relative genotype fitness, the mean fitness of population increases monotonically. However, it cannot be extended to the case with over-dominant genotype fitness. In addition, by employing multiplicative intersection hypothesis, we prove that inbreeding mating does decrease the mean fitness of polygenic population in general, but does not decrease the mean fitness with mixed dominant-recessive genotypes. We also prove a novel result that inbreeding depression depends on not only the mating pattern but also genetic structure of population. CONCLUSIONS: For natural inbreeding populations without serious inbreeding depression, our theoretical analysis suggests the majority of its genotypes should be additive or dominant-recessive genotypes. This result gives a reason to explain why many hermaphroditism populations do not show severe inbreeding depression. In addition, the calculated purging rate shows that inbreeding mating purges the deleterious mutants more efficiently than randomly mating does.


Subject(s)
Consanguinity , Disorders of Sex Development , Genetics, Population , Models, Genetic , Self-Fertilization , Animals , Genetic Fitness , Genotype , Homozygote , Humans , Plants/genetics , Selection, Genetic
9.
J Theor Biol ; 428: 147-152, 2017 09 07.
Article in English | MEDLINE | ID: mdl-28645856

ABSTRACT

Environment factors such as radiation play an important role in the incidence of lung cancer. In spite of substantial efforts in experimental study and mathematical modeling, it is still a significant challenge to estimate lung cancer risk from radiation. To address this issue, we propose a stochastic model to investigate the impact of radiation on the development of lung cancer. The proposed three-stage model with clonal expansion is used to match the data of the male and female patients in the Osaka Cancer Registry (OCR) and Life Span Study (LSS) cohort of atomic bomb survivors in Hiroshima and Nagasaki. Our results indicate that the major effect of radiation on the development of lung cancer is to induce gene mutations for both male and female patients. In particular, for male patients, radiation affects the mutation in normal cells and the transformation from premalignant cells to malignant ones. However, radiation for female patients increases the mutation rates of the first two mutations in the stochastic model. The established relationship between parameters and radiation will provide insightful prediction for the lung cancer incidence in the radiation exposure.


Subject(s)
Carcinogenesis/pathology , Lung Neoplasms/etiology , Radiation , Chi-Square Distribution , Female , Humans , Male , Models, Biological , Probability , Radiation Exposure/adverse effects , Radiation Exposure/analysis , Stochastic Processes
10.
Methods ; 110: 3-13, 2016 11 01.
Article in English | MEDLINE | ID: mdl-27514497

ABSTRACT

Investigating the dynamics of genetic regulatory networks through high throughput experimental data, such as microarray gene expression profiles, is a very important but challenging task. One of the major hindrances in building detailed mathematical models for genetic regulation is the large number of unknown model parameters. To tackle this challenge, a new integrated method is proposed by combining a top-down approach and a bottom-up approach. First, the top-down approach uses probabilistic graphical models to predict the network structure of DNA repair pathway that is regulated by the p53 protein. Two networks are predicted, namely a network of eight genes with eight inferred interactions and an extended network of 21 genes with 17 interactions. Then, the bottom-up approach using differential equation models is developed to study the detailed genetic regulations based on either a fully connected regulatory network or a gene network obtained by the top-down approach. Model simulation error, parameter identifiability and robustness property are used as criteria to select the optimal network. Simulation results together with permutation tests of input gene network structures indicate that the prediction accuracy and robustness property of the two predicted networks using the top-down approach are better than those of the corresponding fully connected networks. In particular, the proposed approach reduces computational cost significantly for inferring model parameters. Overall, the new integrated method is a promising approach for investigating the dynamics of genetic regulation.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , Tumor Suppressor Protein p53/genetics , Algorithms , DNA Repair/genetics , Humans , Models, Statistical , Signal Transduction/genetics
11.
J Chem Phys ; 144(17): 174112, 2016 May 07.
Article in English | MEDLINE | ID: mdl-27155630

ABSTRACT

The frequently used reduction technique is based on the chemical master equation for stochastic chemical kinetics with two-time scales, which yields the modified stochastic simulation algorithm (SSA). For the chemical reaction processes involving a large number of molecular species and reactions, the collection of slow reactions may still include a large number of molecular species and reactions. Consequently, the SSA is still computationally expensive. Because the chemical Langevin equations (CLEs) can effectively work for a large number of molecular species and reactions, this paper develops a reduction method based on the CLE by the stochastic averaging principle developed in the work of Khasminskii and Yin [SIAM J. Appl. Math. 56, 1766-1793 (1996); ibid. 56, 1794-1819 (1996)] to average out the fast-reacting variables. This reduction method leads to a limit averaging system, which is an approximation of the slow reactions. Because in the stochastic chemical kinetics, the CLE is seen as the approximation of the SSA, the limit averaging system can be treated as the approximation of the slow reactions. As an application, we examine the reduction of computation complexity for the gene regulatory networks with two-time scales driven by intrinsic noise. For linear and nonlinear protein production functions, the simulations show that the sample average (expectation) of the limit averaging system is close to that of the slow-reaction process based on the SSA. It demonstrates that the limit averaging system is an efficient approximation of the slow-reaction process in the sense of the weak convergence.
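
Illustrative note: for readers unfamiliar with the SSA mentioned above, the following is a minimal sketch of the standard direct-method stochastic simulation algorithm on a hypothetical birth-death system (0 -> X at rate k_prod, X -> 0 at rate k_deg * x). It does not implement the two-time-scale reduction or the CLE-based averaging described in the abstract; the function name, rates, and seed are assumptions made for illustration.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Direct-method SSA for 0 -> X (rate k_prod) and X -> 0 (rate k_deg * x)."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod       # propensity of the production reaction
        a_deg = k_deg * x     # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total <= 0.0:
            break  # no reaction can fire
        t += rng.expovariate(a_total)  # exponential waiting time to next event
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


if __name__ == "__main__":
    times, counts = gillespie_birth_death(10.0, 1.0, 0, 5.0, seed=1)
    print("events:", len(times) - 1, "final count:", counts[-1])
```

Every reaction event is simulated individually, which is why the cost grows with the number of fast reactions and motivates reduction methods such as the one summarized above.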

12.
Adv Exp Med Biol ; 939: 289-307, 2016.
Article in English | MEDLINE | ID: mdl-27807752

ABSTRACT

The rapid advancement of high-throughput technologies provides huge amounts of information for gene expression and protein activity in the genome-wide scale. The availability of genomics, transcriptomics, proteomics, and metabolomics dataset gives an unprecedented opportunity to study detailed molecular regulations that is very important to precision medicine. However, it is still a significant challenge to design effective and efficient method to infer the network structure and dynamic property of regulatory networks. In recent years a number of computing methods have been designed to explore the regulatory mechanisms as well as estimate unknown model parameters. Among them, the Bayesian inference method can combine both prior knowledge and experimental data to generate updated information regarding the regulatory mechanisms. This chapter gives a brief review for Bayesian statistical methods that are used to infer the network structure and estimate model parameters based on experimental data.


Subject(s)
Algorithms , Computational Biology/statistics & numerical data , Gene Regulatory Networks , Mitogen-Activated Protein Kinases/genetics , Models, Genetic , Bayes Theorem , Genome, Human , Humans , Monte Carlo Method , Precision Medicine
13.
BMC Bioinformatics ; 15: 256, 2014 Jul 29.
Article in English | MEDLINE | ID: mdl-25070047

ABSTRACT

BACKGROUND: The advances of systems biology have raised a large number of sophisticated mathematical models for describing the dynamic property of complex biological systems. One of the major steps in developing mathematical models is to estimate unknown parameters of the model based on experimentally measured quantities. However, experimental conditions limit the amount of data that is available for mathematical modelling. The number of unknown parameters in mathematical models may be larger than the number of observation data. The imbalance between the number of experimental data and number of unknown parameters makes reverse-engineering problems particularly challenging. RESULTS: To address the issue of inadequate experimental data, we propose a continuous optimization approach for making reliable inference of model parameters. This approach first uses a spline interpolation to generate continuous functions of system dynamics as well as the first and second order derivatives of continuous functions. The expanded dataset is the basis to infer unknown model parameters using various continuous optimization criteria, including the error of simulation only, error of both simulation and the first derivative, or error of simulation as well as the first and second derivatives. We use three case studies to demonstrate the accuracy and reliability of the proposed new approach. Compared with the corresponding discrete criteria using experimental data at the measurement time points only, numerical results of the ERK kinase activation module show that the continuous absolute-error criteria using both function and high order derivatives generate estimates with better accuracy. This result is also supported by the second and third case studies for the G1/S transition network and the MAP kinase pathway, respectively. This suggests that the continuous absolute-error criteria lead to more accurate estimates than the corresponding discrete criteria. 
We also study the robustness property of these three models to examine the reliability of estimates. Simulation results show that the models with estimated parameters using continuous fitness functions have better robustness properties than those using the corresponding discrete fitness functions. CONCLUSIONS: The inference studies and robustness analysis suggest that the proposed continuous optimization criteria are effective and robust for estimating unknown parameters in mathematical models.


Subject(s)
Metabolic Networks and Pathways , Models, Biological , Systems Biology/methods , Enzyme Activation , Extracellular Signal-Regulated MAP Kinases/metabolism , G1 Phase , MAP Kinase Signaling System , Reproducibility of Results , S Phase
14.
BMC Bioinformatics ; 15 Suppl 12: S3, 2014.
Article in English | MEDLINE | ID: mdl-25473744

ABSTRACT

BACKGROUND: Mathematical modeling is an important tool in systems biology to study the dynamic property of complex biological systems. However, one of the major challenges in systems biology is how to infer unknown parameters in mathematical models based on the experimental data sets, in particular, when the data are sparse and the regulatory network is stochastic. RESULTS: To address this issue, this work proposed a new algorithm to estimate parameters in stochastic models using simulated likelihood density in the framework of approximate Bayesian computation. Two stochastic models were used to demonstrate the efficiency and effectiveness of the proposed method. In addition, we designed another algorithm based on a novel objective function to measure the accuracy of stochastic simulations. CONCLUSIONS: Simulation results suggest that the usage of simulated likelihood density improves the accuracy of estimates substantially. When the error is measured at each observation time point individually, the estimated parameters have better accuracy than those obtained by a published method in which the error is measured using simulations over the entire observation time period.
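
Illustrative note: the general framework named above, approximate Bayesian computation, can be sketched in its simplest rejection form. This is not the simulated-likelihood-density algorithm proposed in the abstract; it is a minimal sketch in which the model (exponential waiting times), the uniform prior, the summary statistic (sample mean), and the tolerance are all assumptions chosen for illustration.

```python
import random


def simulate_waiting_times(rate, n, rng):
    """Hypothetical stochastic model: n exponential waiting times with the given rate."""
    return [rng.expovariate(rate) for _ in range(n)]


def abc_rejection(observed, n_draws, tolerance, prior_low, prior_high, seed=0):
    """Rejection ABC: keep prior draws whose simulated mean is close to the observed mean."""
    rng = random.Random(seed)
    obs_mean = sum(observed) / len(observed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(prior_low, prior_high)          # draw from the prior
        simulated = simulate_waiting_times(rate, len(observed), rng)
        sim_mean = sum(simulated) / len(simulated)
        if abs(sim_mean - obs_mean) <= tolerance:          # compare summary statistics
            accepted.append(rate)
    return accepted


if __name__ == "__main__":
    data_rng = random.Random(42)
    observed = simulate_waiting_times(2.0, 200, data_rng)  # synthetic data, true rate 2.0
    posterior = abc_rejection(observed, 2000, 0.05, 0.5, 5.0, seed=1)
    print("accepted draws:", len(posterior))
    print("posterior mean rate:", sum(posterior) / len(posterior))
```

The accepted draws approximate the posterior of the rate; a smaller tolerance gives a closer approximation at the price of fewer accepted samples.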


Subject(s)
Algorithms , Models, Statistical , Bayes Theorem , Models, Chemical , Stochastic Processes , Systems Biology/methods
15.
Nat Cell Biol ; 9(8): 905-14, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17618274

ABSTRACT

Ras proteins occupy dynamic plasma membrane nanodomains called nanoclusters. The significance of this spatial organization is unknown. Here we show, using in silico and in vivo analyses of mitogen-activated protein (MAP) kinase signalling, that Ras nanoclusters operate as sensitive switches, converting graded ligand inputs into fixed outputs of activated extracellular signal-regulated kinase (ERK). By generating Ras nanoclusters in direct proportion to ligand input, cells build an analogue-digital-analogue circuit relay that transmits a signal across the plasma membrane with high fidelity. Signal transmission is completely dependent on Ras spatial organization and fails if nanoclustering is abrogated. A requirement for high-fidelity signalling may explain the non-random distribution of other plasma membrane signalling complexes.


Subject(s)
MAP Kinase Signaling System/physiology , Membrane Microdomains , Mitogen-Activated Protein Kinases/metabolism , Nanostructures , ras Proteins/metabolism , Animals , Cell Line , Cricetinae , Cricetulus , Enzyme Activation , Membrane Microdomains/chemistry , Membrane Microdomains/metabolism , Mitogen-Activated Protein Kinases/genetics , Models, Theoretical , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism , raf Kinases/genetics , raf Kinases/metabolism , ras Proteins/genetics
16.
Math Biosci Eng ; 21(1): 1186-1202, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38303460

ABSTRACT

Cancer is the result of continuous accumulation of gene mutations in normal cells. The number of mutations is different in different types of cancer and even in different patients with the same type of cancer. Therefore, studying all possible numbers of gene mutations in malignant cells is of great value for the understanding of tumorigenesis and the treatment of cancer. To this end, we applied a stochastic mathematical model considering the clonal expansion of any premalignant cells with different mutations to analyze the number of gene mutations in colorectal cancer. The age-specific colorectal cancer incidence rates from the Surveillance, Epidemiology and End Results (SEER) registry in the United States and the Life Span Study (LSS) in Nagasaki and Hiroshima, Japan are chosen to test the reasonableness of the model. Our fitting results indicate that the transformation from normal cells to malignant cells may undergo two to five driver mutations for colorectal cancer patients without radiation-exposed environment, two to four driver mutations for colorectal cancer patients with low level radiation-exposure, and two to three driver mutations for colorectal cancer patients with high level radiation-exposure. Furthermore, the net growth rate of the mutated cells with radiation-exposure is higher than that of the mutated cells without radiation-exposure for the models with two to five driver mutations. These results suggest that radiation environment may affect the clonal expansion of cells and significantly affect the development of tumors.


Subject(s)
Colorectal Neoplasms , Radiation Exposure , Humans , United States , Models, Theoretical , Mutation , Carcinogenesis/genetics , Carcinogenesis/pathology , Colorectal Neoplasms/genetics
17.
Math Biosci ; 371: 109170, 2024 May.
Article in English | MEDLINE | ID: mdl-38467302

ABSTRACT

Drug resistance is one of the most intractable issues to the targeted therapy for cancer diseases. To explore effective combination therapy schemes, we propose a mathematical model to study the effects of different treatment schemes on the dynamics of cancer cells. Then we characterize the dynamical behavior of the model by finding the equilibrium points and exploring their local stability. Lyapunov functions are constructed to investigate the global asymptotic stability of the model equilibria. Numerical simulations are carried out to verify the stability of equilibria and treatment outcomes using a set of collected model parameters and experimental data on murine colon carcinoma. Simulation results suggest that immunotherapy combined with chemotherapy contributes significantly to the control of tumor growth compared to monotherapy. Sensitivity analysis is performed to identify the importance of model parameters on the variations of model outcomes.


Subject(s)
Drug Resistance, Neoplasm , Animals , Mice , Immunotherapy/methods , Combined Modality Therapy , Mathematical Concepts , Humans , Colonic Neoplasms/drug therapy , Colonic Neoplasms/pathology , Models, Biological , Neoplasms/drug therapy , Models, Theoretical , Computer Simulation
18.
Phys Rev E ; 109(2-1): 024119, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38491572

ABSTRACT

Complex molecular details of transcriptional regulation can be coarse-grained by assuming that reaction waiting times for promoter-state transitions, the mRNA synthesis, and the mRNA degradation follow general distributions. However, how such a generalized two-state model is analytically solved is a long-standing issue. Here we first present analytical formulas of burst-size distributions for this model. Then, we derive an iterative equation for the mRNA moment-generating function, by which mRNA raw and binomial moments of any order can be conveniently calculated. The analytical results obtained in the special cases of phase-type waiting-time distributions not only provide insights into the mechanisms of complex transcriptional regulations but also bring conveniences for experimental data-based statistical inferences.


Subject(s)
Models, Genetic , Waiting Lists , Stochastic Processes , Transcription, Genetic , RNA, Messenger/genetics , RNA, Messenger/metabolism
19.
Article in English | MEDLINE | ID: mdl-36833868

ABSTRACT

Digitalization is an excellent opportunity for the manufacturing industry all over the world to improve the core competitiveness and break through the "low-end locking" dilemma. However, it is not clear whether the digitalization of the manufacturing industry has positive ecological and environmental benefits under the resource and environmental constraints. To answer this question, we use the data from the world input-output database (WIOD) to investigate the impact of manufacturing input digitalization on carbon emission intensity by an extended analysis. The results show that the input digitalization of the manufacturing industry has mixed effects on reducing carbon emission intensity. The productive input digitalization can reduce carbon emission intensity, but the distributional input digitalization may increase carbon emission intensity. Non-pollution-intensive manufacturing and high-input digital manufacturing have stronger carbon emission reduction effects than the other industry sectors. From the perspective of input sources, input digitalization from domestic sources has a significant inhibitory effect on the carbon emission intensity. In contrast, input digitalization from foreign sources may increase carbon emission intensity.


Subject(s)
Carbon , Commerce , Carbon/analysis , Industry , Manufacturing Industry , Economic Development , Carbon Dioxide/analysis , China
20.
Article in English | MEDLINE | ID: mdl-37871092

ABSTRACT

Feature selection has been extensively applied to identify cancer genes using omics data. Although substantial studies have been conducted to search for cancer genes, the available rich knowledge on various cancers is seldom used as prior information in feature selection. This paper proposes a two-stage prior LASSO (TSPLASSO) method, which represents an early attempt in designing feature selection algorithms using prior information. The first stage performs gene selection via linear regression with LASSO. Candidate genes that are correlated with known cancer genes are retained for subsequent analysis. The second stage establishes a logistic regression model with LASSO to realize final cancer gene selection and sample classification. The key advantages of TSPLASSO include the successive consideration of prior cancer genes and binary sample types as response variables in stages one and two, respectively. In addition, the TSPLASSO performs sample classification and variable selection simultaneously. Compared with six state-of-the-art algorithms, numerical simulations in six real-world datasets show that TSPLASSO can improve the accuracy of variable selection by 5%-400% in the three bulk sequencing datasets and the scRNA-seq dataset; and the performance is robust against data noise and variations of prior cancer genes. The TSPLASSO provides an efficient, stable and practical algorithm for exploring biomedical and health informatics from omics data.
