Results 1 - 20 of 55
1.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37401373

ABSTRACT

Recent advances in artificial intelligence (AI), including deep and graph learning models, have established their usefulness in biomedical applications, especially in predicting drug-drug interactions (DDIs). A DDI is a change in the effect of one drug due to the presence of another drug in the human body, and DDIs play an essential role in drug discovery and clinical research. Predicting DDIs through traditional clinical trials and experiments is expensive and time-consuming. To apply advanced AI and deep learning correctly, developers and users face various challenges, such as the availability and encoding of data resources and the design of computational methods. This review summarizes chemical structure-based, network-based, natural language processing-based and hybrid methods, providing an updated and accessible guide for the broad research and development community with different domain knowledge. We introduce widely used molecular representations and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDI prediction.


Subjects
Artificial Intelligence , Neural Networks, Computer , Humans , Drug Interactions , Natural Language Processing , Drug Discovery
2.
Chaos ; 34(1)2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38198680

ABSTRACT

The significance of accurate long-term forecasting of air quality for a long-term policy decision for controlling air pollution and for evaluating its impacts on human health has attracted greater attention recently. This paper proposes an ensemble multi-scale framework to refine the previous version with ensemble empirical mode decomposition (EMD) and nonstationary oscillation resampling (NSOR) for long-term forecasting. Within the proposed ensemble multi-scale framework, we on one hand apply modified EMD to produce more regular and stable EMD components, allowing the long-range oscillation characteristics of the original time series to be better captured. On the other hand, we provide an ensemble mechanism to alleviate the error propagation problem in forecasts caused by iterative implementation of NSOR at all lead times and name it improved NSOR. Application of the proposed multi-scale framework to long-term forecasting of the daily PM2.5 at 14 monitoring stations in Hong Kong demonstrates that it can effectively capture the long-term variation in air pollution processes and significantly increase the forecasting performance. Specifically, the framework can, respectively, reduce the average root-mean-square error and the mean absolute error over all 14 stations by 8.4% and 9.2% for a lead time of 100 days, compared to previous studies. Additionally, better robustness can be obtained by the proposed ensemble framework for 180-day and 365-day long-term forecasting scenarios. It should be emphasized that the proposed ensemble multi-scale framework is a feasible framework, which is applicable for long-term time series forecasting in general.
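
The abstract above reports its gains as percentage reductions in root-mean-square error (RMSE) and mean absolute error (MAE). As a minimal sketch of how such figures are computed, the following uses made-up PM2.5 values; the data and the helper name `relative_reduction` are assumptions for illustration and do not come from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical daily PM2.5 values, for illustration only.
observed = [30.0, 42.0, 35.0, 50.0]
baseline = [36.0, 36.0, 41.0, 44.0]   # forecasts of an earlier model
improved = [33.0, 39.0, 38.0, 47.0]   # forecasts of a refined model

print(relative_reduction(rmse(observed, baseline), rmse(observed, improved)))
print(relative_reduction(mae(observed, baseline), mae(observed, improved)))
```

In the paper these quantities are averaged over all 14 stations before the reduction is reported; the sketch handles a single series only.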

3.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34111889

ABSTRACT

Single-cell sequencing is a biotechnology that sequences one layer of genomic information for individual cells in a tissue sample. For example, single-cell DNA sequencing sequences the DNA from every single cell. Increasing in complexity, single-cell multi-omics sequencing, or single-cell multimodal omics sequencing, profiles multiple layers of omics information in parallel from a single cell. In practice, single-cell multi-omics sequencing detects multiple traits such as DNA, RNA, methylation information and/or protein profiles from the same cell for many individuals in a tissue sample. Multi-omics sequencing has been widely applied to systematically unravel the interplay mechanisms of key components and pathways in the cell. This survey reviews recent developments in single-cell multi-omics sequencing and their applications to understanding complex diseases, in particular the COVID-19 pandemic. We also summarize machine learning and bioinformatics techniques used in the analysis of the intercorrelated multilayer heterogeneous data. We observed that variational inference and graph-based learning are popular approaches, and that Seurat V3 is a commonly used tool to transfer the missing variables and labels. We also discuss two intensively studied issues relating to data consistency and diversity, and comment on issues of current concern surrounding the error correction of data pairs and data imputation methods. The survey concludes with some open questions and opportunities for this extraordinary field.


Subjects
COVID-19/genetics , Pandemics , Proteomics , SARS-CoV-2/genetics , Algorithms , COVID-19/virology , Computational Biology , Data Analysis , Genomics , Humans , Machine Learning , SARS-CoV-2/pathogenicity , Single-Cell Analysis
4.
Mol Phylogenet Evol ; 179: 107662, 2023 02.
Article in English | MEDLINE | ID: mdl-36375789

ABSTRACT

Alignment-based methods have faced disadvantages in sequence comparison and phylogeny reconstruction due to their high computational complexity. Alignment-free methods for sequence comparison and phylogeny inference have attracted a great deal of attention in recent years. Here, we explore an alignment-free approach that uses inner distance distributions of k-mer pairs in biological sequences for phylogeny inference. For every sequence in a dataset, our method transforms the sequence into a numeric feature vector consisting of features each representing a specific k-mer pair's contribution to the characterization of the sequentiality uniqueness of the sequence. This newly defined k-mer pair's contribution is an integration of the reverse Kullback-Leibler divergence, pseudo mode and the classic entropy of an inner distance distribution of the k-mer pair in the sequence. Our method has been tested on datasets of complete genome sequences, complete protein sequences, and gene sequences of rRNA of various lengths. Our method achieves the best performance in comparison with state-of-the-art alignment-free methods as measured by the Robinson-Foulds distance between the reference and the constructed phylogeny trees.


Subjects
Algorithms , Genome , Phylogeny
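
The abstract of entry 4 builds its features from inner distance distributions of k-mer pairs. The sketch below shows one plausible reading of that idea and is not the authors' code. It assumes the inner distance is the offset from an occurrence of the first k-mer to the next occurrence of the second, and it computes only the classic Shannon entropy term, leaving out the reverse Kullback-Leibler divergence and the pseudo mode.

```python
import math
from collections import Counter


def kmer_positions(seq, kmer):
    """Start positions of every occurrence of kmer in seq."""
    k = len(kmer)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer]


def inner_distances(seq, first, second):
    """For each occurrence of `first`, the offset to the next occurrence of `second`.

    This definition of "inner distance" is an assumption made for illustration.
    """
    second_positions = kmer_positions(seq, second)
    distances = []
    for i in kmer_positions(seq, first):
        later = [j for j in second_positions if j > i]
        if later:
            distances.append(later[0] - i)
    return distances


def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


seq = "ACGTACGGTACGT"
distances = inner_distances(seq, "AC", "GT")
print(distances, shannon_entropy(distances))
```

A full feature vector would repeat this for every k-mer pair of interest and combine the entropy with the other two terms named in the abstract.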
5.
Bioinformatics ; 37(6): 750-758, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33063094

ABSTRACT

MOTIVATION: Infection with strains of different subtypes and the subsequent crossover reading between the two strands of genomic RNAs by host cells' reverse transcriptase are the main causes of the vast HIV-1 sequence diversity. Such inter-subtype genomic recombinants can become circulating recombinant forms (CRFs) after widespread transmissions in a population. Complete prediction of all the subtype sources of a CRF strain is a complicated machine learning problem. It is also difficult to understand whether a strain is an emerging new subtype and if so, how to accurately identify the new components of the genetic source. RESULTS: We introduce a multi-label learning algorithm for the complete prediction of multiple sources of a CRF sequence as well as the prediction of its chronological number. The prediction is strengthened by a voting of various multi-label learning methods to avoid biased decisions. In our steps, frequency and position features of the sequences are both extracted to capture signature patterns of pure subtypes and CRFs. The method was applied to 7185 HIV-1 sequences, comprising 5530 pure subtype sequences and 1655 CRF sequences. Results have demonstrated that the method can achieve very high accuracy (reaching 99%) in the prediction of the complete set of labels of HIV-1 recombinant forms. A few wrong predictions are actually incomplete predictions, very close to the complete set of genuine labels. AVAILABILITY AND IMPLEMENTATION: https://github.com/Runbin-tang/The-source-of-HIV-CRFs-prediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
HIV Infections , HIV-1 , Genetic Variation , HIV Infections/genetics , HIV-1/genetics , Humans , Molecular Epidemiology , Phylogeny
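
Entry 5 strengthens its multi-label prediction by voting across several multi-label learners. The abstract does not state the voting rule, so the sketch below assumes a simple per-label majority vote; the function name `vote`, the `min_votes` parameter and the example label sets are all hypothetical.

```python
from collections import Counter


def vote(predictions, min_votes=None):
    """Keep each label predicted by at least min_votes classifiers.

    predictions: one set of labels per classifier.
    min_votes defaults to a strict majority (an assumed rule).
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    tally = Counter(label for labels in predictions for label in set(labels))
    return {label for label, n in tally.items() if n >= min_votes}


# Hypothetical subtype-source predictions from three classifiers for one sequence.
predictions = [{"A", "G"}, {"A", "G", "B"}, {"A"}]
print(sorted(vote(predictions)))
```

Raising `min_votes` trades completeness of the label set for precision, which matters here because the abstract notes that the few wrong predictions were incomplete rather than spurious.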
6.
BMC Bioinformatics ; 22(Suppl 6): 142, 2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34078284

ABSTRACT

BACKGROUND: Genomic reads from sequencing platforms contain random errors. Global correction algorithms have been developed, aiming to rectify all possible errors in the reads using generic genome-wide patterns. However, the non-uniform sequencing depths hinder the global approach to conduct effective error removal. As some genes may get under-corrected or over-corrected by the global approach, we conduct instance-based error correction for short reads of disease-associated genes or pathways. The paramount requirement is to ensure the relevant reads, instead of the whole genome, are error-free to provide significant benefits for single-nucleotide polymorphism (SNP) or variant calling studies on the specific genes. RESULTS: To rectify possible errors in the short reads of disease-associated genes, our novel idea is to exploit local sequence features and statistics directly related to these genes. Extensive experiments are conducted in comparison with state-of-the-art methods on both simulated and real datasets of lung cancer associated genes (including single-end and paired-end reads). The results demonstrated the superiority of our method with the best performance on precision, recall and gain rate, as well as on sequence assembly results (e.g., N50, the length of contig and contig quality). CONCLUSION: Instance-based strategy makes it possible to explore fine-grained patterns focusing on specific genes, providing high precision error correction and convincing gene sequence assembly. SNP case studies show that errors occurring at some traditional SNP areas can be accurately corrected, providing high precision and sensitivity for investigations on disease-causing point mutations.


Subjects
Genome , High-Throughput Nucleotide Sequencing , Algorithms , Genomics , Sequence Analysis, DNA
7.
Bioinformatics ; 35(12): 2066-2074, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30407482

ABSTRACT

MOTIVATION: Advanced high-throughput sequencing technologies have produced massive amounts of read data, and algorithms have been specially designed to reduce the size of these datasets for efficient storage and transmission. Reordering reads with regard to their positions in de novo assembled contigs or in explicit reference sequences has proven to be one of the most effective read compression approaches. As there is usually no good prior knowledge about the reference sequence, the current focus is on the novel construction of de novo assembled contigs. RESULTS: We introduce a new de novo compression algorithm named minicom. This algorithm uses large k-minimizers to index the reads and subgroup those that have the same minimizer. Within each subgroup, a contig is constructed. Then some pairs of the contigs derived from the subgroups are merged into longer contigs according to a (w, k)-minimizer-indexed suffix-prefix overlap similarity between two contigs. This merging process is repeated after the longer contigs are formed until no pair of contigs can be merged. We compare the performance of minicom with two reference-based methods and four de novo methods on 18 datasets (13 RNA-seq datasets and 5 whole genome sequencing datasets). In the compression of single-end reads, minicom obtained the smallest file size in 22 of 34 cases, with significant improvement. In the compression of paired-end reads, minicom achieved a 20-80% compression gain over the best state-of-the-art algorithm. Our method also achieved a 10% size reduction of compressed files in comparison with the best algorithm under the read-order-preserving mode. This excellent performance is mainly attributed to exploiting the redundancy of the repetitive substrings in the long contigs. AVAILABILITY AND IMPLEMENTATION: https://github.com/yuansliu/minicom. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Data Compression , Software , Algorithms , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Whole Genome Sequencing
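
The first step described in entry 7 is indexing reads by their k-minimizer and grouping reads that share one. The sketch below illustrates that step only and is not taken from the minicom source. It assumes the minimizer is the lexicographically smallest k-mer on the forward strand; real tools usually also consider the reverse complement and use a much larger k.

```python
from collections import defaultdict


def minimizer(read, k):
    """Lexicographically smallest k-mer of the read (forward strand only).

    Raises ValueError if the read is shorter than k.
    """
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def group_by_minimizer(reads, k):
    """Bucket reads that share the same k-minimizer."""
    groups = defaultdict(list)
    for read in reads:
        groups[minimizer(read, k)].append(read)
    return dict(groups)


# Toy reads; the first two overlap and therefore share a minimizer.
reads = ["GATTACAGG", "TTACAGGCT", "CCGGTTGGC"]
groups = group_by_minimizer(reads, k=4)
for key, members in groups.items():
    print(key, members)
```

Reads in the same bucket are likely to overlap, which is why a contig can then be built within each subgroup before contigs are merged.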
8.
Chaos ; 30(11): 113123, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33261323

ABSTRACT

In this study, we focus on the fractal properties of recurrence networks constructed from two-dimensional fractional Brownian motion (2D fBm), i.e., the inter-system recurrence network, the joint recurrence network, the cross-joint recurrence network, and the multidimensional recurrence network, which are variants of classic recurrence networks extended to multiple time series. Generally, the fractal dimension of these recurrence networks can only be estimated numerically. The numerical analysis identifies the existence of fractality in these constructed recurrence networks. Furthermore, it is found that the numerically estimated fractal dimension of these networks can be connected to the theoretical fractal dimension of the 2D fBm graphs, because both fractal dimensions depend piecewise on the Hurst exponent H in a highly similar pattern, i.e., a linear decrease (as H varies from 0 to 0.5) followed by an inversely proportional-like decay (as H changes from 0.5 to 1). Although the two fractal dimensions are not exactly identical, their difference can be captured by a single parameter with a value around 1. Therefore, it can be concluded that these recurrence networks constructed from 2D fBms must inherit some fractal properties of their associated 2D fBms with respect to the fBm graphs.
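
To make the network construction in this abstract concrete, here is a minimal sketch of a recurrence network and a joint recurrence network for two scalar series. It assumes the standard definitions (two time points are linked when their states lie within a threshold, and the joint network keeps only links present in both series). The thresholds and data are hypothetical, and the fractal-dimension estimation is not shown.

```python
def recurrence_edges(series, eps):
    """Edges (i, j), i < j, linking time points whose values differ by at most eps."""
    n = len(series)
    return {(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if abs(series[i] - series[j]) <= eps}


def joint_recurrence_edges(x, y, eps_x, eps_y):
    """Edges present in the recurrence networks of both series."""
    return recurrence_edges(x, eps_x) & recurrence_edges(y, eps_y)


# Toy series standing in for the two components of a 2D process.
x = [0.0, 1.0, 0.1, 1.1]
y = [0.0, 0.05, 2.0, 0.1]

print(sorted(recurrence_edges(x, 0.2)))
print(sorted(joint_recurrence_edges(x, y, 0.2, 0.2)))
```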

9.
Chaos ; 30(2): 023134, 2020 Feb.
Article in English | MEDLINE | ID: mdl-32113234

ABSTRACT

Fractal and multifractal properties of various systems have been studied extensively. In this paper, first, the multivariate multifractal detrend cross-correlation analysis (MMXDFA) is proposed to investigate the multifractal features in multivariate time series. MMXDFA may produce oscillations in the fluctuation function and spurious cross correlations. In order to overcome these problems, we then propose the multivariate multifractal temporally weighted detrended cross-correlation analysis (MMTWXDFA). In relation to the multivariate detrended cross-correlation analysis and multifractal temporally weighted detrended cross-correlation analysis, an innovation of MMTWXDFA is the application of the signed Manhattan distance to calculate the local detrended covariance function. To evaluate the performance of the MMXDFA and MMTWXDFA methods, we apply them on some artificially generated multivariate series. Several numerical tests demonstrate that both methods can identify their fractality, but MMTWXDFA can detect long-range cross correlations and simultaneously quantify the levels of cross correlation between two multivariate series more accurately.

10.
Chaos ; 30(5): 053113, 2020 May.
Article in English | MEDLINE | ID: mdl-32491907

ABSTRACT

A novel general randomized method is proposed to investigate multifractal properties of long time series. Based on multifractal temporally weighted detrended fluctuation analysis (MFTWDFA), we obtain randomized multifractal temporally weighted detrended fluctuation analysis (RMFTWDFA). The innovation of this algorithm is the use of randomization when dividing the series into multiple intervals to find the local trend. To test the performance of the RMFTWDFA algorithm, we apply it, together with MFTWDFA, to artificially generated time series and real genomic sequences. For three types of artificially generated time series, consistency tests are performed on the estimated h(q), and all results indicate that there is no significant difference between the h(q) estimated by the two methods. Meanwhile, for different sequence lengths, the running time of RMFTWDFA is reduced by more than a factor of ten. We use large-scale prokaryote genomic sequences as real examples. The results obtained by RMFTWDFA demonstrate that these genomic sequences show fractal characteristics, and we leverage the estimated exponents to study phylogenetic relationships between species. The final clustering results are consistent with real relationships. All the results show that RMFTWDFA is effective and time-saving for long time series, while obtaining an accuracy statistically comparable to other methods.


Subjects
Fractals , Phylogeny , Algorithms , Bacteria/genetics , Databases, Genetic
11.
Entropy (Basel) ; 22(3)2020 Mar 13.
Article in English | MEDLINE | ID: mdl-33286103

ABSTRACT

Genome-wide association study (GWAS) has turned out to be an essential technology for exploring the genetic mechanism of complex traits. To reduce the complexity of computation, it is well accepted to remove unrelated single nucleotide polymorphisms (SNPs) before GWAS, e.g., by using iterative sure independence screening expectation-maximization Bayesian Lasso (ISIS EM-BLASSO) method. In this work, a modified version of ISIS EM-BLASSO is proposed, which reduces the number of SNPs by a screening methodology based on Pearson correlation and mutual information, then estimates the effects via EM-Bayesian Lasso (EM-BLASSO), and finally detects the true quantitative trait nucleotides (QTNs) through likelihood ratio test. We call our method a two-stage mutual information based Bayesian Lasso (MBLASSO). Under three simulation scenarios, MBLASSO improves the statistical power and retains the higher effect estimation accuracy when comparing with three other algorithms. Moreover, MBLASSO performs best on model fitting, the accuracy of detected associations is the highest, and 21 genes can only be detected by MBLASSO in Arabidopsis thaliana datasets.
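
The screening stage described above ranks SNPs by Pearson correlation and mutual information before the Bayesian Lasso step. The following is a rough sketch of such a filter, not the MBLASSO implementation. It assumes a discrete phenotype so that mutual information can be computed from counts, and the thresholds `r_min` and `mi_min` and the genotype data are made up.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mutual_information(x, y):
    """Mutual information (bits) between two discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
               for (a, b), c in count_xy.items())


def screen(genotypes, phenotype, r_min, mi_min):
    """Keep SNPs that pass either the correlation or the mutual-information cutoff."""
    return [name for name, g in genotypes.items()
            if abs(pearson(g, phenotype)) >= r_min
            or mutual_information(g, phenotype) >= mi_min]


# Hypothetical genotype codes (0, 1, 2) for six individuals and a binary phenotype.
genotypes = {"snp1": [0, 0, 1, 1, 2, 2], "snp2": [0, 1, 2, 0, 1, 2]}
phenotype = [0, 0, 0, 1, 1, 1]
print(screen(genotypes, phenotype, r_min=0.5, mi_min=0.5))
```

Mutual information can pick up non-linear dependence that correlation misses, which is one reason to combine the two scores in a screening step.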

12.
Entropy (Basel) ; 22(9)2020 Sep 08.
Article in English | MEDLINE | ID: mdl-33286772

ABSTRACT

In this paper, we propose a new cross-sample entropy, namely the composite multiscale partial cross-sample entropy (CMPCSE), for quantifying the intrinsic similarity of two time series affected by common external factors. First, in order to test the validity of CMPCSE, we apply it to three sets of artificial data. Experimental results show that CMPCSE can accurately measure the intrinsic cross-sample entropy of two simultaneously recorded time series by removing the effects from the third time series. Then CMPCSE is employed to investigate the partial cross-sample entropy of Shanghai securities composite index (SSEC) and Shenzhen Stock Exchange Component Index (SZSE) by eliminating the effect of Hang Seng Index (HSI). Compared with the composite multiscale cross-sample entropy, the results obtained by CMPCSE show that SSEC and SZSE have stronger similarity. We believe that CMPCSE is an effective tool to study intrinsic similarity of two time series.
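
CMPCSE extends ordinary cross-sample entropy. As background only, the sketch below computes plain cross-sample entropy for two series. It does not implement the composite multiscale coarse-graining or the removal of a third series' influence described in the abstract. It assumes the inputs are already normalized and that `r` is an absolute tolerance.

```python
import math


def cross_sample_entropy(x, y, m=2, r=0.2):
    """Plain cross-sample entropy: -ln(A / B).

    B counts template pairs of length m (one from x, one from y) that agree
    within tolerance r; A counts the same for length m + 1.
    """
    n = min(len(x), len(y))

    def matches(length):
        count = 0
        for i in range(n - m):
            for j in range(n - m):
                if all(abs(x[i + k] - y[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # no matching templates: entropy is undefined/infinite
    return -math.log(a / b)


# Toy data: a periodic series compared with itself and with a perturbed copy.
x = [0.0, 1.0] * 5
y_perturbed = x[:-1] + [0.0]

print(cross_sample_entropy(x, x, m=2, r=0.1))
print(cross_sample_entropy(x, y_perturbed, m=2, r=0.1))
```

Lower values indicate that patterns in one series recur more predictably in the other, which is the sense of "similarity" used in the abstract.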

13.
Entropy (Basel) ; 22(2)2020 Feb 23.
Article in English | MEDLINE | ID: mdl-33286029

ABSTRACT

HIV-1 viruses, which are predominant in the family of HIV viruses, have strong pathogenicity and infectivity. They can evolve into many different variants in a very short time. In this study, we propose a new and effective alignment-free method for the phylogenetic analysis of HIV-1 viruses using complete genome sequences. Our method combines the position distribution information and the counts of the k-mers together. We also propose a metric to determine the optimal k value. We name our method the Position-Weighted k-mers (PWkmer) method. Validation and comparison with the Robinson-Foulds distance method and the modified bootstrap method on a benchmark dataset show that our method is reliable for the phylogenetic analysis of HIV-1 viruses. PWkmer can resolve within-group variations for different known subtypes of Group M of HIV-1 viruses. This method is simple and computationally fast for whole genome phylogenetic analysis.

14.
BMC Bioinformatics ; 19(Suppl 19): 521, 2018 Dec 31.
Article in English | MEDLINE | ID: mdl-30598066

ABSTRACT

BACKGROUND: Distinguishing pre-microRNAs (precursor microRNAs) from length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanism of RNA biological processes. Machine learning techniques have been widely applied to deal with this challenging problem. However, most of them mainly focus on secondary structure information of pre-microRNAs, while ignoring sequence-order information and sequence evolution information. RESULTS: We use new features for the machine learning algorithms to improve the classification performance by characterizing both sequence order evolution information and secondary structure graphs. We developed three steps to extract these features of pre-microRNAs. We first extract features from PSI-BLAST profiles and Hilbert-Huang transforms, which contain rich sequence evolution information and sequence-order information, respectively. We then obtain properties of small molecular networks of pre-microRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they can depict both global and local characteristics of pre-microRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is adopted before a support vector machine (SVM) is applied as our classifier. The constructed classification model is named MicroRNA-NHPred. The performance of MicroRNA-NHPred is high and stable, and better than that of state-of-the-art methods, achieving an accuracy of up to 94.83% on the same benchmark datasets. CONCLUSIONS: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the sequences and secondary structures, which is capable of characterizing the sequence evolution information and sequence-order information, as well as the global and local information of pre-microRNA secondary structures. MicroRNA-NHPred is a valuable method for pre-microRNA identification. The source code of our method can be downloaded from https://github.com/myl446/MicroRNA-NHPred.


Subjects
Algorithms , Computational Biology/methods , Genome, Human , Machine Learning , MicroRNAs/chemistry , MicroRNAs/genetics , Humans , Support Vector Machine
15.
Bioinformatics ; 33(14): 2214-2215, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28369270

ABSTRACT

SUMMARY: A number of alignment-free methods have been proposed for phylogeny reconstruction over the past two decades. But there are some long-standing challenges in these methods, including requirement of huge computer memory and CPU time, and existence of duplicate computations. In this article, we address these challenges with the idea of compressed vector, fingerprint and scalable memory management. With these ideas we developed the DLTree algorithm for efficient implementation of the dynamical language model and whole genome-based phylogenetic analysis. The DLTree algorithm was compared with other alignment-free tools, demonstrating that it is more efficient and accurate for phylogeny reconstruction. AVAILABILITY AND IMPLEMENTATION: The DLTree algorithm is freely available at http://dltree.xtu.edu.cn. CONTACT: yuzuguo@aliyun.com or yangjy@nankai.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Phylogeny , Software , Whole Genome Sequencing
16.
Chaos ; 27(6): 063111, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28679233

ABSTRACT

A new method, multifractal temporally weighted detrended cross-correlation analysis (MF-TWXDFA), is proposed in this paper to investigate multifractal cross-correlations. This new method is based on multifractal temporally weighted detrended fluctuation analysis and multifractal cross-correlation analysis (MFCCA). An innovation of the method is applying geographically weighted regression to estimate local trends in the nonstationary time series. We also take into consideration the sign of the fluctuations in computing the corresponding detrended cross-covariance function. To test the performance of the MF-TWXDFA algorithm, we apply it and the MFCCA method to simulated and actual series. Numerical tests on artificially simulated series demonstrate that our method can accurately detect long-range cross-correlations for two simultaneously recorded series. To further show the utility of MF-TWXDFA, we apply it to time series from stock markets and find that the power-law cross-correlation between stock returns is significantly multifractal. A new coefficient, the MF-TWXDFA cross-correlation coefficient, is also defined to quantify the levels of cross-correlation between two time series.

17.
Mol Phylogenet Evol ; 96: 102-111, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26724405

ABSTRACT

Traditional methods for sequence comparison and phylogeny reconstruction rely on pairwise and multiple sequence alignments. However, alignment cannot be directly applied to whole genome/proteome comparison and phylogenomic studies because of its high computational complexity. Hence alignment-free methods have become popular in recent years. Here we propose a fast alignment-free method for whole genome/proteome comparison and phylogeny reconstruction using higher order Markov models and chaos game representation. In the present method, we use the transition matrices of higher order Markov models to characterize amino acid or DNA sequences for their comparison. The order of the Markov model is uniquely identified by maximizing the average Shannon entropy of conditional probability distributions. Using one-dimensional chaos game representation and linked lists, this method can reduce the large memory and time consumption caused by the large-scale conditional probability distributions. To illustrate the effectiveness of our method, we employ it for fast phylogeny reconstruction based on genome/proteome sequences of two species data sets used in previously published papers. Our results demonstrate that the present method is useful and efficient. AVAILABILITY AND IMPLEMENTATION: The source code for our algorithm to get the distance matrix, together with the genome/proteome sequences, can be downloaded from ftp://121.199.20.25/. The Phylip and EvolView software we used to construct phylogenetic trees is available from the respective websites.


Subjects
Genome/genetics , Markov Chains , Nonlinear Dynamics , Phylogeny , Prokaryotic Cells/metabolism , Proteome/genetics , Algorithms , Sequence Alignment , Software
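
Entry 17 characterizes each sequence by the transition matrix of a higher order Markov model and selects the order by maximizing the average Shannon entropy of the conditional distributions. The sketch below estimates those two quantities for one sequence. The unweighted average over observed contexts is an assumption, since the abstract does not give the exact averaging, and the chaos game representation and linked-list storage are not shown.

```python
import math
from collections import Counter, defaultdict


def transition_probabilities(seq, order):
    """Estimate P(next symbol | previous `order` symbols) from one sequence."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        counts[seq[i:i + order]][seq[i + order]] += 1
    return {context: {symbol: c / sum(following.values())
                      for symbol, c in following.items()}
            for context, following in counts.items()}


def average_conditional_entropy(probabilities):
    """Unweighted mean Shannon entropy (bits) over all observed contexts.

    Assumes at least one context was observed (sequence longer than the order).
    """
    entropies = [-sum(p * math.log2(p) for p in dist.values())
                 for dist in probabilities.values()]
    return sum(entropies) / len(entropies)


seq = "ACACACGT"
first_order = transition_probabilities(seq, order=1)
print(first_order["C"])
print(average_conditional_entropy(first_order))
```

Choosing the order then amounts to evaluating `average_conditional_entropy` for each candidate order and keeping the one with the largest value.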
18.
Mol Phylogenet Evol ; 89: 37-45, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25882834

ABSTRACT

There has been a growing interest in alignment-free methods for whole genome comparison and phylogenomic studies. In this study, we propose an alignment-free method for phylogenetic tree construction using whole-proteome sequences. Based on the inter-amino-acid distances, we first convert the whole-proteome sequences into inter-amino-acid distance vectors, which are called observed inter-amino-acid distance profiles. Then, we propose to use conditional geometric distribution profiles (the distributions of sequences where the amino acids are placed randomly and independently) as the reference distribution profiles. Last the relative deviation between the observed and reference distribution profiles is used to define a simple metric that reflects the phylogenetic relationships between whole-proteome sequences of different organisms. We name our method inter-amino-acid distances and conditional geometric distribution profiles (IAGDP). We evaluate our method on two data sets: the benchmark dataset including 29 genomes used in previous published papers, and another one including 67 mammal genomes. Our results demonstrate that the new method is useful and efficient.


Subjects
Amino Acids/analysis , Phylogeny , Proteome/analysis , Proteome/chemistry , Amino Acids/chemistry , Animals , Base Sequence , Databases, Genetic , Genome/genetics , Mammals/genetics , Proteome/genetics
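
The IAGDP method in entry 18 compares an observed inter-amino-acid distance profile with a reference profile expected for randomly placed residues. The sketch below is a simplified illustration. It uses a plain geometric distribution as the reference, whereas the paper uses a conditional geometric distribution whose details are not given in the abstract, and the toy sequence is invented.

```python
from collections import Counter


def distance_profile(seq, residue, max_d):
    """Observed frequencies of gaps 1..max_d between consecutive occurrences of residue."""
    positions = [i for i, c in enumerate(seq) if c == residue]
    gaps = Counter(b - a for a, b in zip(positions, positions[1:]))
    total = sum(gaps.values())
    if total == 0:  # fewer than two occurrences: no gaps to count
        return [0.0] * max_d
    return [gaps[d] / total for d in range(1, max_d + 1)]


def geometric_profile(seq, residue, max_d):
    """Reference gap probabilities if the residue occurred independently at each site."""
    p = seq.count(residue) / len(seq)
    return [p * (1 - p) ** (d - 1) for d in range(1, max_d + 1)]


def relative_deviation(observed, reference):
    """Element-wise (observed - reference) / reference."""
    return [(o - r) / r for o, r in zip(observed, reference)]


seq = "AAKAKKAK"  # toy sequence, not real proteome data
observed = distance_profile(seq, "A", max_d=3)
reference = geometric_profile(seq, "A", max_d=3)
print(relative_deviation(observed, reference))
```

Concatenating such deviation vectors over all 20 amino acids would give one feature vector per proteome, from which pairwise distances between organisms could be computed.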
19.
Chaos ; 25(2): 023103, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25725639

ABSTRACT

Complex networks have attracted much attention in diverse areas of science and technology. Multifractal analysis (MFA) is a useful way to systematically describe the spatial heterogeneity of both theoretical and experimental fractal patterns. In this paper, we employ the sandbox (SB) algorithm proposed by Tél et al. (Physica A 159, 155-166 (1989)), for MFA of complex networks. First, we compare the SB algorithm with two existing algorithms of MFA for complex networks: the compact-box-burning algorithm proposed by Furuya and Yakubo (Phys. Rev. E 84, 036118 (2011)), and the improved box-counting algorithm proposed by Li et al. (J. Stat. Mech.: Theor. Exp. 2014, P02020 (2014)) by calculating the mass exponents τ(q) of some deterministic model networks. We make a detailed comparison between the numerical and theoretical results of these model networks. The comparison results show that the SB algorithm is the most effective and feasible algorithm to calculate the mass exponents τ(q) and to explore the multifractal behavior of complex networks. Then, we apply the SB algorithm to study the multifractal property of some classic model networks, such as scale-free networks, small-world networks, and random networks. Our results show that multifractality exists in scale-free networks, that of small-world networks is not obvious, and it almost does not exist in random networks.
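
The core of the sandbox algorithm discussed above is counting the nodes M(r) inside a "sandbox" of radius r around a center node and averaging [M(r)/N]^(q-1) over centers. The sketch below computes that average with breadth-first search. It uses every node as a center, which is an assumption, since the original procedure samples centers at random. The final regression of the log-average against log r that yields τ(q) is omitted.

```python
from collections import deque


def nodes_within(graph, center, radius):
    """Number of nodes at shortest-path distance <= radius from center (BFS)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the sandbox boundary
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist)


def sandbox_average(graph, radius, q):
    """Average of [M(r) / N]^(q - 1) over all nodes taken as sandbox centers."""
    n = len(graph)
    return sum((nodes_within(graph, c, radius) / n) ** (q - 1) for c in graph) / n


# Toy network: a path of five nodes given as an adjacency list.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(nodes_within(path, 2, 1), sandbox_average(path, radius=1, q=2))
```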

20.
J Theor Biol ; 344: 31-9, 2014 Mar 07.
Article in English | MEDLINE | ID: mdl-24316387

ABSTRACT

Membrane proteins play important roles in many biochemical processes and are also attractive targets of drug discovery for various diseases. The elucidation of membrane protein types provides clues for understanding the structure and function of proteins. Recently we developed a novel system for predicting protein subnuclear localizations. In this paper, we propose a simplified version of our system for predicting membrane protein types directly from primary protein structures, which incorporates amino acid classifications and physicochemical properties into a general form of pseudo-amino acid composition. In this simplified system, we will design a two-stage multi-class support vector machine combined with a two-step optimal feature selection process, which proves very effective in our experiments. The performance of the present method is evaluated on two benchmark datasets consisting of five types of membrane proteins. The overall accuracies of prediction for five types are 93.25% and 96.61% via the jackknife test and independent dataset test, respectively. These results indicate that our method is effective and valuable for predicting membrane protein types. A web server for the proposed method is available at http://www.juemengt.com/jcc/memty_page.php.


Subjects
Amino Acids/classification , Membrane Proteins/chemistry , Support Vector Machine , Algorithms , Amino Acids/chemistry , Animals , Chemistry, Physical , Computational Biology/methods , Databases, Protein , Membrane Proteins/analysis