1.
Sensors (Basel) ; 24(9)2024 Apr 27.
Article in English | MEDLINE | ID: mdl-38732896

ABSTRACT

Accurate and fast recognition of vehicle license plates from natural scene images is a crucial and challenging task. Existing methods can recognize license plates in simple scenarios, but their performance degrades significantly in complex environments. A novel license plate detection and recognition model YOLOv5-PDLPR is proposed, which employs YOLOv5 target detection algorithm in the license plate detection part and uses the PDLPR algorithm proposed in this paper in the license plate recognition part. The PDLPR algorithm is mainly designed as follows: (1) A Multi-Head Attention mechanism is used to accurately recognize individual characters. (2) A global feature extractor network is designed to improve the completeness of the network for feature extraction. (3) The latest parallel decoder architecture is adopted to improve the inference efficiency. The experimental results show that the proposed algorithm has better accuracy and speed than the comparison algorithms, can achieve real-time recognition, and has high efficiency and robustness in complex scenes.

2.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33377150

ABSTRACT

Data from the SEER reports reveal that the occurrence rate of a cancer type generally follows a unimodal distribution over age, peaking at an age that is cancer-type specific and ranges from 30+ through 70+. Previous studies attribute such bell-shaped distributions to the reduced proliferative potential in senior years but fail to explain why some cancers have their occurrence peak at 30+ or 40+. We present a computational model that offers a new explanation for such distributions. The model uses two factors to explain the observed age-dependent cancer occurrence rates: the cancer risk of an organ and the availability level of the growth signals in circulation needed by a cancer type, with the former increasing and the latter decreasing with age. Regression analyses of known occurrence rates against these factors were conducted for triple negative breast cancer, testicular cancer and cervical cancer; all achieved very tight fits, which were also consistent with clinical, gene-expression and cancer-drug data. These results reveal a fundamentally important relationship: while cancer is driven by endogenous stressors, it requires sufficient levels of exogenous growth signals to occur, suggesting the realistic possibility of treating cancer by clearing out the growth signals in circulation that a cancer needs.


Subjects
Factual Databases, Biological Models, Testicular Neoplasms, Triple Negative Breast Neoplasms, Uterine Cervical Neoplasms, Adult, Aged, Female, Humans, Male, Middle Aged, Testicular Neoplasms/epidemiology, Testicular Neoplasms/metabolism, Triple Negative Breast Neoplasms/epidemiology, Triple Negative Breast Neoplasms/metabolism, Uterine Cervical Neoplasms/epidemiology, Uterine Cervical Neoplasms/metabolism
3.
BMC Bioinformatics ; 23(1): 5, 2022 Jan 04.
Article in English | MEDLINE | ID: mdl-34983367

ABSTRACT

BACKGROUND: Growing evidence shows that long non-coding RNAs (lncRNAs) play important roles in the development and progression of complex human diseases. Predicting human lncRNA-disease associations is therefore a challenging and urgent task in bioinformatics for research on complex human diseases. RESULTS: In this work, a global network-based computational framework called LRWRHLDA is proposed. First, four isomorphic networks were constructed: an lncRNA similarity network, a disease similarity network, a gene similarity network and a miRNA similarity network. Then, six heterogeneous networks, comprising the known lncRNA-disease, lncRNA-gene, lncRNA-miRNA, disease-gene, disease-miRNA and gene-miRNA associations, were used to build a multi-layer network. Finally, the Laplace normalized random walk with restart algorithm is applied on this global network to predict the relationships between lncRNAs and diseases. CONCLUSIONS: Ten-fold cross validation is used to evaluate the performance of LRWRHLDA. LRWRHLDA achieves an AUC of 0.98402, which is higher than that of the other compared methods. Furthermore, LRWRHLDA can predict isolated disease-related lncRNAs (and isolated lncRNA-related diseases). The results for colorectal cancer, lung adenocarcinoma, stomach cancer and breast cancer have been verified by other studies. The case studies indicate that the method is effective.


Subjects
MicroRNAs, Neoplasms/genetics, Long Noncoding RNA, Algorithms, Computational Biology, Gene Regulatory Networks, Humans, MicroRNAs/genetics, Long Noncoding RNA/genetics
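
The abstract above names a Laplace normalized random walk with restart but gives no formulas. As a rough illustration only, the sketch below assumes the common symmetric normalization W = D^(-1/2) A D^(-1/2) and the usual update p <- (1 - r) W p + r p0. The function names, the restart value and the toy graph are hypothetical and are not taken from LRWRHLDA.

```python
import math


def laplace_normalize(adj):
    """Symmetric (Laplacian-style) normalization: W[i][j] = A[i][j] / sqrt(d_i * d_j)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [
        [
            adj[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change is below tol.

    `restart` and the uniform seed vector are assumptions made for this sketch.
    """
    n = len(adj)
    w = laplace_normalize(adj)
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path, seeds=[0])
```

On this toy graph the scores decrease with distance from the seed, which is the ranking behaviour such methods rely on when scoring candidate associations.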
4.
J Theor Biol ; 538: 111039, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35085534

ABSTRACT

Based on the physicochemical indexes of the 20 amino acids and the Hungarian algorithm, each amino acid was mapped to a vector, so that a protein sequence can be represented as a time series in eleven-dimensional space. The DTW algorithm was then applied to calculate the distance between two time series in order to compare the similarities of protein sequences. The validity and accuracy of this method were illustrated by a similarity comparison of the ND5 proteins of nine species. Furthermore, homology analysis of eleven ACE2 proteins, which included human, Malayan pangolin and six species of bats, confirmed that the human had a shorter evolutionary distance from the pangolin than from those bats. The phylogenetic tree of spike protein sequences of 36 coronaviruses, which were divided into five groups (Class I, Class II, Class III, SARS-CoVs and COVID-19), was constructed.


Subjects
COVID-19, Chiroptera, Amino Acid Sequence, Animals, Humans, Phylogeny, SARS-CoV-2/genetics, Time Factors
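
The DTW comparison described in the abstract above can be sketched with the textbook dynamic-programming recurrence. This is a minimal illustration, assuming a Euclidean local cost between per-residue vectors. The function name is hypothetical, and the paper's eleven-dimensional amino acid mapping is not reproduced, so the toy inputs are one-dimensional.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping between two sequences of equal-dimension vectors.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean local cost (assumption)
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is aligned with an element of b already used
                cost[i][j - 1],      # b[j-1] is aligned with an element of a already used
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Toy series of different lengths: the second repeats one point of the first.
x = [(0.0,), (1.0,), (2.0,)]
y = [(0.0,), (1.0,), (1.0,), (2.0,)]
d_xy = dtw_distance(x, y)
```

Because warping lets one point match several, sequences of different lengths can be compared directly, which is what makes the measure usable for proteins of unequal length.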
5.
BMC Bioinformatics ; 21(1): 159, 2020 Apr 29.
Article in English | MEDLINE | ID: mdl-32349677

ABSTRACT

BACKGROUND: Genomic islands are associated with microbial adaptations, carrying genomic signatures different from the host. Some methods perform an overall test to identify genomic islands based on their local features. However, regions of different scales will display different genomic features. RESULTS: We proposed here a novel method "2SigFinder ", the first combined use of small-scale and large-scale statistical testing for genomic island detection. The proposed method was tested by genomic island boundary detection and identification of genomic islands or functional features of real biological data. We also compared the proposed method with the comparative genomics and composition-based approaches. The results indicate that the proposed 2SigFinder is more efficient in identifying genomic islands. CONCLUSIONS: From real biological data, 2SigFinder identified genomic islands from a single genome and reported robust results across different experiments, without annotated information of genomes or prior knowledge from other datasets. 2SigHunter identified 25 Pathogenicity, 1 tRNA, 2 Virulence and 2 Repeats from 27 Pathogenicity, 1 tRNA, 2 Virulence and 2 Repeats, and detected 101 Phage and 28 HEG out of 130 Phage and 36 HEGs in S. enterica Typhi CT18, which shows that it is more efficient in detecting functional features associated with GIs.


Subjects
Algorithms, Bacterial Genome, Genomic Islands/genetics, Genomics/methods, Pseudomonas aeruginosa/genetics, Pseudomonas aeruginosa/pathogenicity, Salmonella enterica/genetics, Salmonella enterica/pathogenicity, Virulence
6.
J Theor Biol ; 467: 142-149, 2019 04 21.
Article in English | MEDLINE | ID: mdl-30768974

ABSTRACT

Genomic islands are associated with microbial adaptations and carry genomic signatures different from those of the host. Many methods have therefore been proposed to select the informative genomic signatures from a range of organisms and to discriminate genomic islands from the rest of the genome in terms of these signature biases. However, they are of limited use when closely related genomes are unavailable. In the present work, we propose a kurtosis-based ranking method to select the informative genomic signatures from a single genome. In simulations with alien fragments from artificial and real genomes, the proposed kurtosis-based ranking method efficiently selected the informative genomic signatures from a single genome, without annotated information of genomes or prior knowledge from other datasets. This understanding can be useful for designing more powerful methods for genomic island detection.


Subjects
Bacterial Genome, Genomic Islands, Genomics/methods, Algorithms
7.
Bioinformatics ; 33(20): 3195-3201, 2017 Oct 15.
Article in English | MEDLINE | ID: mdl-28637337

ABSTRACT

MOTIVATION: Low-rank matrix completion has been demonstrated to be powerful in predicting antigenic distances among influenza viruses and vaccines from a partially revealed hemagglutination inhibition table. Meanwhile, influenza hemagglutinin (HA) protein sequences are also effective in inferring antigenic distances. Thus, it is natural to integrate HA protein sequence information into a low-rank matrix completion model to help infer influenza antigenicity, which is critical to influenza vaccine development. RESULTS: We have proposed a novel algorithm called biological matrix completion with side information (BMCSI), which first measures HA protein sequence similarities among influenza viruses (especially on epitopes) and then integrates the similarity information into a low-rank matrix completion model to predict influenza antigenicity. This algorithm exploits both the correlations among viruses and vaccines in serological tests and the power of HA sequence in predicting influenza antigenicity. We applied this model to H3N2 seasonal influenza virus data. Compared with previous methods, we significantly reduced the prediction root-mean-square error in a 10-fold cross validation analysis. Based on the cartographies constructed from imputed data, we showed that the antigenic evolution of H3N2 seasonal influenza is generally S-shaped while the genetic evolution is half-circle shaped. We also showed that the Spearman correlation between genetic and antigenic distances (among antigenic clusters) is 0.83, demonstrating a globally high correspondence and some local discrepancies between influenza genetic and antigenic evolution. Finally, we showed that 4.4%±1.2% genetic variance (corresponding to 3.11 ± 1.08 antigenic distances) caused an antigenic drift event for H3N2 influenza viruses historically. AVAILABILITY AND IMPLEMENTATION: The software and data for this study are available at http://bi.sky.zstu.edu.cn/BMCSI/. CONTACT: jialiang.yang@mssm.edu or pinganhe@zstu.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Viral Antigens, Computational Biology/methods, Genetic Variation, Influenza A Virus H3N2 Subtype/immunology, Influenza Vaccines, Software, Algorithms, Epitopes, Molecular Evolution, Hemagglutination Inhibition Tests, Influenza Virus Hemagglutinin Glycoproteins/immunology, Influenza A Virus H3N2 Subtype/genetics, Influenza A Virus H3N2 Subtype/metabolism, Immunological Models, Protein Sequence Analysis/methods
8.
Opt Express ; 26(3): 2252-2260, 2018 Feb 05.
Article in English | MEDLINE | ID: mdl-29401765

ABSTRACT

The influences of dot material component, barrier material component, aspect ratio and carrier density on the refractive index changes of TE mode and TM mode of columnar quantum dot are analyzed, and a multiparameter adjustment method is proposed to realize low polarization dependence of refractive index change. Then the quantum dots with low polarization dependence of refractive index change (<1.5%) within C-band (1530 nm - 1565 nm) are designed, and it shows that quantum dots with different material parameters are anticipated to have similar characteristics of low polarization dependence.

9.
Opt Express ; 26(9): 11843-11849, 2018 Apr 30.
Article in English | MEDLINE | ID: mdl-29716101

ABSTRACT

Metasurfaces consisting of dielectric nanobrick arrays with different dimensions in the long and short axes can be used to generate different phase delays, suggesting a new way to manipulate an incident beam in the two orthogonal directions separately. Here we demonstrate the concept of depth perception based three-dimensional (3D) holograms with polarization-independent metasurfaces. 4-step dielectric metasurfaces-based fan-out optical elements and holograms operating at 658 nm were designed and simulated. Two different holographic images with high fidelity were generated at the same plane in the far field for different polarization states. One can observe the 3D effect of target objects with polarized glasses. With the advantages of ultracompactness, flexibility and replicability, the polarization-independent metasurfaces open up depth perception based stereoscopic imaging in a holographic way.

10.
Int J Mol Sci ; 19(3)2018 Mar 03.
Article in English | MEDLINE | ID: mdl-29510512

ABSTRACT

Brassinosteroids (BRs) are important phytohormones for plant growth and development. In soybean (Glycine max), BR receptors have been identified, but the genes encoding BR biosynthesis-related enzymes remain poorly understood. Here, we found that the soybean genome encodes eight steroid reductases (GmDET2a to GmDET2h). Phylogenetic analysis grouped 105 steroid reductases from moss, fern and higher plants into five subgroups and indicated that the steroid reductase family has experienced purifying selection. GmDET2a and GmDET2b, homologs of the Arabidopsis thaliana steroid 5α-reductase AtDET2, are proteins of 263 amino acids. Ectopic expression of GmDET2a and GmDET2b rescued the defects of the Atdet2-1 mutant in both darkness and light. Compared with the mutant, the hypocotyl length and plant height of the GmDET2a and GmDET2b transgenic lines increased significantly in both darkness and light, and the transcript levels of the BR biosynthesis-related genes CPD, DWF4, BR6ox-1 and BR6ox-2 were downregulated in the GmDET2aOX-23 and GmDET2bOX-16 lines compared with those in Atdet2-1. Quantitative real-time PCR revealed that GmDET2a and GmDET2b are ubiquitously expressed in all tested soybean organs, including roots, leaves and hypocotyls. Moreover, epibrassinosteroid negatively regulated GmDET2a and GmDET2b expression. Sulfate deficiency downregulated GmDET2a in leaves and GmDET2b in leaves and roots; by contrast, phosphate deficiency upregulated GmDET2b in roots and leaves. Taken together, our results revealed that GmDET2a and GmDET2b function as steroid reductases.


Subjects
3-Oxo-5-alpha-Steroid 4-Dehydrogenase/genetics, Glycine max/genetics, Plant Proteins/genetics, 3-Oxo-5-alpha-Steroid 4-Dehydrogenase/metabolism, Plant Gene Expression Regulation, Hypocotyl/metabolism, Plant Leaves/metabolism, Plant Proteins/metabolism, Glycine max/enzymology
11.
Opt Lett ; 42(7): 1261-1264, 2017 Apr 01.
Article in English | MEDLINE | ID: mdl-28362744

ABSTRACT

A conventional optical zoom system is bulky, expensive, and complicated for real-time adjustment. Recent progress in metasurface research has provided a new solution to achieve innovative compact optical systems. In this Letter, we propose a highly integrated step-zoom lens with dual field of view (FOV) based on double-sided metasurfaces. With silicon nanobrick arrays of spatially varying orientations sitting on both sides of a transparent substrate, this ultrathin step-zoom metalens can be designed to focus an incident circular polarized beam with handedness-dependent FOVs without varying the focal plane, which is important for practical applications. The proposed dual FOV step-zoom metalens, with advantages such as ultracompactness, flexibility, and replicability, can find applications in fields that require ultracompact zoom imaging and beam focusing.

12.
Opt Express ; 24(6): 6749-57, 2016 Mar 21.
Article in English | MEDLINE | ID: mdl-27136861

ABSTRACT

Since the transmission of anisotropic nano-structures is sensitive to the polarisation of an incident beam, a novel polarising beam splitter (PBS) based on silicon nanobrick arrays is proposed. With careful design of such structures, an incident beam with polarisation direction aligned with the long axis of the nanobrick is almost totally reflected (~98.5%), whilst that along the short axis is nearly totally transmitted (~94.3%). More importantly, by simply changing the width of the nanobrick we can shift the peak response wavelength from 1460 nm to 1625 nm, covering S, C and L bands of the fiber telecommunications windows. The silicon nanobrick-based PBS can find applications in many fields which require ultracompactness, high efficiency, and compatibility with semiconductor industry technologies.

13.
Opt Lett ; 40(18): 4285-8, 2015 Sep 15.
Article in English | MEDLINE | ID: mdl-26371917

ABSTRACT

Established diffractive optical elements (DOEs), such as Dammann gratings, whose phase profile is controlled by etching different depths into a transparent dielectric substrate, suffer from a contradiction between the complexity of fabrication procedures and the performance of such gratings. In this Letter, we combine the concept of geometric phase and phase modulation in depth, and prove by theoretical analysis and numerical simulation that nanorod arrays etched on a silicon substrate have a characteristic of strong polarization conversion between two circularly polarized states and can act as a highly efficient half-wave plate. More importantly, only by changing the orientation angles of each nanorod can the arrays control the phase of a circularly polarized light, cell by cell. With the above principle, we report the realization of nanorod-based Dammann gratings reaching diffraction efficiencies of 50%-52% in the C-band fiber telecommunications window (1530-1565 nm). In this design, uniform 4×4 spot arrays with an extending angle of 59°×59° can be obtained in the far field. Because of these advantages of the single-step fabrication procedure, accurate phase controlling, and strong polarization conversion, nanorod-based Dammann gratings could be utilized for various practical applications in a range of fields.

14.
J Theor Biol ; 369: 51-8, 2015 Mar 21.
Article in English | MEDLINE | ID: mdl-25636491

ABSTRACT

Polymerase chain reaction (PCR) is hailed as one of the monumental scientific techniques of the twentieth century, and has become a common and often indispensable technique in many areas. However, researchers still frequently find some DNA templates very hard to amplify with PCR, although many approaches have been introduced to optimize the amplification. In fact, during the past decades, the experimental procedure of PCR was always the focus of attention, while the analysis of the DNA template, the PCR experimental subject itself, was almost neglected. Up to now, no one can identify with certainty whether a fragment of DNA can be simply amplified using a conventional Taq DNA polymerase-based PCR protocol. Characterizing a DNA template and then developing a reliable and efficient method to predict the success of PCR reactions is thus urgently needed. In this study, by means of the Markov maximal order model, we construct a 48-D feature vector to represent a DNA template. A support vector machine (SVM) is then employed to help evaluate the PCR result. To examine the anticipated success rates of our predictor, a jackknife cross-validation test is adopted. The overall accuracy of our approach reaches 93.12%, with a sensitivity, specificity, and MCC of 94.68%, 91.58%, and 0.863, respectively.


Subjects
Markov Chains, Polymerase Chain Reaction/methods, Support Vector Machine, Humans, Theoretical Models
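
The accuracy, sensitivity, specificity and MCC reported in the abstract above are standard functions of a binary confusion matrix. The sketch below shows those definitions. The counts are invented for illustration and are not the paper's data, and the function name is hypothetical.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient: a correlation in [-1, 1], not a percentage.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
```

In a jackknife (leave-one-out) test, each sample is predicted by a model trained on all the others, and the counts passed to such a function are accumulated over those held-out predictions.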
15.
Opt Express ; 22(26): 31893-8, 2014 Dec 29.
Article in English | MEDLINE | ID: mdl-25607157

ABSTRACT

Characteristics of the polarization insensitivity of the carrier-induced refractive index change of a 1.55 µm tensile-strained multiple quantum well (MQW) are theoretically investigated. A comprehensive MQW model is proposed to effectively extend the application range of previous models. The model considers the temperature variation as well as the nonuniform distribution of injected carriers in the MQW. The tensile-strained MQW is expected to achieve polarization insensitivity of the carrier-induced refractive index change over a wide wavelength range as temperature varies from 0°C to 40°C, while the magnitude of the refractive index change remains large (more than 3 × 10⁻³). Moreover, the polarization insensitivity of the refractive index change is maintained over a wide range of carrier concentrations. Multiple quantum wells with different material and structure parameters are anticipated to have similar polarization insensitivity of the refractive index change, which shows the design flexibility.


Subjects
Theoretical Models, Refractometry/instrumentation, Refractometry/methods, Surface Plasmon Resonance/instrumentation, Surface Plasmon Resonance/methods, Computer Simulation, Computer-Aided Design, Equipment Design, Equipment Failure Analysis, Light, Radiation Scattering, Temperature, Tensile Strength
16.
J Theor Biol ; 347: 109-17, 2014 Apr 21.
Article in English | MEDLINE | ID: mdl-24412564

ABSTRACT

In this paper, a dynamic 3-D graphical representation of protein sequences is introduced based on three physical-chemical properties of amino acids. The coordinates of the graph have direct biological significance, which could reflect the innate structure of the proteins. The information of principal moments of inertia and range of axis coordinate are extracted as a novel mixed descriptor and proposed for the comparison of protein primary sequences. Meanwhile, the Euclidean distance of the normalized descriptor vectors which avoid the influence of the difference in length of protein sequences under consideration is employed as a quantitative measurement of the similarity of proteins. Finally, we take the nine ND5 (NADH dehydrogenase subunit 5) proteins for example and illustrate the effectiveness of our approach.


Subjects
Proteins/chemistry, Protein Sequence Analysis
17.
Comput Biol Med ; 169: 107926, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38183706

ABSTRACT

Immune checkpoint blockade (ICB) therapy offers promise in the treatment of triple-negative breast cancer (TNBC); however, its limited efficacy in certain TNBC patients poses a challenge. In this study, we elucidated the metabolic mechanism at 'sub-subtype' resolution underlying the non-response to ICB therapy in TNBC. An analytic pipeline was developed to reveal the metabolic heterogeneity, which is correlated with the ICB outcomes, within each immune cell subtype. First, we identified metabolic 'sub-subtypes' within certain cell subtypes, predominantly T cell subsets, which are enriched in ICB non-responders and named non-responder-enriched (NR-E) clusters. Notably, most NR-E T cells exhibit globally higher metabolic activities compared to other cells within the same individual subtype. Further, we investigated the extracellular signals that trigger the metabolic status of NR-E T cells. In detail, the prediction of cell-to-cell communication indicated that NR-E T cells are regulated by plasmacytoid dendritic cells (pDCs) through TNFSF9, as well as by macrophages expressing SIGLEC9. In addition, we validated the communication between TNFSF9+ pDCs and NR-E T cells using deconvolution of spatial transcriptomics data. In summary, our research identified specific metabolic 'sub-subtypes' associated with ICB non-response and uncovered the mechanisms of their regulation in TNBC. The proposed analytical pipeline can be used to examine metabolic heterogeneity within cell types that correlate with diverse phenotypes.


Subjects
Triple Negative Breast Neoplasms, Humans, Single-Cell Gene Expression Analysis, Immunotherapy, Gene Expression Profiling, Macrophages
18.
BMC Bioinformatics ; 14: 152, 2013 May 04.
Article in English | MEDLINE | ID: mdl-23641706

ABSTRACT

BACKGROUND: Many content-based statistical features of secondary structural elements (CBF-PSSEs) have been proposed and achieved promising results in protein structural class prediction, but until now position distribution of the successive occurrences of an element in predicted secondary structure sequences hasn't been used. It is necessary to extract some appropriate position-based features of the secondary structural elements for prediction task. RESULTS: We proposed some position-based features of predicted secondary structural elements (PBF-PSSEs) and assessed their intrinsic ability relative to the available CBF-PSSEs, which not only offers a systematic and quantitative experimental assessment of these statistical features, but also naturally complements the available comparison of the CBF-PSSEs. We also analyzed the performance of the CBF-PSSEs combined with the PBF-PSSE and further constructed a new combined feature set, PBF11CBF-PSSE. Based on these experiments, novel valuable guidelines for the use of PBF-PSSEs and CBF-PSSEs were obtained. CONCLUSIONS: PBF-PSSEs and CBF-PSSEs have a compelling impact on protein structural class prediction. When combining with the PBF-PSSE, most of the CBF-PSSEs get a great improvement over the prediction accuracies, so the PBF-PSSEs and the CBF-PSSEs have to work closely so as to make significant and complementary contributions to protein structural class prediction. Besides, the proposed PBF-PSSE's performance is extremely sensitive to the choice of parameter k. In summary, our quantitative analysis verifies that exploring the position information of predicted secondary structural elements is a promising way to improve the abilities of protein structural class prediction.


Subjects
Protein Secondary Structure, Proteins/chemistry, Algorithms, Amino Acid Sequence, Molecular Sequence Data, Protein Folding, Proteins/classification, Amino Acid Sequence Homology, Support Vector Machine
19.
J Theor Biol ; 336: 52-60, 2013 Nov 07.
Article in English | MEDLINE | ID: mdl-23876763

ABSTRACT

Lempel-Ziv complexity has been widely used for sequence comparison and achieved promising results, but until now components' distribution in exhaustive history has not been studied. This paper investigated the whole distribution of LZ-words and presented a novel statistical method for sequence comparison. With the components' length in mind, we revised Lempel-Ziv complexity and obtained various sets of LZ-words. Instead of calculating the LZ-words' contents, we defined a series of set operations on LZ-word set to compare biological sequences. In order to assess the effectiveness of the proposed method, we performed two sets of experiments and compared it with alignment-based methods.


Subjects
Algorithms, Sequence Homology, Base Sequence, Cluster Analysis, Coronavirus/classification, Coronavirus/genetics, Viral Genome, Hepatitis E virus/genetics, Phylogeny
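
The "exhaustive history" mentioned in the abstract above is the Lempel-Ziv (1976) parsing of a sequence into components, each being the shortest substring that has not already occurred in the preceding text. The sketch below implements that parsing. The set-based distance that follows is a hypothetical Jaccard-style stand-in, since the abstract does not specify the paper's own set operations.

```python
def lz_exhaustive_history(seq):
    """Parse seq into LZ76 components (LZ-words).

    Each word is extended while it can still be copied from the text that
    precedes its last symbol; the final word may be a repeat if the input ends.
    """
    words = []
    i, n = 0, len(seq)
    while i < n:
        k = 1
        # seq[i:i+k] is reproducible if it occurs in seq[:i+k-1] (overlap allowed).
        while i + k <= n and seq[i:i + k] in seq[:i + k - 1]:
            k += 1
        words.append(seq[i:min(i + k, n)])
        i += k
    return words


def lz_word_set_distance(a, b):
    """Hypothetical Jaccard-style distance between LZ-word sets (illustration only)."""
    wa, wb = set(lz_exhaustive_history(a)), set(lz_exhaustive_history(b))
    union = wa | wb
    return 1.0 - len(wa & wb) / len(union) if union else 0.0


# Binary example sequence from the Lempel-Ziv complexity literature.
history = lz_exhaustive_history("0001101001000101")
```

The number of words in the history is the Lempel-Ziv complexity of the sequence. Working with the words themselves, rather than only their count, is what allows set operations between two sequences.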
20.
Comput Biol Med ; 166: 107550, 2023 Sep 28.
Article in English | MEDLINE | ID: mdl-37826950

ABSTRACT

Genomic islands are fragments of foreign DNA that are found in bacterial and archaeal genomes, and are typically associated with symbiosis or pathogenesis. While numerous genomic island detection methods have been proposed, there has been limited evaluation of the efficiency of the genome information processing and boundary recognition tools. In this study, we conducted a review of the statistical methods involved in genomic signatures, host signature extraction, informative signature selection, divergence measures, and boundary detection steps in genomic island prediction. We compared the performances of these methods on simulated experiments using alien fragments obtained from both artificial and real genomes. Our results indicate that among the nine genomic signatures evaluated, genomic signature frequency and full probability performed the best. However, their performance declined when normalized to their expectations and variances, such as Z-score and composition vector. Based on our experiments of the E. coli genome, we found that the confidence intervals of the window variances achieved the best performance in the signature extraction of the host, with the best confidence interval being 1.5-2 times the standard error. Ordered kurtosis was most effective in selecting informative signatures from a single genome, without requiring prior knowledge from other datasets. Among the three divergence measures evaluated, the two-sample t-test was the most successful, and a non-overlapping window with a small eye window (size 2) was best suited for identifying compositionally distinct regions. Finally, the maximum of the Markovian Jensen-Shannon divergence score, in terms of GC-content bias, was found to make boundary detection faster while maintaining a similar error rate.
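
One of the divergence measures named in the abstract above is Jensen-Shannon divergence. As a minimal sketch, the code below computes the plain (non-Markovian) Jensen-Shannon divergence between the k-mer frequency distributions of two sequence windows. The Markovian variant and the GC-content bias used in the study are not reproduced, and the function names and window contents are assumptions.

```python
import math
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequencies of overlapping k-mers in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Ranges from 0 (identical) to 1 (disjoint support).
    """
    div = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution
        if pw > 0:
            div += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            div += 0.5 * qw * math.log2(qw / mw)
    return div


# Toy windows: one resembles the "host", one is compositionally distinct.
host = kmer_frequencies("ACGTACGTACGTACGT", 2)
alien = kmer_frequencies("GGGGCCCCGGGGCCCC", 2)
jsd = jensen_shannon(host, alien)
```

A boundary detector of this kind would slide a split point along the genome and look for the position where the divergence between the left and right windows is largest.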
