Results 1 - 18 of 18
1.
J Synchrotron Radiat ; 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39007825

ABSTRACT

The ID10 beamline of the SESAME (Synchrotron-light for Experimental Science and Applications in the Middle East) synchrotron light source in Jordan was inaugurated in June 2023 and is now open to scientific users. The beamline, which was designed and installed within the European Horizon 2020 project BEAmline for Tomography at SESAME (BEATS), provides full-field X-ray radiography and microtomography imaging with monochromatic or polychromatic X-rays up to photon energies of 100 keV. The photon source, generated by a 2.9 T wavelength shifter with variable gap, and a double-multilayer monochromator system allow versatile application for experiments requiring an X-ray beam with high intensity and flux and/or a partially spatially coherent beam for phase-contrast applications. Sample manipulation and X-ray detection systems are designed to allow scanning of samples of different size, weight and material, providing image voxel sizes from 13 µm down to 0.33 µm. A state-of-the-art computing infrastructure for data collection, three-dimensional (3D) image reconstruction and data analysis allows the visualization and exploration of results online within a few seconds of the completion of a scan. Insights from 3D X-ray imaging are key to the investigation of specimens from archaeology and cultural heritage, biology and health sciences, materials science and engineering, earth and environmental sciences, and more. Microtomography scans and preliminary results obtained at the beamline demonstrate that the new beamline ID10-BEATS significantly expands the range of scientific applications that can be targeted at SESAME.

2.
J Chem Phys ; 157(18): 184903, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36379782

ABSTRACT

Despite the modern advances in the available computational resources, the length and time scales of the physical systems that can be studied in full atomic detail, via molecular simulations, are still limited. To overcome such limitations, coarse-grained (CG) models have been developed to reduce the dimensionality of the physical system under study. However, to study such systems at the atomic level, it is necessary to re-introduce the atomistic details into the CG description. Such an ill-posed mathematical problem is typically treated via numerical algorithms, which need to balance accuracy, efficiency, and general applicability. Here, we introduce an efficient and versatile method for backmapping multi-component CG macromolecules of arbitrary microstructures. By utilizing deep learning algorithms, we train a convolutional neural network to learn structural correlations between polymer configurations at the atomistic and their corresponding CG descriptions, obtained from atomistic simulations. The trained model is then utilized to get predictions of atomistic structures from input CG configurations. As an illustrative example, we apply the convolutional neural network to polybutadiene copolymers of various microstructures, in which each monomer microstructure (i.e., cis-1,4, trans-1,4, and vinyl-1,2) is represented as a different CG particle type. The proposed methodology is transferable over molecular weight and various microstructures. Moreover, starting from a specific single CG configuration with a given microstructure, we show that by modifying its chemistry (i.e., CG particle types), we are able to obtain a set of well equilibrated polymer configurations of different microstructures (chemistry) than the one of the original CG configuration.


Subjects
Algorithms; Neural Networks, Computer; Polymers
3.
Hum Genomics ; 13(1): 29, 2019 Jul 02.
Article in English | MEDLINE | ID: mdl-31266543

ABSTRACT

In the original publication of this article [1], Figure 1 and Figure 2 were swapped. The caption of Figure 1, "Heat map showing the quantity of DNA repair genes, from red to blue in ascending order, per species' genome (numbers at the top of the figure represent the species code that is found in Table 1). Each DNA repair gene pathway was analyzed separately in rows. Radiated species' genomes are richer in DNA repair genes. Analytical data can be found in Additional file 2: Table S2. M mammals, B&R birds and reptiles, BF bony fishes", belongs with the image published as Figure 2. The caption of Figure 2, "Linear regression analysis. The number of DNA repair genes is linearly related to genome size and protein number. As a negative control, we show that genome size is not linearly related with protein number", belongs with the image published as Figure 1.

4.
Hum Genomics ; 13(1): 26, 2019 Jun 07.
Article in English | MEDLINE | ID: mdl-31174607

ABSTRACT

Adaptive radiation and evolutionary stasis are characterized by very different evolution rates. The main aim of this study was to investigate whether any genes have a special role in a high or low evolution rate. The availability of animal genomes permitted comparison of the gene content of the genomes of 24 vertebrate species that evolved through adaptive radiation (representing a high evolutionary rate) and of 20 vertebrate species that are considered living fossils (representing a slow evolutionary rate or evolutionary stasis). Mammals, birds, reptiles, and bony fishes were included in the analysis. Pathway analysis was performed for genes found to be specific to adaptive radiation or evolutionary stasis, respectively. Pathway analysis revealed that DNA repair and cellular response to DNA damage are important (false discovery rate = 8.35 × 10⁻⁵ and 7.15 × 10⁻⁶, respectively) for species that evolved through adaptive radiation. This was confirmed by further genetic in silico analysis (p = 5.30 × 10⁻³). Nucleotide excision repair and base excision repair were the most significant pathways. Additionally, the number of DNA repair genes was found to be linearly related to the genome size and the protein number (proteome) of the 44 animals analyzed (p < 1.00 × 10⁻⁴), which is compatible with Drake's rule. This is the first study in which radiated and living fossil species have been genetically compared. Evidence has been found that cancer-related genes have a special role in radiated species. A linear association of the number of DNA repair genes with the species genome size has also been revealed. These comparative genetics results can support the idea of punctuated equilibrium evolution.


Subjects
DNA Repair/genetics; Evolution, Molecular; Genome/genetics; Genomics; Animals; DNA Damage/genetics; Genes, Tumor Suppressor; Genome Size/genetics; Phenotype; Phylogeny; Vertebrates/classification; Vertebrates/genetics
5.
Alzheimers Dement ; 14(6): 837-842, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29604264

ABSTRACT

INTRODUCTION: It is a challenge to find participants for Alzheimer's disease (AD) prevention trials within a short period of time. The European Prevention of Alzheimer's Dementia Registry (EPAD) aims to facilitate recruitment by preselecting subjects from ongoing cohort studies. This article introduces this novel approach. METHODS: A virtual registry, with access to risk factors and biomarkers for AD through minimal data sets of ongoing cohort studies, was set up. RESULTS: To date, ten cohorts have been included in the EPAD. Around 2500 participants have been selected, using variables associated with the risk for AD. Of these, 15% were already recruited in the EPAD longitudinal cohort study, which serves as a trial readiness cohort. DISCUSSION: This study demonstrates that a virtual registry can be used for the preselection of participants for AD studies.


Subjects
Alzheimer Disease/prevention & control; Clinical Trials as Topic; Patient Selection; Registries; Aged; Aged, 80 and over; Biomarkers; Europe; Female; Humans; Longitudinal Studies; Male; Middle Aged; Prodromal Symptoms; Risk Factors
6.
Nucleic Acids Res ; 43(W1): W589-98, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25897122

ABSTRACT

The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool is now accessible through graphical and web service interface. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.


Subjects
Database Management Systems; Genomics; Humans; Internet; Neoplasms/genetics; Proteomics
7.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 1682-1685, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34891609

ABSTRACT

The Influenza virus can be considered one of the most severe viruses, infecting multiple species with often fatal consequences for the hosts. The Hemagglutinin (HA) gene of the virus can be a target for antiviral drug development, realised through accurate identification of its sub-types and possibly the targeted hosts. This paper focuses on accurately predicting whether an Influenza type A virus can infect specific hosts, namely Human, Avian and Swine hosts, using only the protein sequence of the HA gene. In more detail, we propose encoding the protein sequences into numerical signals using the Hydrophobicity Index and subsequently utilising a Convolutional Neural Network-based predictive model. The Influenza HA protein sequences used in the proposed work were obtained from the Influenza Research Database (IRD). Specifically, complete and unique HA protein sequences were used for avian, human and swine hosts. The data obtained for this work comprised 17999 human-host proteins, 17667 avian-host proteins and 9278 swine-host proteins. Given this set of collected proteins, the proposed method yields as much as 10% higher accuracy for an individual class (namely, Avian) and 5% higher overall accuracy than an earlier study. It is also observed that the accuracy for each class in this work is more balanced than in that earlier study. As the results show, the proposed model can distinguish HA protein sequences with high accuracy whenever the virus under investigation can infect Human, Avian or Swine hosts.


Subjects
Influenza A virus; Influenza, Human; Animals; Hemagglutinin Glycoproteins, Influenza Virus/genetics; Hemagglutinins; Humans; Influenza A virus/genetics; Neural Networks, Computer; Swine
8.
Arch Oral Biol ; 123: 104969, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33450640

ABSTRACT

OBJECTIVES: The objectives are 1) to calculate the position of highly accentuated lines in dental enamel of a group of individuals from Shahr-i Sokhta, a thriving urban centre in Bronze Age South West Asia; 2) to identify peak frequencies of physiologically stressful periods during early childhood of these individuals; and 3) to relate these peak frequencies to developmental milestones at population level. DESIGN: We analysed highly accentuated lines in the enamel of nine (n = 9) permanent mandibular first molars of nine individuals from Shahr-i Sokhta (Iran), an urban and long-distance-trading complex of the 5th millennium before the present. Age at death ranged between 4.5 years and 18-20 years. Permanent mandibular first molar enamel begins to mineralise before birth and is normally completed between 2.1 and 3.3 years, giving us insight into early childhood physiological stress, the ages at which it occurs, and any peaks in the frequency of highly accentuated line formation, through histological sections investigated using transmitted light microscopy. RESULTS: Highly accentuated line peak frequencies occur in the sample at c. four, nine, eleven, and twelve months. After 1 year of age, no more peaks occur. CONCLUSION: The peak frequencies coincide with the timing of the type of developmental milestones which may have exposed the individuals to an increased pathogen load, injury, or sub-optimal diet. We note similarity in peak timings in the few published, disparate populations, suggest a potential link with attainment of developmental milestones connected with morbidity, and propose reporting standardised statistics to enable exploration of differences between populations in terms of postnatal health-related stress.


Subjects
Child Health/history; Dental Enamel; Molar; Stress, Physiological; Child; Child, Preschool; History, Ancient; Humans; Iran; Mandible
9.
Annu Int Conf IEEE Eng Med Biol Soc ; 2017: 1186-1189, 2017 Jul.
Article in English | MEDLINE | ID: mdl-29060087

ABSTRACT

The Influenza type A virus can be considered one of the most severe viruses, infecting multiple species with often fatal consequences for the hosts. The Haemagglutinin (HA) gene of the virus has the potential to be a target for antiviral drug development, realised through accurate identification of its sub-types and possibly the targeted hosts. In this paper, a predictive model is therefore developed and tested to accurately predict whether an Influenza type A virus has the capability to infect human hosts, using only the HA gene. The predictive model follows three main steps: (i) decoding the protein sequences into numerical signals using the EIIP amino acid scale, (ii) analysing these sequences using the Discrete Fourier Transform (DFT) and extracting DFT-based features, and (iii) using a predictive model based on Artificial Neural Networks and the features generated by the DFT. In this analysis, 30724, 18236 and 8157 HA protein sequences were collected from the Influenza Research Database for Human, Avian and Swine hosts, respectively. Given this set of proteins, the proposed method yielded 97.36% (± 0.04%), 97.26% (± 0.26%), 0.978 (± 0.004), 0.963 (± 0.005) and 0.945 (± 0.005) for the training accuracy, validation accuracy, precision, recall and Matthews Correlation Coefficient (MCC), respectively, based on a 10-fold cross-validation. The classification model, generated using one of the largest datasets, if not the largest, yields promising results that could lead to early detection of such species and help develop precautionary measures for possible human infections.
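
The evaluation metrics named in this abstract (precision, recall and the Matthews Correlation Coefficient) can be computed from a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `binary_metrics` and the counts in the usage lines are made up for illustration and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (precision, recall, MCC) from binary confusion-matrix counts."""
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): fraction of true positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical counts, for illustration only.
precision, recall, mcc = binary_metrics(tp=90, fp=10, tn=85, fn=15)
print(f"precision={precision:.3f} recall={recall:.3f} MCC={mcc:.3f}")
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which makes it a reasonable summary when class sizes differ, as they do for the Human, Avian and Swine sets reported above.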


Subjects
Influenza, Human; Amino Acid Sequence; Animals; Birds; Humans; Influenza A virus; Swine
10.
Annu Int Conf IEEE Eng Med Biol Soc ; 2016: 3088-3091, 2016 Aug.
Article in English | MEDLINE | ID: mdl-28268964

ABSTRACT

The function of any protein depends directly on its secondary and tertiary structure. Proteins can fold into a three-dimensional shape, which depends primarily on the arrangement of amino acids in the primary structure. In recent years, with the explosive growth in protein sequencing, it has become unfeasible to perform detailed experimental studies, as these methodologies are very expensive and time consuming. This leaves the structure of the majority of currently available protein sequences unknown. In this paper, a predictive model is therefore presented for the classification of protein sequences' secondary structures, namely alpha helix and beta sheet. The proteins used throughout this study were collected from the Structural Classification of Proteins extended (SCOPe) database, which contains manually curated information on proteins with known structure. Two sets of proteins are used for all-alpha and all-beta protein sequences. The first set comprises sequences with less than 40% identity, and the second set comprises proteins with less than 95% identity. The analysis shows a strong connection between the amino acid indices used to convert protein sequences to numerical sequences and the proteins' secondary structures. The total classification accuracies of the proposed classifier on the protein sequences with less than 40% identity are 78.49% and 76.40% for amino acid indices BIOV880101 and BIOV880102, respectively. The classification accuracies on the sets of protein sequences with less than 95% identity are 88.01% and 85.17% for amino acid indices BIOV880101 and BIOV880102, respectively.


Subjects
Computational Biology/methods; Proteins/chemistry; Signal Processing, Computer-Assisted; Support Vector Machine; Amino Acid Sequence; Protein Structure, Secondary
11.
Annu Int Conf IEEE Eng Med Biol Soc ; 2015: 8181-4, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26738193

ABSTRACT

In recent years, numerous protein weight matrices have been developed that include physical characteristics of proteins, such as local sequence-structure information, alpha-helix information, secondary structure information and solvent accessibility states. These protein weight matrices have been shown to generally improve protein sequence alignments over classical protein weight matrices, such as the Point Accepted Mutation (PAM), Blocks of Amino Acid Substitution (BLOSUM), and GONNET matrices, for which important limitations have been observed in recent works. In this paper, a novel protein weight matrix is constructed and presented. This protein weight matrix is based not on the mutation rate, like the PAM or BLOSUM matrices, but on the physicochemical properties of each amino acid. In the literature, over 500 amino acid indices exist, each one representing a unique biological protein feature. For this study, 25 amino acid indices were selected. These amino acid indices represent general and widely accepted features of the amino acids. The proposed protein weight matrix offers the following advantages over the classical protein weight matrices. It is not biased towards specific groups of protein sequences, as the values are calculated from the amino acid indices and not from the protein sequences. Additionally, the same matrix can be used regardless of the homology of the protein sequences to be aligned or the mutation rate present. A correlation with the physical characteristics of the amino acids from which the protein weight matrix is derived can be achieved. Different similarity matrices can be generated when different physical characteristics of amino acids are considered.


Subjects
Proteins/chemistry; Amino Acid Sequence; Amino Acids; Protein Structure, Secondary; Sequence Alignment
12.
Article in English | MEDLINE | ID: mdl-26738065

ABSTRACT

In recent years, the development of high-throughput sequencing technologies has provided an effective way to generate data from entire genomes and test variants from thousands of individuals. The information acquired from analysing the data generated by high-throughput sequencing technologies has provided useful insights into applications such as whole-exome sequencing and targeted sequencing to discover the genetic cause of complex diseases and drug responses. The Distributed Annotation System (DAS) is one of the proposed solutions developed to share and unify biological data from multiple local and remote DAS annotation servers. Researchers can use DAS to request data from federated or centralised databases and integrate them into a unified view. Furthermore, with the use of Reference DAS servers, structural and sequence data can be used to accompany annotation data, in the pursuit of new knowledge about a particular feature or region. In this paper, two additional commands for the existing DAS protocol, the summary and summary-plot commands, are proposed and implemented. The proposed commands were created to give users the capability to request a summary of features for a particular region of interest. The summary command was created to extend the capabilities of the current DAS protocol, while the summaryplot command was created to provide a more user-friendly alternative to standard XML DAS responses. Finally, three examples are presented based on the GENCODE annotation data.


Subjects
Computational Biology/methods; Computer Communication Networks; Databases, Genetic; Chromosomes, Human, Pair 1; Databases, Factual; High-Throughput Nucleotide Sequencing; Humans; Software
13.
Adv Bioinformatics ; 2015: 909765, 2015.
Article in English | MEDLINE | ID: mdl-25632276

ABSTRACT

Complex informational spectrum analysis for protein sequences (CISAPS) and its web-based server are developed and presented. As recent studies show, using only the absolute spectrum in the analysis of protein sequences with informational spectrum analysis has proven insufficient. Therefore, CISAPS is developed to consider and provide results in three forms: the absolute, real, and imaginary spectrum. As shown in the case study on influenza A subtypes presented here, biologically relevant features can also appear individually in either the real or the imaginary spectrum. As the results show, protein classes can present similarities or differences according to the features extracted from the CISAPS web server. These associations are likely related to the protein feature that the specific amino acid index represents. In addition, various technical issues that may affect the analysis, such as zero-padding and windowing, are also addressed. To perform the analysis, CISAPS uses an expanded list of 611 unique amino acid indices, each of which represents a different property. This web-based server enables researchers with little knowledge of signal processing methods to apply complex informational spectrum analysis in their work.

14.
Article in English | MEDLINE | ID: mdl-25570082

ABSTRACT

Current bioinformatics tools achieve high accuracy in classifying allergenic protein sequences with high homology but generally perform poorly with low-homology protein sequences. Although some homologous regions explain Immunoglobulin E (IgE) cross-reactivity in groups of allergens, no universal molecular structure could be associated with allergenicity. In addition, studies have shown that cross-reactivity is not directly linked to the homology between protein sequences. Therefore, a new homology-independent method needs to be developed to determine whether a protein is an allergen or not. The aim of this study is therefore to differentiate sets of allergenic and non-allergenic proteins using a signal-processing based bioinformatics approach. In this paper, a new method was proposed for the characterisation and classification of allergenic protein sequences. In this method, a hydrophobicity amino acid index was used to encode proteins into numerical sequences and the Discrete Fourier Transform to extract features for each protein. Finally, a classifier was constructed based on Support Vector Machines. In order to demonstrate the applicability of the proposed method, 857 allergen and 1000 non-allergen proteins were collected from the UniProt online database. The results obtained from the proposed method yielded: MCC: 0.752 ± 0.007, Specificity: 0.912 ± 0.005, Sensitivity: 0.835 ± 0.008 and Total Accuracy: 87.65% ± 0.004.
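
The first two stages described in this abstract (encoding a protein with a hydrophobicity index, then taking a Discrete Fourier Transform) can be sketched as follows. This is an illustrative sketch only: the abstract does not say which hydrophobicity index was used, so the Kyte-Doolittle hydropathy scale is an assumption here, and the sequence `toy_sequence` and the function names are hypothetical. The DFT is written out directly with the standard library so the block has no dependencies.

```python
import cmath

# Kyte-Doolittle hydropathy values; the choice of this scale is an assumption,
# since the abstract only says "hydrophobicity amino acid index".
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale=KYTE_DOOLITTLE):
    """Map each residue letter to its numerical index value."""
    return [scale[aa] for aa in sequence.upper()]


def dft_amplitudes(signal):
    """Amplitudes |X[k]| for k = 1 .. N//2 of the mean-removed signal."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]  # remove the DC component
    amplitudes = []
    for k in range(1, n // 2 + 1):
        x_k = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(centred))
        amplitudes.append(abs(x_k))
    return amplitudes


# Hypothetical short sequence, for illustration only.
toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"
signal = encode(toy_sequence)
features = dft_amplitudes(signal)
print(len(signal), len(features))
```

In a pipeline of the kind the abstract describes, a feature vector like `features` would be passed to a classifier such as a Support Vector Machine. Real sequences differ in length, so the spectra would first have to be brought to a common length, a step this sketch omits.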


Subjects
Allergens/chemistry; Allergens/classification; Computational Biology/methods; Signal Processing, Computer-Assisted; Amino Acid Sequence; Databases, Protein; Sequence Analysis, Protein
15.
Article in English | MEDLINE | ID: mdl-25570084

ABSTRACT

In the literature, existing methods use pairwise percent identity to quantify the similarity between two protein sequences in order to create a dendrogram. As this is a parametric method of measuring the similarities between proteins, and different parameters may yield different results, it does not guarantee that the globally optimal similarity values will be found. As protein dendrogram construction is used in other areas, such as multiple protein sequence alignment, it is very important that the most closely related protein sequences are identified and aligned first. Furthermore, when the pairwise percent identity of the protein sequences is used to construct the dendrograms, the physical characteristics of protein sequences and amino acids are not considered. In this paper, a new method was proposed for constructing protein sequence dendrograms. In this method, the Discrete Fourier Transform was used to construct the distance matrix, in combination with multiple amino acid indices that were used to encode protein sequences into numerical sequences. In order to show the applicability and robustness of the proposed method, a case study was presented using nine Cluster of Differentiation 4 protein sequences extracted from the UniProt online database.


Subjects
Amino Acid Sequence; Computational Biology/methods; Fourier Analysis; Proteins/chemistry; Proteins/classification; Sequence Alignment/methods; Algorithms; Amino Acids/chemistry
16.
Article in English | MEDLINE | ID: mdl-24110625

ABSTRACT

Protein distance matrices are widely used in various protein sequence analyses and are mainly obtained using pairwise sequence alignment scores or protein sequence homology, which fail to take into consideration the individual physical characteristics of protein sequences and amino acids, or a combination of these features. In this paper, a new method is therefore proposed for constructing a protein distance matrix based on natural amino acid indices in combination with the Discrete Fourier Transform (DFT). With the proposed method, protein distance matrices can be generated using any given set of amino acid indices, each of which represents a unique biological feature of protein sequences. In this study, the results are based on the combination of 25 widely accepted amino acid indices, which produced the best results according to the biological relationships between proteins. As a case study, 26 Cluster of Differentiation 4 (CD4) protein sequences were used to construct a distance matrix based on the proposed method. The results show that the pairwise relationships between CD4 protein sequences remain the same in comparison with their pairwise percent identity. For another group of protein sequences, the pairwise relationships between CD4 protein sequences changed dramatically with the proposed method in comparison to the pairwise percent identity. The proposed distance matrix has been shown to have a positive impact on these case studies and is therefore expected to be useful in several fields, such as multiple protein sequence alignment and phylogenetic analysis, where an accurate distance matrix based on natural generalized protein properties plays an important role.


Subjects
Amino Acids/chemistry; CD4 Antigens/chemistry; Fourier Analysis; Sequence Analysis, Protein/methods; Algorithms; Amino Acid Sequence; Animals; Humans; Phylogeny
17.
Article in English | MEDLINE | ID: mdl-24110375

ABSTRACT

The neuraminidase (NA) gene of the influenza A virus is a highly promising candidate for antiviral drug development, which can only be realized through accurate identification of its sub-types. In this paper, in order to accurately detect the sub-types, a hybrid predictive model is therefore developed and tested on proteins obtained from four subtypes of the influenza A virus, namely H1N1, H2N2, H3N2 and H5N1, that caused major pandemics in the twentieth century. The predictive model is built in the following four main steps: (i) decoding the protein sequences into numerical signals by means of the EIIP amino acid scale, (ii) analysing these signals (protein sequences) using the Discrete Fourier Transform (DFT) and extracting DFT-based features, (iii) selecting a more influential sub-set of the features using the F-score statistical feature selection method, and finally (iv) building a predictive model on the feature sub-set using a support vector machine classifier. The protein sequences were chosen for the high percentage identity that they demonstrate within individual influenza subtype classes and the high variation that they display in percentage identity. This makes the proteins very difficult to distinguish from each other even though they belong to different subtypes. Given this set of proteins, the predictive model yielded 98.3% accuracy based on a 5-fold cross validation. This also results in a twenty-feature sub-set that can help reveal spectral characteristics of the subtypes. The proposed model is promising and can easily be generalized for other similar studies.
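
Step (iii) above, F-score feature selection, can be illustrated with the two-class F-score as it is commonly defined in the feature-selection literature: the squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances. The sketch below is an assumption about how such a score might be computed, not the authors' implementation. The study has four subtypes, so a multi-class setting would need something like a one-vs-rest scheme, which is not shown. The function names and the toy feature rows are hypothetical.

```python
def f_score(pos, neg):
    """Two-class F-score of one feature given its values in each class."""
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    # Sample variances (n - 1 denominators) within each class.
    var_pos = sum((x - mean_pos) ** 2 for x in pos) / (n_pos - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in neg) / (n_neg - 1)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = var_pos + var_neg
    return between / within if within else float("inf")


def rank_features(pos_rows, neg_rows):
    """Return (feature indices sorted by descending F-score, all scores)."""
    n_features = len(pos_rows[0])
    scores = [
        f_score([row[j] for row in pos_rows], [row[j] for row in neg_rows])
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical feature rows (e.g. two DFT amplitudes per protein).
pos_rows = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0]]
neg_rows = [[3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
order, scores = rank_features(pos_rows, neg_rows)
print(order, scores)
```

A larger score means the feature separates the two classes better relative to its spread within each class. Keeping only the top-ranked features is one way to arrive at a reduced sub-set such as the twenty features mentioned in the abstract.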


Subjects
Computational Biology/methods; Influenza A virus/genetics; Neuraminidase/genetics; Signal Processing, Computer-Assisted; Amino Acid Sequence; Animals; Humans; Influenza A Virus, H1N1 Subtype/genetics; Influenza A Virus, H2N2 Subtype/genetics; Influenza A Virus, H3N2 Subtype/genetics; Influenza A Virus, H5N1 Subtype/genetics; Neuraminidase/chemistry; Sequence Analysis, Protein; Sequence Homology, Amino Acid; Support Vector Machine
18.
Article in English | MEDLINE | ID: mdl-22255450

ABSTRACT

Signal processing techniques such as the Fourier Transform have been widely studied and successfully applied in many different areas. Techniques such as zero-padding and windowing have been developed and found very useful for improving the outcome of signal processing methods. The Resonant Recognition Model (RRM) and the Complex Resonant Recognition Model (CRRM), which are based on the discrete Fourier Transform and widely used for the analysis of protein sequences, do not consider such methods, which can, however, improve or alter the features extracted from the protein sequences. Therefore, in this paper, an extensive analysis was carried out to investigate the influence of zero-padding and windowing on the features extracted with the Complex Resonant Recognition Model. In order to present such effects, five different classes of influenza A virus Neuraminidase genes, namely H1N1, H1N2, H2N2, H3N2 and H5N1, were used as a case study. For each of the Influenza A subtypes, two sets of Common Frequency Peaks (CFP) were extracted, one where windowing is applied and one where windowing is suppressed, for each signal length set for the analysis. In order to make all the signals (protein sequences) the same length, zero-padding was used. The signal lengths used in this study are 470, which is the maximum protein length, and also 512, 1024, 2048, 4096, 8192 and 16384 for further analysis. The results suggest that windowing and zero-padding have a key impact on the CFP extracted from the Influenza A subtypes, as the best match with the CFP extracted from influenza A subtypes using CRRM occurs when a signal length of 4096 and windowing are both applied. Therefore, the outcome of this study should be taken into consideration for more accurate and reliable analysis of protein sequences.
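
The two pre-processing steps studied in this abstract, windowing and zero-padding, can be sketched as below. The abstract does not state which window function was used, so the Hamming window is an assumption, as are the function names and the toy signal. The window is applied to the original samples first, and zeros are appended afterwards to reach the target length.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (assumed window type)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def prepare(signal, target_length, window=True):
    """Optionally window the signal, then zero-pad it to target_length."""
    if target_length < len(signal):
        raise ValueError("target_length must be >= len(signal)")
    weights = hamming(len(signal)) if window else [1.0] * len(signal)
    shaped = [x * w for x, w in zip(signal, weights)]
    return shaped + [0.0] * (target_length - len(shaped))


def amplitude_spectrum(signal):
    """Amplitudes |X[k]| for k = 0 .. N//2, computed with a direct DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


# Hypothetical numerical signal standing in for an encoded protein sequence.
toy_signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
plain = amplitude_spectrum(prepare(toy_signal, 16, window=False))
windowed = amplitude_spectrum(prepare(toy_signal, 16, window=True))
print(len(plain), len(windowed))
```

Zero-padding interpolates the spectrum onto a finer frequency grid without adding information, while windowing trades main-lobe width for lower side lobes. Both change where spectral peaks appear on the grid, which is consistent with the abstract's finding that they affect the extracted Common Frequency Peaks.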


Subjects
Algorithms; Influenza A virus/metabolism; Pattern Recognition, Automated/methods; Sequence Analysis, Protein/methods; Viral Proteins/chemistry; Viral Proteins/metabolism; Amino Acid Sequence; Molecular Sequence Data; Reproducibility of Results; Sensitivity and Specificity