Results 1 - 20 of 33
1.
BMC Bioinformatics ; 20(1): 261, 2019 May 21.
Article in English | MEDLINE | ID: mdl-31113356

ABSTRACT

BACKGROUND: Pairwise alignment of short DNA sequences with affine-gap scoring is a common processing step performed in a range of bioinformatics analyses. Dynamic programming (i.e. the Smith-Waterman algorithm) is widely used for this purpose. Despite the use of data-level parallelisation, pairwise alignment remains time-consuming. Faster alignment algorithms exist, but they suffer from a lack of accuracy. RESULTS: In this paper, we present MEM-Align, a fast semi-global alignment algorithm for short DNA sequences that allows for affine-gap scoring and exploits sequence similarity. In contrast to traditional alignment methods (such as Smith-Waterman), where individual symbols are aligned, MEM-Align extracts Maximal Exact Matches (MEMs) using a bit-level parallel method and then looks for a subset of MEMs that forms the alignment using a novel dynamic programming method. MEM-Align tries to mimic the alignment produced by Smith-Waterman. As a result, for 99.9% of input sequence pairs, the computed alignment score is identical to the alignment score computed by Smith-Waterman. Yet MEM-Align is up to 14.5 times faster than the Smith-Waterman algorithm. The fast run-time is achieved by: (a) using a bit-level parallel method to extract MEMs; (b) processing MEMs rather than individual symbols; and (c) applying heuristics. CONCLUSIONS: MEM-Align is a potential candidate to replace other pairwise alignment algorithms used in processes such as DNA read-mapping and variant calling.


Subjects
Algorithms , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Nucleotides/chemistry
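
The idea of Maximal Exact Matches in the abstract above can be illustrated with a small sketch. This is not the MEM-Align implementation: the abstract describes a bit-level parallel extraction, whereas the hypothetical function `maximal_exact_matches` below simply scans each diagonal of the comparison matrix, and the `min_len` parameter is an assumption added for illustration.

```python
def maximal_exact_matches(a: str, b: str, min_len: int = 1):
    """Return (start_a, start_b, length) for every maximal exact match.

    Naive diagonal scan, for illustration only; the paper describes a
    bit-level parallel method instead.
    """
    mems = []
    # A diagonal is identified by the offset d = j - i between b and a.
    for d in range(-(len(a) - 1), len(b)):
        i = max(0, -d)
        j = i + d
        run = 0  # length of the current run of matching symbols
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                # A mismatch ends the run, so the match cannot be extended.
                if run >= min_len:
                    mems.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # A run that reaches the end of either sequence is also maximal.
        if run >= min_len:
            mems.append((i - run, j - run, run))
    return mems


# Two short reads that differ by a single substitution at position 3.
mems = maximal_exact_matches("ACGTAC", "ACGAAC", min_len=2)
print(mems)
```

An aligner in this style would then choose a compatible subset of these matches and score the gaps between them, rather than scoring every pair of symbols.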
2.
PLoS Comput Biol ; 14(2): e1005772, 2018 02.
Article in English | MEDLINE | ID: mdl-29390004

ABSTRACT

Bioinformatics is recognized as part of the essential knowledge base of numerous career paths in biomedical research and healthcare. However, there is little agreement in the field over what that knowledge entails or how best to provide it. These disagreements are compounded by the wide range of populations in need of bioinformatics training, with divergent prior backgrounds and intended application areas. The Curriculum Task Force of the International Society for Computational Biology (ISCB) Education Committee has sought to provide a framework for training needs and curricula in terms of a set of bioinformatics core competencies that cut across many user personas and training programs. The initial competencies developed based on surveys of employers and training programs have since been refined through a multiyear process of community engagement. This report describes the current status of the competencies and presents a series of use cases illustrating how they are being applied in diverse training contexts. These use cases are intended to demonstrate how others can make use of the competencies and engage in the process of their continuing refinement and application. The report concludes with a consideration of remaining challenges and future plans.


Subjects
Computational Biology/education , Curriculum , Education, Graduate , Systems Biology/education , Advisory Committees , Africa , Algorithms , Genetic Predisposition to Disease , Illinois , New South Wales , Ohio , Pennsylvania , Software , Surveys and Questionnaires , United Kingdom , Universities
3.
Appl Microbiol Biotechnol ; 103(2): 903-915, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30421108

ABSTRACT

Quorum sensing (QS) is a form of cell-to-cell communication that bacteria use to regulate collective behaviors. Quorum sensing controls virulence factor production in many bacterial species and is regarded as an attractive target for combating bacterial pathogenicity, especially in antibiotic-resistant bacteria. Chlorogenic acid (CA), abundant in fruits, vegetables, and Chinese herbs, possesses multiple activities. In this research, we explored its quorum sensing quenching activity. In Pseudomonas aeruginosa, CA significantly inhibited the formation of biofilm, the ability of swarming, and virulence factors including protease and elastase activities and rhamnolipid and pyocyanin production. CA showed similar inhibitory effects in Chromobacterium violaceum on its biofilm formation, swarming motility, chitinolytic activity and violacein production. We examined the expression of QS-related genes in P. aeruginosa and found that these genes were all downregulated by CA treatment. Computational modeling revealed that CA can form hydrogen bonds with all three QS receptors. Caenorhabditis elegans and mouse infection models were employed to explore the anti-virulence ability of CA and its effect on the pathogenesis process in vivo. CA extended the survival period and reduced the quantity of P. aeruginosa in the nematode gut, showing a moderate protective effect on C. elegans. In a mouse wound model, CA-treated groups showed an accelerated healing rate, and the number of bacteria in the wound area was also decreased by CA treatment. Our research suggests that CA has the potential to be used as an anti-virulence factor in P. aeruginosa infection.


Subjects
Anti-Bacterial Agents/pharmacology , Chlorogenic Acid/pharmacology , Pseudomonas aeruginosa/drug effects , Quorum Sensing/drug effects , Virulence Factors/antagonists & inhibitors , Animals , Biofilms/drug effects , Biofilms/growth & development , Caenorhabditis elegans , Chromobacterium/drug effects , Chromobacterium/growth & development , Disease Models, Animal , Gene Expression Profiling , Locomotion/drug effects , Mice , Pseudomonas Infections/microbiology , Pseudomonas Infections/pathology , Pseudomonas aeruginosa/growth & development , Pseudomonas aeruginosa/pathogenicity , Survival Analysis , Treatment Outcome , Wound Infection/microbiology , Wound Infection/pathology
4.
Bioinformatics ; 33(7): 964-970, 2017 Apr 01.
Article in English | MEDLINE | ID: mdl-27993787

ABSTRACT

MOTIVATION: The Variant Call Format (VCF) is widely used to store data about genetic variation. Variant calling workflows detect potential variants in large numbers of short sequence reads generated by DNA sequencing and report them in VCF format. To evaluate the accuracy of variant callers, it is critical to correctly compare their output against a reference VCF file containing a gold standard set of variants. However, comparing VCF files is a complicated task as an individual genomic variant can be represented in several different ways and is therefore not necessarily reported in a unique way by different software. RESULTS: We introduce a VCF normalization method called Best Alignment Normalisation (BAN) that results in more accurate VCF file comparison. BAN applies all the variations in a VCF file to the reference genome to create a sample genome, and then recalls the variants by aligning this sample genome back with the reference genome. Since the purpose of BAN is to get an accurate result at the time of VCF comparison, we define a better normalization method as the one resulting in less disagreement between the outputs of different VCF comparators. AVAILABILITY AND IMPLEMENTATION: The BAN Linux bash script along with required software are publicly available on https://sites.google.com/site/banadf16. CONTACT: A.Bayat@unsw.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetic Variation , Software , Algorithms , Alleles , Base Sequence , Genomics/methods , Humans , Sequence Alignment , Sequence Analysis, DNA
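
The abstract above notes that one genomic variant can be written in several ways, and that applying the variants to the reference yields a sample genome from which variants can be recalled. The sketch below covers only the first half of that idea and is not the BAN script: the function `apply_variants`, the toy reference and the two variant records are all hypothetical, and the realignment step is omitted.

```python
def apply_variants(reference: str, variants):
    """Apply non-overlapping (pos, ref, alt) variants to a reference string.

    Positions are 1-based, as in VCF. Overlapping records are rejected
    because this sketch does not try to resolve them.
    """
    sample = []
    cursor = 0  # 0-based index of the next reference base to copy
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError("overlapping variants are not handled in this sketch")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        sample.append(reference[cursor:start])  # unchanged bases before the variant
        sample.append(alt)                      # substituted allele
        cursor = start + len(ref)
    sample.append(reference[cursor:])           # unchanged tail
    return "".join(sample)


reference = "GCACACAT"
# Two different records for the same deletion of one "CA" unit in the repeat.
left_aligned = [(1, "GCA", "G")]
right_aligned = [(5, "ACA", "A")]

print(apply_variants(reference, left_aligned))   # GCACAT
print(apply_variants(reference, right_aligned))  # GCACAT
```

Because both records produce the same sample sequence, recalling variants from that sequence with a single aligner would give one consistent representation, which is the property a normalization step relies on.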
5.
Bioinformatics ; 31(1): 140-2, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25189782

ABSTRACT

SUMMARY: Rapid technological advances have led to an explosion of biomedical data in recent years. The pace of change has inspired new collaborative approaches for sharing materials and resources to help train life scientists both in the use of cutting-edge bioinformatics tools and databases and in how to analyse and interpret large datasets. A prototype platform for sharing such training resources was recently created by the Bioinformatics Training Network (BTN). Building on this work, we have created a centralized portal for sharing training materials and courses, including a catalogue of trainers and course organizers, and an announcement service for training events. For course organizers, the portal provides opportunities to promote their training events; for trainers, the portal offers an environment for sharing materials, for gaining visibility for their work and promoting their skills; for trainees, it offers a convenient one-stop shop for finding suitable training resources and identifying relevant training events and activities locally and worldwide. AVAILABILITY AND IMPLEMENTATION: http://mygoblet.org/training-portal.


Subjects
Computational Biology/education , Curriculum , Database Management Systems , Research Personnel/education , Teaching , Humans , Programming Languages , Software Design
6.
BMC Bioinformatics ; 15 Suppl 16: S5, 2014.
Article in English | MEDLINE | ID: mdl-25521061

ABSTRACT

BACKGROUND: A pharmacophore model consists of a group of chemical features arranged in three-dimensional space that can be used to represent the biological activities of the described molecules. Clustering of molecular interactions of ligands on the basis of their pharmacophore similarity provides an approach for investigating how diverse ligands can bind to a specific receptor site or different receptor sites with similar or dissimilar binding affinities. However, efficient clustering of pharmacophore models in three-dimensional space is currently a challenge. RESULTS: We have developed a pharmacophore-assisted Iterative Closest Point (ICP) method that is able to group pharmacophores in a manner relevant to their biochemical properties, such as binding specificity. The implementation of the method takes pharmacophore files as input and produces distance matrices. The method integrates both alignment-dependent and alignment-independent concepts. CONCLUSIONS: We apply our three-dimensional pharmacophore clustering method to two sets of experimental data, including 31 globulin-binding steroids and 4 groups of selected antibody-antigen complexes. Results are translated from distance matrices to Newick format and visualised using dendrograms. For the steroid dataset, the resulting classification of ligands shows good correspondence with existing classifications. For the antigen-antibody datasets, the classification of antigens reflects both antigen type and binding antibody. Overall, the method classifies the data quickly and accurately on the basis of their binding affinities or antigens.


Subjects
Globulins/chemistry , Steroids/chemistry , Antigen-Antibody Complex , Binding Sites , Cluster Analysis , Databases, Chemical , Globulins/metabolism , Humans , Models, Molecular , Molecular Structure , Phylogeny , Protein Binding , Steroids/metabolism
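
The abstract above mentions translating distance matrices to Newick format for display as dendrograms, without saying how the tree is built. The sketch below shows one possible route, assuming average-linkage clustering; that choice, the function name `distance_matrix_to_newick` and the toy matrix are assumptions, and branch lengths are left out for brevity.

```python
def distance_matrix_to_newick(labels, dist):
    """Cluster a symmetric distance matrix and return a Newick topology.

    Uses average linkage (an assumption; the source does not name the
    clustering rule). Expects at least one label.
    """
    # cluster id -> (Newick text, indices of the original items it contains)
    clusters = {i: (labels[i], [i]) for i in range(len(labels))}
    next_id = len(labels)

    def average_distance(members_a, members_b):
        total = sum(dist[i][j] for i in members_a for j in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        ids = sorted(clusters)
        # Pick the closest pair of clusters; ties fall back to the lower ids.
        _, a, b = min(
            (average_distance(clusters[a][1], clusters[b][1]), a, b)
            for k, a in enumerate(ids)
            for b in ids[k + 1:]
        )
        text_a, members_a = clusters.pop(a)
        text_b, members_b = clusters.pop(b)
        clusters[next_id] = (f"({text_a},{text_b})", members_a + members_b)
        next_id += 1

    (newick, _), = clusters.values()
    return newick + ";"


labels = ["s1", "s2", "s3"]          # hypothetical ligand names
dist = [[0, 1, 4],
        [1, 0, 5],
        [4, 5, 0]]
print(distance_matrix_to_newick(labels, dist))  # (s3,(s1,s2));
```

The resulting string can be loaded by any tree viewer that reads Newick.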
8.
J Immunol ; 188(3): 1333-40, 2012 Feb 01.
Article in English | MEDLINE | ID: mdl-22205028

ABSTRACT

The existence of many highly similar genes in the lymphocyte receptor gene loci makes them difficult to investigate, and the determination of phased "haplotypes" has been particularly problematic. However, V(D)J gene rearrangements provide an opportunity to infer the association of Ig genes along the chromosomes. The chromosomal distribution of H chain genes in an Ig genotype can be inferred through analysis of VDJ rearrangements in individuals who are heterozygous at points within the IGH locus. We analyzed VDJ rearrangements from 44 individuals for whom sufficient unique rearrangements were available to allow comprehensive genotyping. Nine individuals were identified who were heterozygous at the IGHJ6 locus and for whom sufficient suitable VDJ rearrangements were available to allow comprehensive haplotyping. Each of the 18 resulting IGHV-IGHD-IGHJ haplotypes was unique. Apparent deletion polymorphisms were seen that involved as many as four contiguous, functional IGHV genes. Two deletion polymorphisms involving multiple contiguous IGHD genes were also inferred. Three previously unidentified gene duplications were detected, where two sequences recognized as allelic variants of a single gene were both inferred to be on a single chromosome. Phased genomic data brings clarity to the study of the contribution of each gene to the available repertoire of rearranged VDJ genes. Analysis of rearrangement frequencies suggests that particular genes may have substantially different yet predictable propensities for rearrangement within different haplotypes. Together with data highlighting the extent of haplotypic variation within the population, this suggests that there may be substantial variability in the available Ab repertoires of different individuals.


Subjects
Haplotypes , Immunoglobulin Heavy Chains/genetics , Immunoglobulin Variable Region/genetics , V(D)J Recombination/genetics , Gene Rearrangement , Genes, Immunoglobulin , Genetic Loci , Genotype , Heterozygote , Humans , Polymorphism, Genetic
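
The inference described in the abstract above rests on a simple observation: genes that rearrange together sit on the same chromosome, so in an individual heterozygous at IGHJ6 each IGHJ6 allele tags one haplotype. The sketch below is only a minimal illustration of that grouping step. The function `infer_haplotypes`, the `min_count` threshold and the example rearrangements are hypothetical and do not reproduce the study's pipeline or data.

```python
from collections import defaultdict


def infer_haplotypes(rearrangements, anchor_gene="IGHJ6", min_count=2):
    """Group V and D alleles by the anchor-gene allele they rearranged with.

    rearrangements: iterable of (v_allele, d_allele, j_allele) calls.
    min_count is an assumed support threshold to ignore one-off miscalls.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for v_allele, d_allele, j_allele in rearrangements:
        # Only rearrangements that use the heterozygous anchor gene are informative.
        if not j_allele.startswith(anchor_gene + "*"):
            continue
        counts[j_allele][v_allele] += 1
        counts[j_allele][d_allele] += 1
    return {
        j_allele: sorted(gene for gene, n in genes.items() if n >= min_count)
        for j_allele, genes in counts.items()
    }


# Invented calls for one hypothetical individual.
calls = [
    ("IGHV1-69*01", "IGHD3-10*01", "IGHJ6*02"),
    ("IGHV1-69*01", "IGHD3-22*01", "IGHJ6*02"),
    ("IGHV3-23*01", "IGHD3-10*01", "IGHJ6*03"),
    ("IGHV3-23*01", "IGHD3-10*01", "IGHJ6*03"),
    ("IGHV4-34*01", "IGHD2-2*01", "IGHJ4*02"),   # ignored: not an IGHJ6 rearrangement
]
print(infer_haplotypes(calls))
```

A gene that appears on only one of the two haplotypes in such output would be a candidate for the deletion polymorphisms the abstract reports, subject to having enough rearrangements to rule out sampling effects.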
9.
BMC Bioinformatics ; 14: 249, 2013 Aug 16.
Article in English | MEDLINE | ID: mdl-23947436

ABSTRACT

BACKGROUND: Candidate disease gene prediction is a rapidly developing area of bioinformatics research with the potential to deliver great benefits to human health. As experimental studies detecting associations between genetic intervals and disease proliferate, better bioinformatic techniques that can expand and exploit the data are required. DESCRIPTION: Gentrepid is a web resource which predicts and prioritizes candidate disease genes for both Mendelian and complex diseases. The system can take input from linkage analysis of single genetic intervals or multiple marker loci from genome-wide association studies. The underlying database of the Gentrepid tool sources data from numerous gene and protein resources, taking advantage of the wealth of biological information available. Using known disease gene information from OMIM, the system predicts and prioritizes disease gene candidates that participate in the same protein pathways or share similar protein domains. Alternatively, using an ab initio approach, the system can detect enrichment of these protein annotations without prior knowledge of the phenotype. CONCLUSIONS: The system aims to integrate the wealth of protein information currently available with known and novel phenotype/genotype information to acquire knowledge of biological mechanisms underpinning disease. We have updated the system to facilitate analysis of GWAS data and the study of complex diseases. Application of the system to GWAS data on hypertension using the ICBP data is provided as an example. An interesting prediction is a ZIP transporter additional to the one found by the ICBP analysis. The webserver URL is https://www.gentrepid.org/.


Subjects
Computational Biology/methods , Databases, Genetic , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Internet , Humans , Phenotype
12.
Immunogenetics ; 64(1): 3-14, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21789596

ABSTRACT

We have analysed the transcribed immunoglobulin kappa (IGK) repertoire of peripheral blood B cells from four individuals from two genetically distinct populations, Papua New Guinean and Australian, using high-throughput DNA sequencing. The depth of sequencing data for each individual averaged 5,548 high-quality IGK reads, and permitted genotyping of the inferred IGKV and IGKJ germline gene segments for each individual. All individuals were homozygous at each IGKJ locus and had highly similar inferred IGKV genotypes. Preferential gene usage was seen at both the IGKV and IGKJ loci, but only IGKV segment usage varied significantly between individuals. Despite the differences in IGKV gene utilisation, the rearranged IGK repertoires showed extensive identity at the amino acid level. Public rearrangements (those shared by two or more individuals) made up 60.2% of the total sequenced IGK rearrangements. The total diversity of IGK rearrangements of each individual was estimated to range from just 340 to 549 unique amino acid sequences. Thus, the repertoire of unique expressed IGK rearrangements is dramatically less than previous theoretical estimates of IGK diversity, and the majority of expressed IGK rearrangements are likely to be extensively shared in individual human beings.


Subjects
B-Lymphocytes/immunology , Gene Rearrangement, B-Lymphocyte , Immunoglobulin kappa-Chains/genetics , Alleles , Australia , Genetics, Population , Genotype , Humans , Immunoglobulin kappa-Chains/immunology , Papua New Guinea
15.
J Immunol ; 184(12): 6986-92, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20495067

ABSTRACT

Individual variation in the Ig germline gene repertoire leads to individual differences in the combinatorial diversity of the Ab repertoire, but the study of such variation has been problematic. The application of high-throughput DNA sequencing to the study of rearranged Ig genes now makes this possible. The sequencing of thousands of VDJ rearrangements from an individual, either from genomic DNA or expressed mRNA, should allow their germline IGHV, IGHD, and IGHJ repertoires to be inferred. In addition, where previously mere glimpses of diversity could be gained from sequencing studies, new large data sets should allow the rearrangement frequency of different genes and alleles to be seen with clarity. We analyzed the DNA of 108,210 human IgH chain rearrangements from 12 individuals and determined their individual IGH genotypes. The number of reportedly functional IGHV genes and allelic variants ranged from 45 to 60, principally because of variable levels of gene heterozygosity, and included 14 previously unreported IGHV polymorphisms. New polymorphisms of the IGHD3-16 and IGHJ6 genes were also seen. At heterozygous loci, remarkably different rearrangement frequencies were seen for the various IGHV alleles, and these frequencies were consistent between individuals. The specific alleles that make up an individual's Ig genotype may therefore be critical in shaping the combinatorial repertoire. The extent of genotypic variation between individuals is highlighted by an individual with aplastic anemia who appears to lack six contiguous IGHD genes on both chromosomes. These deletions significantly alter the potential expressed IGH repertoire, and possibly immune function, in this individual.


Subjects
Genes, Immunoglobulin Heavy Chain , Immunoglobulin Variable Region/genetics , Base Sequence , Gene Rearrangement, B-Lymphocyte , Genotype , Humans , Molecular Sequence Data , Polymerase Chain Reaction , Polymorphism, Genetic
16.
JMIR Bioinform Biotechnol ; 3(1): e29404, 2022 Oct 28.
Article in English | MEDLINE | ID: mdl-38935962

ABSTRACT

BACKGROUND: The mammalian immune system is able to generate antibodies against a huge variety of antigens, including bacteria, viruses, and toxins. The ultradeep DNA sequencing of rearranged immunoglobulin genes has considerable potential in furthering our understanding of the immune response, but it is limited by the lack of a high-throughput, sequence-based method for predicting the antigen(s) that a given immunoglobulin recognizes. OBJECTIVE: As a step toward the prediction of antibody-antigen binding from sequence data alone, we aimed to compare a range of machine learning approaches that were applied to a collated data set of antibody-antigen pairs in order to predict antibody-antigen binding from sequence data. METHODS: Data for training and testing were extracted from the Protein Data Bank and the Coronavirus Antibody Database, and additional antibody-antigen pair data were generated by using a molecular docking protocol. Several machine learning methods, including the weighted nearest neighbor method, the nearest neighbor method with the BLOSUM62 matrix, and the random forest method, were applied to the problem. RESULTS: The final data set contained 1157 antibodies and 57 antigens that were combined in 5041 antibody-antigen pairs. The best performance for the prediction of interactions was obtained by using the nearest neighbor method with the BLOSUM62 matrix, which resulted in around 82% accuracy on the full data set. These results provide a useful frame of reference, as well as protocols and considerations, for machine learning and data set creation in the prediction of antibody-antigen binding. CONCLUSIONS: Several machine learning approaches were compared to predict antibody-antigen interaction from protein sequences. Both the data set (in CSV format) and the machine learning program (coded in Python) are freely available for download on GitHub.

17.
Immunogenetics ; 63(5): 259-65, 2011 May.
Article in English | MEDLINE | ID: mdl-21249354

ABSTRACT

Complete and accurate knowledge of the genes and allelic variants of the human immunoglobulin gene loci is critical for studies of B cell repertoire development and somatic point mutation, but evidence from studies of VDJ rearrangements suggests that our knowledge of the available immunoglobulin gene repertoire is far from complete. The reported repertoire has changed little over the last 15 years. This is, in part, a consequence of the inefficiencies involved in searching for new members of large, multigenic gene families by cloning and sequencing. The advent of high-throughput sequencing provides a new avenue by which the germline repertoire can be explored. In this report, we describe pyrosequencing studies of the heavy chain IGHV1, IGHV3 and IGHV4 gene subgroups in ten Papua New Guineans. Thousands of 454 reads aligned with complete identity to 51 previously reported functional IGHV genes and allelic variants. A new gene, IGHV3-NL1*01, was identified, which differs from the nearest previously reported gene by 15 nucleotides. Sixteen new IGHV alleles were also identified, 15 of which varied from previously reported functional IGHV genes by between one and four nucleotides, while one sequence appears to be a functional variant of the pseudogene IGHV3-25. BLAST searches suggest that at least six of these new genes are carried within the relatively well-studied populations of North America, Europe or Asia. This study substantially expands the known immunoglobulin gene repertoire and demonstrates that genetic variation of immunoglobulin genes can now be efficiently explored in different human populations using high-throughput pyrosequencing.


Subjects
Genes, Immunoglobulin Heavy Chain , Genetic Testing/methods , High-Throughput Nucleotide Sequencing/methods , Immunoglobulin Variable Region/genetics , Sequence Analysis, DNA/methods , Alleles , Base Sequence , Genetic Loci , Genetic Variation , Humans , Molecular Sequence Data , Multigene Family , Papua New Guinea
18.
Bioinformatics ; 26(24): 3129-30, 2010 Dec 15.
Article in English | MEDLINE | ID: mdl-21036814

ABSTRACT

MOTIVATION: Immunoglobulin heavy chain genes are formed by recombination of genes randomly selected from sets of IGHV, IGHD and IGHJ genes. Utilities have been developed to identify genes that contribute to observed VDJ rearrangements, but in the absence of datasets of known rearrangements, the evaluation of these utilities is problematic. We have analyzed thousands of VDJ rearrangements from an individual (S22) whose IGHV, IGHD and IGHJ genotype can be inferred from the dataset. Knowledge of this genotype means that the Stanford_S22 dataset can serve to benchmark the performance of IGH alignment utilities. RESULTS: We evaluated the performance of seven utilities. Failure to partition a sequence into genes present in the S22 genome was considered an error, and error rates for different utilities ranged from 7.1% to 13.7%. AVAILABILITY: Supplementary data include the S22 genotypes and alignments. The Stanford_S22 dataset and an evaluation tool are available at http://www.emi.unsw.edu.au/~ihmmune/IGHUtilityEval/.


Subjects
Gene Rearrangement, B-Lymphocyte, Heavy Chain , Genes, Immunoglobulin Heavy Chain , Sequence Alignment/methods , Benchmarking , Genotype , Humans , Sequence Analysis, DNA
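
The error definition in the abstract above (a sequence counts as an error when a utility assigns it to a gene absent from the known genotype) lends itself to a short sketch. The function `error_rate`, the genotype set and the utility calls below are hypothetical and are not taken from the Stanford_S22 evaluation tool.

```python
def error_rate(assignments, genotype):
    """Fraction of sequences with at least one called gene absent from the genotype.

    assignments: list of tuples of gene or allele names called for each sequence.
    genotype: set of names known to be carried by the individual.
    """
    if not assignments:
        raise ValueError("no assignments to evaluate")
    errors = sum(
        1 for called in assignments
        if any(gene not in genotype for gene in called)
    )
    return errors / len(assignments)


# Invented genotype and calls, for illustration only.
genotype = {"IGHV1-69*01", "IGHV3-23*01", "IGHD3-10*01", "IGHJ4*02"}
utility_calls = [
    ("IGHV1-69*01", "IGHD3-10*01", "IGHJ4*02"),
    ("IGHV3-23*01", "IGHD3-10*01", "IGHJ4*02"),
    ("IGHV1-69*06", "IGHD3-10*01", "IGHJ4*02"),  # allele not in the genotype
    ("IGHV3-23*01", "IGHD3-10*01", "IGHJ4*02"),
]
print(error_rate(utility_calls, genotype))  # 0.25
```

Note that this measure can only flag calls that are impossible for the individual; a call to the wrong gene that happens to be in the genotype goes undetected, so the rate is a lower bound on true misassignment.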
19.
BMC Genet ; 12: 98, 2011 Nov 13.
Article in English | MEDLINE | ID: mdl-22077927

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) aim to identify causal variants and genes for complex disease by independently testing a large number of SNP markers for disease association. Although genes have been implicated in these studies, few utilise the multiple-hit model of complex disease to identify causal candidates. A major benefit of multi-locus comparison is that it compensates for some shortcomings of current statistical analyses that test the frequency of each SNP in isolation for the phenotype population versus control. RESULTS: Here we developed and benchmarked several protocols for GWAS data analysis using different in-silico gene prediction and prioritisation methodologies. We adopted a high sensitivity approach to the data, using less conservative statistical SNP associations. Multiple gene search spaces, either of fixed-widths or proximity-based, were generated around each SNP marker. We used the candidate disease gene prediction system Gentrepid to identify candidates based on shared biomolecular pathways or domain-based protein homology. Predictions were made either with phenotype-specific known disease genes as input; or without a priori knowledge, by exhaustive comparison of genes in distinct loci. Because Gentrepid uses biomolecular data to find interactions and common features between genes in distinct loci of the search spaces, it takes advantage of the multi-locus aspect of the data. CONCLUSIONS: Results suggest testing multiple SNP-to-gene search spaces compensates for differences in phenotypes, populations and SNP platforms. Surprisingly, domain-based homology information was more informative when benchmarked against gene candidates reported by GWA studies compared to previously determined disease genes, possibly suggesting a larger contribution of gene homologs to complex diseases than Mendelian diseases.


Subjects
Disease/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Databases, Genetic , Databases, Protein , Humans , Software
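
The abstract above describes two kinds of SNP-to-gene search space: fixed-width and proximity-based. The sketch below shows one plausible reading of each, assuming genes are given as (name, start, end) intervals on a single chromosome. The function names, coordinates and parameters are hypothetical and are not taken from Gentrepid.

```python
def genes_in_window(snp_pos, genes, half_width):
    """Fixed-width search space: genes overlapping snp_pos +/- half_width."""
    lo, hi = snp_pos - half_width, snp_pos + half_width
    return [name for name, start, end in genes if start <= hi and end >= lo]


def nearest_genes(snp_pos, genes, k=1):
    """Proximity-based search space: the k genes closest to the SNP."""
    def gap(gene):
        _, start, end = gene
        if start <= snp_pos <= end:
            return 0  # the SNP lies inside the gene
        return min(abs(start - snp_pos), abs(end - snp_pos))

    return [gene[0] for gene in sorted(genes, key=gap)[:k]]


# Invented gene coordinates on one chromosome.
genes = [("geneA", 100, 200), ("geneB", 500, 600), ("geneC", 900, 1000)]

print(genes_in_window(450, genes, half_width=100))  # ['geneB']
print(genes_in_window(450, genes, half_width=300))  # ['geneA', 'geneB']
print(nearest_genes(450, genes, k=2))               # ['geneB', 'geneA']
```

Running several widths side by side, as in the usage lines, mirrors the abstract's point that testing multiple search spaces can compensate for differences between phenotypes and SNP platforms.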