Results 1 - 20 of 88
1.
J Comput Biol ; 29(1): 19-22, 2022 01.
Article in English | MEDLINE | ID: mdl-34985990

ABSTRACT

Although the availability of various sequencing technologies allows us to capture different genome properties at single-cell resolution, with the exception of a few co-assaying technologies, applying different sequencing assays on the same single cell is impossible. Single-cell alignment using optimal transport (SCOT) is an unsupervised algorithm that addresses this limitation by using optimal transport to align single-cell multiomics data. First, it preserves the local geometry by constructing a k-nearest neighbor (k-NN) graph for each data set (or domain) to capture the intra-domain distances. SCOT then finds a probabilistic coupling matrix that minimizes the discrepancy between the intra-domain distance matrices. Finally, it uses the coupling matrix to project one single-cell data set onto another through barycentric projection, thus aligning them. SCOT requires tuning only two hyperparameters and is robust to the choice of one. Furthermore, the Gromov-Wasserstein distance in the algorithm can guide SCOT's hyperparameter tuning in a fully unsupervised setting when no orthogonal alignment information is available. Thus, SCOT is a fast and accurate alignment method that provides a heuristic for hyperparameter selection in a real-world unsupervised single-cell data alignment scenario. We provide a tutorial for SCOT and make its source code publicly available on GitHub.
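The final step described above, barycentric projection, can be sketched in a few lines. This is a minimal illustration and not the SCOT source code: it assumes a coupling matrix has already been computed, and the function name, toy coupling and toy coordinates are all invented for the example.

```python
def barycentric_projection(coupling, target_points):
    """Map each source cell onto the target domain.

    coupling[i][j] is the non-negative mass shared between source cell i
    and target cell j; target_points[j] is the feature vector of target
    cell j. Each source cell is sent to the coupling-weighted mean of the
    target cells it is coupled with.
    """
    dim = len(target_points[0])
    projected = []
    for row in coupling:
        mass = sum(row)
        if mass == 0:
            raise ValueError("source cell has no coupling mass")
        projected.append([
            sum(w * y[d] for w, y in zip(row, target_points)) / mass
            for d in range(dim)
        ])
    return projected


# Toy data (hypothetical): 2 source cells, 3 target cells in 2-D.
coupling = [
    [0.5, 0.0, 0.0],    # source cell 0 is matched entirely to target cell 0
    [0.0, 0.25, 0.25],  # source cell 1 is split between target cells 1 and 2
]
target_points = [[0.0, 0.0], [2.0, 0.0], [4.0, 2.0]]
aligned = barycentric_projection(coupling, target_points)
```

Source cell 1 lands midway between target cells 1 and 2 because its coupling mass is split evenly between them.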


Subjects
Algorithms; Sequence Alignment/statistics & numerical data; Single-Cell Analysis/statistics & numerical data; Computational Biology; Databases, Genetic/statistics & numerical data; Genomics/statistics & numerical data; Heuristics; Humans; Neural Networks, Computer; Sequence Analysis/statistics & numerical data; Software; Unsupervised Machine Learning
2.
Clin Transl Med ; 11(11): e589, 2021 11.
Article in English | MEDLINE | ID: mdl-34842356

ABSTRACT

BACKGROUND: Few studies have discussed the contradictory roles of mutated PI3Kα in HER2-positive (HER2+) breast cancer. Thus, we characterised the adaptive roles of PI3Kα mutations during HER2+ tumour progression. METHODS: We conducted prospective clinical sequencing of 1923 Chinese breast cancer patients and illustrated the clinical significance of PIK3CA mutations in the locally advanced and advanced HER2+ cohort. A high-throughput PIK3CA mutation-barcoding screen was performed to reveal mutation sites with an impact on tumour growth and drug responses. RESULTS: PIK3CA mutations acted as a protective factor in treatment-naïve patients; however, advanced/locally advanced patients harbouring mutated PI3Kα exhibited a higher progressive disease rate (100% vs. 15%, p = .000053) and a lower objective response rate (81.7% vs. 95.4%, p = .0008) in response to trastuzumab-based therapy. Meanwhile, patients exhibiting anti-HER2 resistance had a relatively high variant allele fraction (VAF) of PIK3CA mutations; we defined a VAF > 12.23% as a predictor of poor anti-HER2 neoadjuvant treatment efficacy. The pooled mutation screen revealed that specific PI3Kα mutation alleles mediated their own biological effects. Functional PIK3CA mutations suppressed the growth of HER2+ cells but conferred anti-HER2 resistance, which can be reversed by the PI3Kα-specific inhibitor BYL719. CONCLUSIONS: We propose adaptive treatment strategies in which mutated PIK3CA and amplified ERBB2 are concomitantly inhibited during continuous anti-HER2 therapy, whereas the combination of anti-HER2 and anti-PI3Kα treatment is not essential for anti-HER2 treatment-naïve patients. These findings improve the understanding of genomics-guided treatment at different stages of HER2+ breast cancer progression.


Subjects
Breast Neoplasms/drug therapy; Receptor, ErbB-2/genetics; Sequence Analysis/statistics & numerical data; Adaptation, Physiological/drug effects; Adaptation, Physiological/genetics; Breast Neoplasms/genetics; Breast Neoplasms/physiopathology; China; Cohort Studies; Female; Humans; Prospective Studies; Sequence Analysis/methods
3.
Methods Mol Biol ; 2212: 277-289, 2021.
Article in English | MEDLINE | ID: mdl-33733362

ABSTRACT

We report a step-by-step protocol for using pysster, a TensorFlow-based package for building deep neural networks on a broad range of epistatic sequences such as DNA, RNA, or annotated secondary structure sequences. Pysster provides users with comprehensive support for developing, training, and evaluating self-defined deep neural networks on sequence data. Moreover, pysster allows users to easily visualize the resulting predictions, which helps to open up the "black box" of deep neural networks. Here, we describe a step-by-step application of pysster to classify RNA A-to-I editing regions and interpret the model predictions. To further demonstrate the generalizability of pysster, we used it to build and evaluate a new deep neural network on an artificial epistatic sequence dataset.


Subjects
Deep Learning; Epistasis, Genetic; Models, Genetic; RNA/genetics; Software; Base Sequence; Datasets as Topic; Humans; RNA Editing; ROC Curve; Sequence Analysis/statistics & numerical data
4.
Microbiome ; 8(1): 134, 2020 09 16.
Article in English | MEDLINE | ID: mdl-32938501

ABSTRACT

BACKGROUND: Sequencing prokaryotic genomes has revolutionized our understanding of the many roles played by microorganisms. However, the cell and taxon proportions of genome-sequenced bacteria or archaea on Earth remain unknown. This study aimed to explore this basic question using large-scale alignment between the sequences released by the Earth Microbiome Project and 155,810 prokaryotic genomes from public databases. RESULTS: Our results showed that the median proportions of the genome-sequenced cells and taxa (at 100% identities in the 16S-V4 region) in different biomes reached 38.1% (16.4-86.3%) and 18.8% (9.1-52.6%), respectively. The sequenced proportions of the prokaryotic genomes in biomes were significantly negatively correlated with the alpha diversity indices, and the proportions sequenced in host-associated biomes were significantly higher than those in free-living biomes. Due to a set of cosmopolitan OTUs that are found in multiple samples and preferentially sequenced, only 2.1% of the global prokaryotic taxa are represented by sequenced genomes. Most of the biomes were occupied by a few predominant taxa with a high relative abundance and much higher genome-sequenced proportions than numerous rare taxa. CONCLUSIONS: These results reveal the current situation of prokaryotic genome sequencing for Earth's biomes, provide a more reasonable and efficient exploration of prokaryotic genomes, and promote our understanding of microbial ecological functions.


Subjects
Earth, Planet; Genome/genetics; Genomics/statistics & numerical data; Microbiota/genetics; Prokaryotic Cells/classification; Prokaryotic Cells/metabolism; Sequence Analysis/statistics & numerical data; Archaea/classification; Archaea/genetics; Archaea/isolation & purification; Bacteria/classification; Bacteria/genetics; Bacteria/isolation & purification; Databases, Genetic; Sequence Alignment
5.
Brief Bioinform ; 20(1): 222-234, 2019 01 18.
Article in English | MEDLINE | ID: mdl-29028876

ABSTRACT

High-throughput sequencing technologies have exposed the possibilities for the in-depth evaluation of T-cell receptor (TCR) repertoires. These studies are highly relevant to gain insights into human adaptive immunity and to decipher the composition and diversity of antigen receptors in physiological and disease conditions. The major objective of TCR sequencing data analysis is the identification of V, D and J gene segments, complementarity-determining region 3 (CDR3) sequence extraction and clonality analysis. With the advancement in sequencing technologies, new TCR analysis approaches and programs have been developed. However, there is still a deficit of systematic comparative studies to assist in the selection of an optimal analysis approach. Here, we present a detailed comparison of 10 state-of-the-art TCR analysis tools on samples with different complexities by taking into account many aspects such as clonotype detection [unique V(D)J combination], CDR3 identification or accuracy in error correction. We used our in silico and experimental data sets with known clonalities enabling the identification of potential tool biases. We also established a new strategy, named clonal plane, which allows quantifying and comparing the clonality of multiple samples. Our results provide new insights into the effect of method selection on analysis results, and it will assist users in the selection of an appropriate analysis method.


Subjects
Receptors, Antigen, T-Cell/genetics; Amino Acid Sequence; Base Sequence; Computational Biology/methods; Computer Simulation; Databases, Genetic/statistics & numerical data; HeLa Cells; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Jurkat Cells; Sequence Analysis/statistics & numerical data; T-Lymphocytes/immunology
6.
Brief Bioinform ; 20(4): 1280-1294, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29272359

ABSTRACT

With the avalanche of biological sequences generated in the post-genomic age, one of the most challenging problems is how to computationally analyze their structures and functions. Machine learning techniques are playing key roles in this field. Typically, predictors based on machine learning techniques involve three main steps: feature extraction, predictor construction and performance evaluation. Although several Web servers and stand-alone tools have been developed to facilitate biological sequence analysis, each focuses on only an individual step. In this study, a powerful Web server called BioSeq-Analysis (http://bioinformatics.hitsz.edu.cn/BioSeq-Analysis/) is proposed to automatically complete the three main steps for constructing a predictor. The user only needs to upload the benchmark data set. BioSeq-Analysis then generates the optimized predictor based on the benchmark data set and reports the performance measures. Furthermore, to maximize users' convenience, a stand-alone program was also released, which can be downloaded from http://bioinformatics.hitsz.edu.cn/BioSeq-Analysis/download/ and run directly on Windows, Linux and UNIX. Applied to three sequence analysis tasks, the predictors generated by BioSeq-Analysis even outperformed some state-of-the-art methods. It is anticipated that BioSeq-Analysis will become a useful tool for biological sequence analysis.


Subjects
Machine Learning; Sequence Analysis/methods; Software; Algorithms; Computational Biology/methods; Databases, Nucleic Acid/statistics & numerical data; Databases, Protein/statistics & numerical data; Humans; Internet; Sequence Analysis/statistics & numerical data; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods; Sequence Analysis, RNA/methods
7.
Bioinformatics ; 34(16): 2870-2878, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29608657

ABSTRACT

Motivation: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models. Results: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study. Supplementary information: Supplementary data are available at Bioinformatics online.
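One widely used compositionally valid transformation is the centered log-ratio (CLR). The abstract does not name a specific method, so the sketch below is only an illustration of the general idea; the function name and the example counts are assumptions made for this example.

```python
import math


def clr(counts):
    """Centered log-ratio transform of one sample's counts.

    Each value becomes log(count) minus the mean of the logs, i.e. the log
    of the count relative to the sample's geometric mean. Zero counts must
    be handled beforehand (for example with a pseudocount), because log(0)
    is undefined.
    """
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be strictly positive")
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical counts for three genes; the second sample is the same
# composition sequenced at twice the depth (library size).
shallow = clr([10, 100, 1000])
deep = clr([20, 200, 2000])
```

The two transformed vectors agree up to floating-point error, which illustrates why only the relative abundances within a sample are interpretable: the arbitrary library size cancels out.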


Subjects
Sequence Analysis; Gene Library; Humans; Models, Statistical; Sequence Analysis/statistics & numerical data
8.
Brief Bioinform ; 15(3): 369-75, 2014 May.
Article in English | MEDLINE | ID: mdl-24162172

ABSTRACT

Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage now appears to be set for a recasting of IMs with a central role in processing nextgen sequencing results.
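The Chaos Game Representation mentioned above can be sketched as a simple iterated map on the unit square. The corner assignment used here (A, C, G, T at the four corners) is a common convention and is an assumption of this example, as are the function and variable names.

```python
# Assumed corner layout of the unit square for the four nucleotides.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(sequence):
    """Return the iterated-map coordinates for a DNA sequence.

    Starting from the centre of the square, each nucleotide moves the
    current point halfway towards that nucleotide's corner.
    """
    x, y = 0.5, 0.5
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


points = chaos_game("ACGT")
```

Because every step halves the distance to a corner, the quadrant of a point encodes the most recent nucleotide, the sub-quadrant encodes the one before it, and so on. This nesting is what makes the representation usable at any word length without choosing one in advance.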


Subjects
Computational Biology/methods; Sequence Analysis/methods; Computational Biology/trends; Fractals; Models, Statistical; Nonlinear Dynamics; Sequence Alignment; Sequence Analysis/statistics & numerical data
9.
Brief Bioinform ; 15(3): 343-53, 2014 May.
Article in English | MEDLINE | ID: mdl-24064230

ABSTRACT

With the development of next-generation sequencing (NGS) technologies, a large amount of short-read data has been generated. Assembly of these short reads can be challenging for genomes and metagenomes without template sequences, making alignment-based genome sequence comparison difficult. In addition, sequence reads from NGS can come from different regions of various genomes and they may not be alignable. Sequence signature-based methods for genome comparison based on the frequencies of word patterns in genomes and metagenomes can potentially be useful for the analysis of short-read data from NGS. Here we review the recent development of alignment-free genome and metagenome comparison based on the frequencies of word patterns, with emphasis on the dissimilarity measures between sequences, the statistical power of these measures when two sequences are related, and the applications of these measures to NGS data.
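Word-pattern comparison can be illustrated with k-mer counts. The sketch below computes the classical D2 statistic (the inner product of two word-count vectors) and a Euclidean distance between word-frequency vectors. It is a toy illustration, not one of the specific measures evaluated in the review, and the function names and sequences are invented.

```python
import math
from collections import Counter


def word_counts(seq, k):
    """Count all overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """D2 statistic: sum over words of count_a(word) * count_b(word)."""
    a, b = word_counts(seq_a, k), word_counts(seq_b, k)
    return sum(n * b[w] for w, n in a.items())


def frequency_distance(seq_a, seq_b, k):
    """Euclidean distance between the two word-frequency vectors."""
    a, b = word_counts(seq_a, k), word_counts(seq_b, k)
    total_a, total_b = sum(a.values()), sum(b.values())
    words = set(a) | set(b)
    return math.sqrt(sum((a[w] / total_a - b[w] / total_b) ** 2 for w in words))


shared = d2("ACGT", "ACGT", 2)      # words AC, CG, GT each occur once in both
disjoint = d2("ACGT", "TTTT", 2)    # no 2-letter word in common
same = frequency_distance("ACGTACGT", "ACGTACGT", 2)
```

Neither function requires the two sequences to be alignable or even of the same length, which is the property that makes this family of measures attractive for unassembled reads.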


Subjects
Computational Biology/methods; Sequence Analysis/methods; Algorithms; Computational Biology/trends; Genomics/methods; Genomics/statistics & numerical data; High-Throughput Nucleotide Sequencing; Markov Chains; Models, Statistical; Sequence Alignment; Sequence Analysis/statistics & numerical data
10.
Brief Bioinform ; 15(3): 376-89, 2014 May.
Article in English | MEDLINE | ID: mdl-24058049

ABSTRACT

Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
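As a small illustration of the entropy concepts mentioned above, the following sketch computes a naive plug-in estimate of block entropy (Shannon entropy over overlapping blocks of length k). The estimator choice, function name and example sequences are assumptions for this example; the review covers more careful estimators.

```python
import math
from collections import Counter


def block_entropy(seq, k=1):
    """Shannon entropy (in bits) of the overlapping length-k blocks in seq.

    With k=1 this is the ordinary single-symbol entropy. This plug-in
    estimate is biased downwards for short sequences or large k.
    """
    blocks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


uniform = block_entropy("ACGT")       # four equally frequent symbols
repetitive = block_entropy("AAAA")    # a single symbol carries no information
```

A sequence with four equally frequent nucleotides reaches the maximum of 2 bits per symbol, while a homopolymer scores 0.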


Subjects
Computational Biology/methods; Information Theory; Sequence Analysis/methods; Binding Sites/genetics; Genomics/methods; Genomics/statistics & numerical data; Humans; Models, Statistical; Nonlinear Dynamics; Phylogeny; Saccharomyces cerevisiae/genetics; Sequence Alignment; Sequence Analysis/statistics & numerical data; Software; Transcription Factors/metabolism
11.
Brief Bioinform ; 15(3): 354-68, 2014 May.
Article in English | MEDLINE | ID: mdl-24096012

ABSTRACT

With the massive production of genomic and proteomic data, the number of available biological sequences in databases has reached a level that is not feasible anymore for exact alignments even when just a fraction of all sequences is used. To overcome this inevitable time complexity, ultrafast alignment-free methods are studied. Within the past two decades, a broad variety of nonalignment methods have been proposed including dissimilarity measures on classical representations of sequences like k-words or Markov models. Furthermore, articles were published that describe distance measures on alternative representations such as compression complexity, spectral time series or chaos game representation. However, alignments are still the standard method for real world applications in biological sequence analysis, and the time efficient alignment-free approaches are usually applied in cases when the accustomed algorithms turn out to fail or be too inconvenient.
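A compression-based distance can be sketched with the standard library alone. The example uses the normalized compression distance (NCD) with zlib as the compressor. This is one possible instance of a "compression complexity" measure and is an assumption of this example, not necessarily the formulation used in the articles the review surveys. The random test sequences are synthetic.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(seq_a: str, seq_b: str) -> float:
    """Normalized compression distance between two sequences.

    Near 0 when one sequence adds little new information to the other,
    near 1 when the two share nothing a compressor can exploit.
    """
    a, b = seq_a.encode("ascii"), seq_b.encode("ascii")
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


# Synthetic sequences, generated with a fixed seed for reproducibility.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(2000))
unrelated = "".join(rng.choice("ACGT") for _ in range(2000))
same_score = ncd(reference, reference)
unrelated_score = ncd(reference, unrelated)
```

No alignment is computed at any point: the compressor implicitly discovers shared substrings, so the cost is roughly linear in the sequence length.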


Subjects
Computational Biology/methods; Pattern Recognition, Automated/methods; Sequence Analysis/methods; Genomics/statistics & numerical data; Markov Chains; Models, Statistical; Phylogeny; Proteomics/statistics & numerical data; Sequence Alignment; Sequence Analysis/statistics & numerical data; Software
12.
Pac Symp Biocomput ; : 320-31, 2013.
Article in English | MEDLINE | ID: mdl-23424137

ABSTRACT

We have developed a novel approach called ChIPModule to systematically discover transcription factors and their cofactors from ChIP-seq data. Given a ChIP-seq dataset and the binding patterns of a large number of transcription factors, ChIPModule can efficiently identify groups of transcription factors, whose binding sites significantly co-occur in the ChIP-seq peak regions. By testing ChIPModule on simulated data and experimental data, we have shown that ChIPModule identifies known cofactors of transcription factors, and predicts new cofactors that are supported by literature. ChIPModule provides a useful tool for studying gene transcriptional regulation.


Subjects
Chromatin Immunoprecipitation/statistics & numerical data; Sequence Analysis/statistics & numerical data; Transcription Factors/genetics; Transcription Factors/metabolism; Binding Sites/genetics; Computational Biology; Databases, Genetic/statistics & numerical data; Humans
13.
Pac Symp Biocomput ; : 356-67, 2013.
Article in English | MEDLINE | ID: mdl-23424140

ABSTRACT

Human genetics recently transitioned from GWAS to studies based on NGS data. For GWAS, small effects dictated large sample sizes, typically made possible through meta-analysis by exchanging summary statistics across consortia. NGS studies groupwise-test for association of multiple potentially causal alleles along each gene. They are subject to similar power constraints and are therefore likely to resort to meta-analysis as well. The problem arises when considering privacy of the genetic information during the data-exchange process. Many scoring schemes for NGS association rely on the frequency of each variant and thus require exchanging the identity of the sequenced variant. Such variants are often rare, so exchanging them can reveal the identity of their carriers and jeopardize privacy. We have thus developed MetaSeq, a protocol for meta-analysis of genome-wide sequencing data by multiple collaborating parties, scoring association for rare variants pooled per gene across all parties. We tackle the challenge of tallying frequency counts of rare, sequenced alleles for meta-analysis of sequencing data without disclosing the allele identity and counts, thereby protecting sample identity. This apparently paradoxical exchange of information is achieved through cryptographic means. The key idea is that parties encrypt the identity of genes and variants. When they transfer information about frequency counts in cases and controls, the exchanged data does not convey the identity of a mutation and therefore does not expose carrier identity. The exchange relies on a third party, trusted to follow the protocol although not trusted to learn about the raw data. We show applicability of this method to publicly available exome-sequencing data from multiple studies, simulating phenotypic information for powerful meta-analysis. The MetaSeq software is publicly available as open source.
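The key idea of hiding variant identity from the tallying party can be illustrated with keyed hashing. The sketch below is emphatically not the MetaSeq protocol: it only shows how collaborating sites that share a secret key could let a third party pool counts per variant without that party learning which variant each count belongs to. Unlike the protocol described in the abstract, it does not hide the counts themselves. The key, the variant identifiers and all function names are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def blind(variant_id: str, shared_key: bytes) -> str:
    """Replace a variant identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(shared_key, variant_id.encode(), hashlib.sha256).hexdigest()


def blinded_counts(carrier_counts, shared_key):
    """Prepare one site's submission: blinded identifier -> carrier count."""
    return {blind(v, shared_key): n for v, n in carrier_counts.items()}


def tally(submissions):
    """Third-party step: pool counts per blinded identifier."""
    total = Counter()
    for submission in submissions:
        total.update(submission)
    return total


# Hypothetical key known to the collaborating sites but not the third party.
key = b"secret shared only by the sites"
site_a = blinded_counts({"chr1:12345:A>G": 2, "chr2:555:C>T": 1}, key)
site_b = blinded_counts({"chr1:12345:A>G": 3}, key)
pooled = tally([site_a, site_b])
```

Because both sites derive the same token for the same variant, the third party can sum matching entries; only a holder of the key can map a token back to a variant.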


Subjects
Genetic Privacy; Genome-Wide Association Study/statistics & numerical data; Meta-Analysis as Topic; Computational Biology; Computer Security/statistics & numerical data; Computer Simulation; Gene Frequency; Humans; Sequence Analysis/statistics & numerical data; Software
14.
Brief Bioinform ; 14(2): 193-202, 2013 Mar.
Article in English | MEDLINE | ID: mdl-22445902

ABSTRACT

The advent of second-generation sequencing (2GS) has provided a range of significant new challenges for the visualization of sequence assemblies. These include the large volume of data being generated, short read lengths, and the different data types and data formats associated with the diversity of new sequencing technologies. This article illustrates how Tablet, a high-performance graphical viewer for visualization of 2GS assemblies and read mappings, plays an important role in the analysis of these data. We present Tablet and, through a selection of use cases, demonstrate its value in quality assurance and scientific discovery through features such as whole-reference coverage overviews, variant highlighting, paired-end read mark-up, GFF3-based feature tracks and protein translations. We discuss the computing and visualization techniques utilized to provide a rich and responsive graphical environment that enables users to view a range of file formats with ease. Tablet installers can be freely downloaded from http://bioinf.hutton.ac.uk/tablet in 32 or 64-bit versions for Windows, OS X, Linux or Solaris. For further details on Tablet, contact tablet@hutton.ac.uk.


Subjects
Computer Graphics; Data Display; Databases, Genetic/statistics & numerical data; Animals; Computational Biology; Genomics/statistics & numerical data; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Internet; Sequence Analysis/statistics & numerical data; Software
15.
PLoS Comput Biol ; 8(6): e1002541, 2012.
Article in English | MEDLINE | ID: mdl-22685393

ABSTRACT

We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
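The idea of estimating positional error from duplicate reads can be sketched as follows. This is a simplified illustration and not the DRISEE implementation: the abstract does not describe how artifactual duplicate reads are identified, so the sketch assumes they have already been grouped, and it treats the per-position majority base as the true base. The function name and the reads are invented.

```python
from collections import Counter


def positional_error(duplicate_reads):
    """Per-position error estimate for one group of duplicate reads.

    For each position, the most common base is taken as the consensus and
    the fraction of reads that disagree with it is reported as the error.
    Positions beyond the shortest read are ignored.
    """
    length = min(len(read) for read in duplicate_reads)
    errors = []
    for pos in range(length):
        column = Counter(read[pos] for read in duplicate_reads)
        consensus_count = column.most_common(1)[0][1]
        errors.append(1 - consensus_count / len(duplicate_reads))
    return errors


# Hypothetical group of four duplicates; one read differs at the last base.
reads = ["ACGT", "ACGT", "ACGA", "ACGT"]
errors = positional_error(reads)
```

A profile like this, averaged over many duplicate groups, is the kind of positional estimate that could inform where to trim reads. No reference genome or quality score is consulted.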


Subjects
Metagenomics/statistics & numerical data; Sequence Analysis/statistics & numerical data; Computational Biology; Data Interpretation, Statistical; Genomics/statistics & numerical data; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans
16.
Pac Symp Biocomput ; : 259-70, 2012.
Article in English | MEDLINE | ID: mdl-22174281

ABSTRACT

Homology-based approaches are often used for the annotation of microbial communities, providing functional profiles that are used to characterize and compare the content and the functionality of microbial communities. Metagenomic reads are the starting data for these studies; however, considerable differences are observed between the functional profiles (built from sequencing reads produced by different sequencing techniques) for even the same microbial community. Using simulation experiments, we show that such functional differences are likely to be caused by the actual difference in read lengths, and are not the result of a sampling bias of the sequencing techniques. Furthermore, the functional differences derived from different sequencing techniques cannot be fully explained by the read-count bias, i.e., 1) the higher fraction of unannotated shorter reads (i.e., "read length matters"), and 2) the different lengths of proteins in different functional categories. Instead, we show here that specific functional categories are under-annotated, because similarity-search-based functional annotation tools tend to miss more reads from functional categories that contain less conserved genes/proteins. In addition, the accuracy of functional annotation of short reads for different functions varies, further skewing the functional profiles. To address these issues, we present a simple yet efficient method to improve the frequency estimates of different functional categories in the functional profiles of metagenomes, based on the functional annotation of simulated reads from complete microbial genomes.


Subjects
Metagenomics/statistics & numerical data; Microbiota/genetics; Sequence Analysis/statistics & numerical data; Animals; Bacteria/genetics; Bacteria/isolation & purification; Bacterial Proteins/classification; Bacterial Proteins/genetics; Computational Biology; Feces/microbiology; Mice; Obesity/microbiology; Thinness/microbiology
18.
Adv Exp Med Biol ; 680: 411-7, 2010.
Article in English | MEDLINE | ID: mdl-20865526

ABSTRACT

Efforts have been devoted to accelerating the construction of suffix trees. However, little attention has been given to post-construction operations on suffix trees. Therefore, we investigate the effects of improved spatial locality on certain post-construction operations on suffix trees. We used a maximal exact repeat finding algorithm, MERF, on which the software REPuter is based, as an example, and conducted experiments on the 16 chromosomes of the yeast Saccharomyces cerevisiae. Two versions of suffix trees were customized for the algorithm and two variants of MERF were implemented accordingly. We showed that in all cases, the optimal cache-oblivious MERF is faster and displays consistently lower cache miss rates than its non-optimized counterpart.


Subjects
Algorithms; Sequence Analysis/statistics & numerical data; Chromosomes, Fungal/genetics; Computational Biology; Genome, Fungal; Repetitive Sequences, Nucleic Acid; Saccharomyces cerevisiae/genetics; Software
19.
Adv Exp Med Biol ; 680: 693-700, 2010.
Article in English | MEDLINE | ID: mdl-20865556

ABSTRACT

Next Generation Sequencing technologies are limited by the lack of standard bioinformatics infrastructures that can reduce data storage, increase data processing performance, and integrate diverse information. HDF technologies address these requirements and have a long history of use in data-intensive science communities. They include general data file formats, libraries, and tools for working with the data. Compared to emerging standards, such as the SAM/BAM formats, HDF5-based systems demonstrate significantly better scalability, can support multiple indexes, store multiple data types, and are self-describing. For these reasons, HDF5 and its BioHDF extension are well suited for implementing data models to support the next generation of bioinformatics applications.


Subjects
Sequence Alignment/statistics & numerical data; Sequence Analysis/statistics & numerical data; Computational Biology; Computer Simulation; Database Management Systems; Databases, Genetic; Sequence Alignment/standards; Sequence Alignment/trends; Sequence Analysis/standards; Sequence Analysis/trends; Software/standards; Software/trends; Software Design; User-Computer Interface
20.
J Mol Biol ; 396(5): 1439-50, 2010 Mar 12.
Article in English | MEDLINE | ID: mdl-20043919

ABSTRACT

Chimeric, humanized and human antibodies have successively been exploited as therapeutics because their increasing human ('self') character is expected to correspond with decreased immunogenicity, which is critical for their clinical development. Thus, humanness has been inferred to predict antibody immunogenicity. Humanness of antibody variable regions (V-regions) has recently been studied using a parameter (here referred to as the H-score) that evaluates similarity to expressed human sequences. Macaque (Macaca fascicularis) antibody sequences are of particular interest because they have been suggested to have extremely human-like character and, recently, macaque single-chain variable fragments with very high affinity for various antigens have been isolated. In this study, the H-scores of all macaque antibody V-regions available in sequence data banks were compared with those of their human counterparts using statistical tests. The results were found to be influenced by the relative size of the human families to which the macaque V-regions are related. As the relevance of families to immunogenicity is suspected but unproven, a new parameter (the 'G-score') was derived from the H-score to avoid this influence, and macaque V-region sequences were reanalyzed using the G-score. Both parameters show that these regions cannot be regarded as human when they derive from heavy chains, but the humanness of light chains is variable. It was shown that 'germline humanization' of a macaque V-region favourably influenced its humanness, as evaluated by both H-score and G-score. In addition, the humanness of macaque sequences presented in patents has been analyzed. The H-score and G-score define objectively the humanness of antibody V-regions, and their use is exemplified here.


Subjects
Genes, Immunoglobulin; Immunoglobulins/genetics; Macaca fascicularis/genetics; Macaca fascicularis/immunology; Animals; Antibody Diversity; Databases, Genetic; Genes, Immunoglobulin Heavy Chain; Humans; Immunoglobulin Fab Fragments/genetics; Immunoglobulin Heavy Chains/genetics; Immunoglobulin Variable Region/genetics; Immunoglobulin kappa-Chains/genetics; Immunoglobulin lambda-Chains/genetics; Multigene Family; Sequence Analysis/statistics & numerical data; Species Specificity