Results 1 - 20 of 27
1.
São Paulo; s.n; s.n; 2022. 186 p. tab, graf, ilus.
Thesis in Portuguese | LILACS | ID: biblio-1397348

ABSTRACT

The methodological and instrumental advances resulting from the Human Genome Project created the framework necessary for the emergence of next-generation DNA sequencing technologies, which are characterized by reduced cost, low operational demand and a large volume of data per experiment. At the same time, the increase in computational processing power has driven the development of large-scale genetic analyses, so that it is now possible to study individualized genomic traits that had previously been little explored or not explored at all. Among these traits, those related to structural variation in genomes have received much attention. Processed pseudogenes, or retrocopies, are structural variants caused by the duplication of coding genes through the transposition of their mature messenger RNA by the LINE-1 enzymatic machinery. Retrocopies can be fixed (i.e., present in all genomes of a given species, which are represented by the reference genome assembly) or unfixed, in which case they are polymorphic, germline or somatic. However, knowledge about unfixed retrocopies is still limited because of the lack of bioinformatics tools dedicated to their identification and annotation in DNA sequencing data. This work therefore presents sideRETRO, a computer program specialized in detecting processed pseudogenes that are absent from the reference genome but present in whole-genome and exome sequencing data from other individuals. In addition to reporting the presence of unfixed retrocopies, sideRETRO annotates several other characteristics of these events: the genomic coordinate of the processed pseudogene insertion, which comprises the chromosome, the insertion point and the DNA strand (leading or lagging); the genomic context of the event (exonic, intronic or intergenic); genotyping (present or absent); and haplotyping (homozygous or heterozygous). To assess its efficiency, sideRETRO was run on simulated data and on real data experimentally validated by an independent group. In summary, this thesis describes the development and use of sideRETRO, a robust and efficient computational tool designed to identify and annotate unfixed processed pseudogenes. sideRETRO fills a methodological gap and enables new hypotheses and systematic investigations in the field of structural variant calling.
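
The genomic-context annotation mentioned in the abstract (exonic, intronic or intergenic) can be illustrated with a minimal sketch. This is not sideRETRO's implementation; the `Gene` record, the `genomic_context` function and the coordinates are hypothetical and only show the kind of interval lookup such an annotation involves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene annotation record (1-based, inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    exons: tuple  # tuple of (start, end) pairs


def genomic_context(chrom, pos, genes):
    """Classify an insertion point as 'exonic', 'intronic' or 'intergenic'."""
    context = "intergenic"
    for gene in genes:
        # Skip genes on other chromosomes or not spanning the position.
        if gene.chrom != chrom or not (gene.start <= pos <= gene.end):
            continue
        # Inside a gene: exonic if any exon covers the position.
        if any(start <= pos <= end for start, end in gene.exons):
            return "exonic"
        context = "intronic"
    return context


# Illustrative annotation: one gene with two exons (made-up coordinates).
genes = [Gene("chr1", 100, 1000, ((100, 200), (800, 1000)))]
```

With this toy annotation, a position inside the first exon is reported as exonic, a position between the exons as intronic, and anything outside the gene as intergenic. A real annotator would use an interval index instead of a linear scan.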


Subjects
Polymorphism, Genetic/genetics; Computational Biology/classification; Computational Biology/instrumentation; Costs and Cost Analysis; Genomics/instrumentation; Sequence Analysis, DNA/instrumentation; Clinical Coding
2.
Methods Mol Biol ; 1883: 195-215, 2019.
Article in English | MEDLINE | ID: mdl-30547401

ABSTRACT

In this chapter, we introduce the reader to a popular family of machine learning algorithms, called decision trees. We then review several approaches based on decision trees that have been developed for the inference of gene regulatory networks (GRNs). Decision trees have indeed several nice properties that make them well-suited for tackling this problem: they are able to detect multivariate interacting effects between variables, are non-parametric, have good scalability, and have very few parameters. In particular, we describe in detail the GENIE3 algorithm, a state-of-the-art method for GRN inference.
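
As a rough illustration of tree-based GRN inference: GENIE3 scores each candidate regulator of a target gene by how much it reduces the variance of the target's expression inside tree ensembles. The sketch below is a deliberate simplification, not GENIE3 itself. It replaces the ensemble with a single one-split tree (a stump) per regulator, and the function names and the toy expression values are made up for illustration.

```python
from statistics import pvariance


def best_split_gain(x, y):
    """Largest per-sample reduction in the variance of y from one threshold on x."""
    n = len(y)
    total = pvariance(y) * n
    best = 0.0
    # Every threshold except the largest value leaves both sides non-empty.
    for threshold in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        within = pvariance(left) * len(left) + pvariance(right) * len(right)
        best = max(best, total - within)
    return best / n


def rank_regulators(expression):
    """expression maps gene -> expression values across samples.

    Returns (regulator, target, score) triples, highest score first.
    """
    links = []
    for target, y in expression.items():
        for regulator, x in expression.items():
            if regulator != target:
                links.append((regulator, target, best_split_gain(x, y)))
    return sorted(links, key=lambda link: link[2], reverse=True)


# Toy data: g1 and g2 vary together, g3 is unrelated (made-up values).
toy = {
    "g1": [0.0, 0.0, 1.0, 1.0],
    "g2": [0.1, 0.0, 0.9, 1.0],
    "g3": [0.5, 0.4, 0.5, 0.4],
}
links = rank_regulators(toy)
```

On this toy data the top-ranked link connects g1 and g2. A faithful implementation would grow many randomized trees per target gene and average the importance of each regulator over them.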


Subjects
Computational Biology/methods; Gene Regulatory Networks; Models, Genetic; Unsupervised Machine Learning; Computational Biology/instrumentation; Decision Trees; Gene Expression Regulation
4.
São Paulo; s.n; s.n; 2019. 108 p. ilus, graf, tab.
Thesis in Portuguese | LILACS | ID: biblio-1008521

ABSTRACT

BRAF and MEK inhibitors (BRAFi and MEKi) introduced a new class of drugs, targeted therapy, for the treatment of metastatic melanoma. However, patients acquire resistance to treatment within a few months. In addition, immunotherapy has been gaining ground in cancer treatment, including melanoma, although some aspects remain unexplored. In this context, the enzyme IDO has attracted great interest because of its participation in the mechanisms of immune tolerance, immune escape and tumor progression. IDO is responsible for the consumption and depletion of tryptophan, producing kynurenine. It is present in several cell types, including immune cells and tumor cells. This work aimed to evaluate IDO expression during disease progression, from nevus to metastatic melanoma, and to evaluate the regulation of IFN-γ-induced IDO after BRAFi treatment in parental and BRAFi-resistant melanoma cell lines, seeking the molecular mechanisms involved. Finally, it aimed to understand the effects of 1-methyltryptophan (1-MT), an IDO inhibitor, both on IDO activity and on clonogenic capacity. A bioinformatics study of the public dataset GSE12391 showed that IDO gene expression was higher in the more advanced stages of the disease. In addition, all patient samples of primary melanoma showed IDO immunostaining, whereas no nevus sample did. IDO was found in lymphoid infiltrates, in mononuclear cells of the immune system. Two bioinformatics analyses of gene expression showed that IDO was overexpressed during the BRAFi resistance stage. Moreover, protein expression results showed that inhibition of the MAPK pathway (by either BRAFi or MEKi) modulated IDO expression, with most cell lines showing a decrease in IDO. IDO activity, measured as kynurenine production by HPLC, was consistent with the protein expression results, except for the WM164 cell line, which showed no enzymatic activity although the protein was present. Finally, 1-MT efficiently inhibited IDO, blocking kynurenine production. Furthermore, 1-MT reduced clonogenic capacity in a dose-dependent manner. It was therefore concluded that IDO expression increases with melanoma progression, that inhibition of the MAPK pathway regulated IDO expression, and that 1-MT reduces clonogenic capacity in addition to its primary function of inhibiting IDO.


Subjects
Disease Progression; Indoleamine-Pyrrole 2,3,-Dioxygenase/analysis; Melanoma/prevention & control; Computational Biology/instrumentation; Mitogen-Activated Protein Kinases/analysis
5.
Arch Pathol Lab Med ; 141(11): 1544-1557, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28782984

ABSTRACT

CONTEXT: Next-generation sequencing (NGS) is a technology being used by many laboratories to test for inherited disorders and tumor mutations. This technology is new for many practicing pathologists, who may not be familiar with the uses, methodology, and limitations of NGS. OBJECTIVE: To familiarize pathologists with several aspects of NGS, including current and expanding uses; methodology including wet bench aspects, bioinformatics, and interpretation; validation and proficiency; limitations; and issues related to the integration of NGS data into patient care. DATA SOURCES: The review is based on peer-reviewed literature and personal experience using NGS in a clinical setting at a major academic center. CONCLUSIONS: The clinical applications of NGS will increase as the technology, bioinformatics, and resources evolve to address the limitations and improve quality of results. The challenge for clinical laboratories is to ensure testing is clinically relevant, cost-effective, and can be integrated into clinical care.


Subjects
Genetic Diseases, Inborn/diagnosis; Genetic Testing; High-Throughput Nucleotide Sequencing; Mutation; Neoplasms/diagnosis; Computational Biology/economics; Computational Biology/instrumentation; Computational Biology/trends; DNA Mutational Analysis/economics; DNA Mutational Analysis/instrumentation; DNA Mutational Analysis/standards; DNA Mutational Analysis/trends; Databases, Genetic; Genetic Diseases, Inborn/genetics; Genetic Testing/economics; Genetic Testing/instrumentation; Genetic Testing/standards; Genetic Testing/trends; Health Care Costs; High-Throughput Nucleotide Sequencing/economics; High-Throughput Nucleotide Sequencing/instrumentation; High-Throughput Nucleotide Sequencing/standards; High-Throughput Nucleotide Sequencing/trends; Humans; Laboratory Proficiency Testing; Neoplasms/genetics; Systems Integration
6.
BMC Res Notes ; 9: 144, 2016 Mar 05.
Article in English | MEDLINE | ID: mdl-26945860

ABSTRACT

BACKGROUND: The National Institutes of Health (USA) has committed 5 years of funding to the Bioinformatics Network of the Human Heredity and Health in Africa initiative. This pan-African network aims to develop capacity for bioinformatics research, in order to provide support to human health genomics research programs ongoing on the continent. Over the 5 years of funding, it is imperative to track changes in bioinformatics capacity at the funded centres and to document how the funding has translated into capacity development during this time frame. RESULTS: The Network capacity database, NetCapDB, is a relational database that captures quantitative metrics for bioinformatics capacity, and tracks the changes in these metrics over time. A graphical user interface allows for straight-forward, browser-based data entry by users across Africa; and for visual and graph-based exploration of captured data. A reporting interface allows for semi-automated generation of standardized reports for monitoring and evaluation purposes.
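
To make the idea of a relational database that tracks capacity metrics over time concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the NetCapDB schema: the table names, column names, the `change_over_time` helper and the sample values are all hypothetical, chosen only to show how year-on-year change in one metric at one centre could be queried.

```python
import sqlite3

# In-memory database with a hypothetical three-table layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE centre (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE metric (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE measurement (
    centre_id INTEGER NOT NULL REFERENCES centre(id),
    metric_id INTEGER NOT NULL REFERENCES metric(id),
    year      INTEGER NOT NULL,
    value     REAL    NOT NULL,
    PRIMARY KEY (centre_id, metric_id, year)
);
""")

# Made-up example rows, for illustration only.
conn.execute("INSERT INTO centre (id, name) VALUES (1, 'Example Centre')")
conn.execute("INSERT INTO metric (id, name) VALUES (1, 'trained_staff')")
conn.executemany(
    "INSERT INTO measurement VALUES (1, 1, ?, ?)",
    [(2013, 2), (2014, 5), (2015, 9)],
)


def change_over_time(conn, centre, metric):
    """Return (year, change since the previous recorded year) pairs."""
    rows = conn.execute(
        """SELECT m.year, m.value FROM measurement m
           JOIN centre c ON c.id = m.centre_id
           JOIN metric k ON k.id = m.metric_id
           WHERE c.name = ? AND k.name = ?
           ORDER BY m.year""",
        (centre, metric),
    ).fetchall()
    return [(y2, v2 - v1) for (_, v1), (y2, v2) in zip(rows, rows[1:])]
```

Keying each measurement on centre, metric and year keeps one value per reporting period, which is what makes change over the funding period straightforward to compute.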


Subjects
Computational Biology/economics; Genome, Human; National Institutes of Health (U.S.)/economics; Program Evaluation/statistics & numerical data; Africa; Capital Financing; Computational Biology/instrumentation; Computational Biology/methods; Databases, Factual; Humans; United States; User-Computer Interface
7.
BMC Bioinformatics ; 15 Suppl 5: S2, 2014.
Article in English | MEDLINE | ID: mdl-25077818

ABSTRACT

BACKGROUND: The huge quantity of data produced in biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard-to-solve parallelization and load-balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general-purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed iteratively by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal), which form a probability distribution, allowing the selection of the term literals. Its great versatility makes U-BRAIN applicable in many fields in which there are data to be analyzed. However, the memory and the execution time required are of order O(n^3) and O(n^5), respectively, so the algorithm is unaffordable for huge data sets. RESULTS: We find mathematical and programming solutions that lead towards the implementation of the U-BRAIN algorithm on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the costs of communication between different memories (RAM, cache, mass, virtual) and to achieve efficient I/O performance, we design a mass storage structure able to access its data with a high degree of temporal and spatial locality. Then we develop a parallel implementation of the algorithm. We model it as an SPMD system together with a Message-Passing Programming Paradigm. Here, we adopt the high-level message-passing system MPI (Message Passing Interface) in the version for the Java programming language, MPJ. The parallel processing is organized into four stages: partitioning, communication, agglomeration and mapping. The decomposition of the U-BRAIN algorithm determines the necessity of a communication protocol design among the processors involved. Efficient synchronization design is also discussed. CONCLUSIONS: In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the INTEL XEON E7xxx and E5xxx family of the CRESCO structure of the Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize both the memory space and the execution time. The test data used in this study are IPDATA (Irvine Primate splice-junction DATA set), a subset of HS3D (Homo Sapiens Splice Sites Dataset) and a subset of COSMIC (the Catalogue of Somatic Mutations in Cancer). The execution time and the speed-up on IPDATA reach their best values at about 90 processors. Beyond that, the parallelization advantage is balanced by the greater cost of non-local communications between the processors. A similar behaviour is evident on HS3D, but at a greater number of processors, which shows the direct relationship between data size and parallelization gain. This behaviour is confirmed on COSMIC. Overall, the results obtained show that the parallel version is up to 30 times faster than the serial one.
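
The notion of a DNF formula that is consistent with positive and negative instances containing missing bits can be sketched in a few lines. This is not the U-BRAIN learning procedure, only a check of its output condition. It rests on one assumption that the abstract does not spell out: a missing bit, written `?`, is treated as compatible with either value. The function names and the toy instances are hypothetical.

```python
def term_matches(term, instance):
    """term maps bit index -> required bit ('0' or '1').

    instance is a string over '0', '1' and '?' (missing bit).
    Assumption: a missing bit never contradicts a literal.
    """
    return all(instance[i] in (bit, "?") for i, bit in term.items())


def dnf_matches(formula, instance):
    """A DNF formula (list of conjunctive terms) matches if any term does."""
    return any(term_matches(term, instance) for term in formula)


def is_consistent(formula, positives, negatives):
    """True if the formula covers every positive and no negative instance."""
    return (all(dnf_matches(formula, p) for p in positives)
            and not any(dnf_matches(formula, n) for n in negatives))


# (x0 AND NOT x1) OR x2, checked against made-up instances.
formula = [{0: "1", 1: "0"}, {2: "1"}]
positives = ["100", "0?1", "1?0"]
negatives = ["010", "000"]
```

Checking one candidate formula is cheap; the cost that the abstract reports comes from searching for the terms, which is the part the parallel implementation targets.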


Subjects
Algorithms; Computational Biology/methods; Computing Methodologies; Animals; Computational Biology/instrumentation; Databases, Nucleic Acid; Europe; Humans; Software
8.
Int J Environ Res Public Health ; 10(12): 6887-908, 2013 Dec 09.
Article in English | MEDLINE | ID: mdl-24351788

ABSTRACT

The human signal-molecule-profiling database (HSMPD) is designed as a prospective medical database for translational bioinformatics (TBI). To explore the feasibility of low-cost database construction, we studied the roadmap of HSMPD. An HSMPD-oriented tool, called the "signal-molecule-profiling (SMP) chip", was developed for data acquisition; it can be employed in routine blood tests in hospitals, and the results will be stored in the HSMPD system automatically. The HSMPD system can provide data services for the TBI community, which generates a stable income to support data acquisition. A small-scale experimental test was performed in a hospital to verify the SMP chips and the demo HSMPD software. One hundred and eighty-nine complete SMP records were collected, and the demo HSMPD system was also evaluated in a survey study of patients and doctors. The function of the SMP chip was verified, whereas the demo HSMPD software needed to be improved. The survey study showed that patients would only accept free SMP chip tests when they already needed blood examinations. The study indicated that the construction of HSMPD relies on the self-motivated cooperation of the TBI community and the traditional healthcare system. The proposed roadmap potentially provides an executable solution to build the HSMPD without high costs.


Subjects
Computational Biology/instrumentation; Computational Biology/methods; Databases, Factual/standards; Microfluidic Analytical Techniques/standards; Computational Biology/economics; Cytokines; Databases, Factual/economics; Hormones; Hospitals; Humans; Microfluidic Analytical Techniques/economics; Prospective Studies; Software/standards
9.
Elife ; 2: e01456, 2013 Sep 10.
Article in English | MEDLINE | ID: mdl-24040512

ABSTRACT

By centralizing many of the tasks associated with the upkeep of scientific software, SBGrid allows researchers to spend more of their time on research.


Subjects
Computational Biology/instrumentation; Software/economics; Computational Biology/economics; Cooperative Behavior; Humans; Information Dissemination; Software/ethics; Software/supply & distribution
10.
BMC Bioinformatics ; 14: 243, 2013 Aug 12.
Article in English | MEDLINE | ID: mdl-23937194

ABSTRACT

BACKGROUND: Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. RESULTS: We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012-2013. CONCLUSIONS: 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost.


Subjects
Biology/economics; Biology/education; Computational Biology/economics; Computational Biology/education; Students; Universities; Biology/instrumentation; Computational Biology/instrumentation; Computers/economics; Humans; Software; Teaching Materials/economics; Textbooks as Topic
11.
Methods Mol Biol ; 855: 77-110, 2012.
Article in English | MEDLINE | ID: mdl-22407706

ABSTRACT

In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.
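
One standard example of efficient inference in a hidden Markov model is the forward algorithm, which computes the likelihood of an observed sequence by summing over all hidden state paths in time linear in the sequence length. The sketch below is a generic textbook version, not code from the chapter; the two-state nucleotide model and all of its probabilities are made up for illustration.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Probability of the observations under an HMM (forward algorithm).

    alpha[s] holds the joint probability of the observations seen so far
    and of being in hidden state s at the current position.
    """
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model of sequence composition (made-up numbers).
states = ("AT_rich", "GC_rich")
start_p = {"AT_rich": 0.5, "GC_rich": 0.5}
trans_p = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
emit_p = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

likelihood = forward_likelihood("GGCA", states, start_p, trans_p, emit_p)
```

For long sequences the probabilities underflow, so practical implementations rescale `alpha` at each step or work in log space; that refinement is omitted here.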


Subjects
Computational Biology/instrumentation; Models, Statistical; Probability; Algorithms; Bayes Theorem; Markov Chains
12.
CBE Life Sci Educ ; 10(4): 342-5, 2011.
Article in English | MEDLINE | ID: mdl-22135368

ABSTRACT

To transform undergraduate biology education, faculty need to provide opportunities for students to engage in the process of science. The rise of research approaches using next-generation (NextGen) sequencing has been impressive, but incorporation of such approaches into the undergraduate curriculum remains a major challenge. In this paper, we report proceedings of a National Science Foundation-funded workshop held July 11-14, 2011, at Juniata College. The purpose of the workshop was to develop a regional research coordination network for undergraduate biology education (RCN/UBE). The network is collaborating with a genome-sequencing core facility located at Pennsylvania State University (University Park) to enable undergraduate students and faculty at small colleges to access state-of-the-art sequencing technology. We aim to create a database of references, protocols, and raw data related to NextGen sequencing, and to find innovative ways to reduce costs related to sequencing and bioinformatics analysis. It was agreed that our regional network for NextGen sequencing could operate more effectively if it were partnered with the Genome Consortium for Active Teaching (GCAT) as a new arm of that consortium, entitled GCAT-SEEK(quence). This step would also permit the approach to be replicated elsewhere.


Subjects
Education, Medical, Undergraduate/methods; Genome/genetics; Teaching/methods; Computational Biology/economics; Computational Biology/education; Computational Biology/instrumentation; Congresses as Topic; Databases, Genetic; Educational Technology/economics; Educational Technology/education; Educational Technology/instrumentation; Faculty, Medical/organization & administration; Humans; Oligonucleotide Array Sequence Analysis/instrumentation; Oligonucleotide Array Sequence Analysis/methods; Sequence Analysis, DNA/economics; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods; Students, Medical
13.
PLoS One ; 6(10): e26624, 2011.
Article in English | MEDLINE | ID: mdl-22028928

ABSTRACT

BACKGROUND: The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. RESULTS: We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
CONCLUSIONS: Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.


Subjects
Computational Biology/economics; Computational Biology/methods; Internet; Microbiology/economics; Sequence Analysis/economics; Sequence Analysis/methods; User-Computer Interface; Animals; Computational Biology/instrumentation; Humans; Infant; Metagenomics; Mice; Microbiology/instrumentation; Molecular Sequence Annotation; RNA, Bacterial/genetics; RNA, Ribosomal, 16S/genetics; Sequence Analysis/instrumentation
14.
J Struct Funct Genomics ; 12(1): 33-41, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21519818

ABSTRACT

Residues in a protein-protein interface that are important for forming and stabilizing the interaction can usually be identified by looking at patterns of evolutionary conservation in groups of homologous proteins and also by the computational identification of binding hotspots. The PRICE (PRotein Interface Conservation and Energetics) server takes the coordinates of a protein-protein complex, dissects the interface into core and rim regions, and calculates (1) the degree of conservation (measured as the sequence entropy), as well as (2) the change in free energy of binding (∆∆G, due to alanine scanning mutagenesis) of interface residues. Results are displayed as color-coded plots and also made available for download. This enables the computational identification of binding hot spots, based on which further experiments can be designed. The method will aid in protein functional prediction by correct assignment of hot regions involved in binding. Consideration of sequence entropies for residues with large ∆∆G values may provide an indication of the biological relevance of the interface. Finally, the results obtained on a test set of alanine mutants have been compared to those obtained using other servers/methods. The PRICE server is a web application available at http://www.boseinst.ernet.in/resources/bioinfo/stag.html.
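
Sequence entropy as a measure of conservation can be illustrated with the Shannon entropy of each column of a multiple sequence alignment: a fully conserved position scores 0 and more variable positions score higher. The sketch below is a generic version and not the PRICE server's code. The log base, the handling of gaps and the function names are assumptions made here.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of the residues in one alignment column.

    Gap characters are ignored (an assumption of this sketch).
    """
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def alignment_entropies(sequences):
    """Per-position entropy for equal-length aligned sequences."""
    return [column_entropy(column) for column in zip(*sequences)]


# Toy alignment of three homologous fragments (made-up sequences).
entropies = alignment_entropies(["GKT", "GRT", "GK-"])
```

In this toy alignment the first and third positions are fully conserved and the second is variable. Low entropy at a residue with a large ∆∆G is the combination the abstract singles out as informative.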


Subjects
Computational Biology/instrumentation; Computational Biology/methods; Databases, Protein; Proteins/chemistry; Proteins/metabolism; Internet; Models, Molecular; Multiprotein Complexes/chemistry; Protein Interaction Mapping/methods; User-Computer Interface
15.
BMC Bioinformatics ; 11: 387, 2010 Jul 20.
Article in English | MEDLINE | ID: mdl-20646315

ABSTRACT

BACKGROUND: Biomedical research is set to greatly benefit from the use of semantic web technologies in the design of computational infrastructure. However, beyond well-defined research initiatives, substantial issues of data heterogeneity, source distribution, and privacy currently stand in the way of the personalization of medicine. RESULTS: A computational framework for bioinformatic infrastructure was designed to deal with the heterogeneous data sources and the sensitive mixture of public and private data that characterizes the biomedical domain. This framework consists of a logical model built with semantic web tools, coupled with a Markov process that propagates user operator states. An accompanying open source prototype was developed to meet a series of applications that range from collaborative multi-institution data acquisition efforts to data analysis applications that need to quickly traverse complex data structures. This report describes the two abstractions underlying the S3DB-based infrastructure, logical and numerical, and discusses its generality beyond the immediate confines of existing implementations. CONCLUSIONS: The emergence of the "web as a computer" requires a formal model for the different functionalities involved in reading and writing to it. The S3DB core model proposed was found to address the design criteria of biomedical computational infrastructure, such as those supporting large-scale multi-investigator research, clinical trials, and molecular epidemiology.


Subjects
Computational Biology/methods , Models, Biological , Computational Biology/instrumentation , Computers , Internet , Markov Chains
17.
Methods Enzymol ; 467: 197-227, 2009.
Article in English | MEDLINE | ID: mdl-19897094

ABSTRACT

While it is true that the modern computer is many orders of magnitude faster than that of yesteryear, this tremendous growth in CPU clock rates is now over. The growth in demand for computational power, however, has not abated: whereas researchers a decade ago could simply wait for computers to get faster, today the only solution to the growing need for more powerful computational resources lies in the exploitation of parallelism. Software parallelization falls generally into two broad categories: "true parallel" and high-throughput computing. This chapter focuses on the latter of these two types of parallelism. With high-throughput computing, users can run many copies of their software at the same time across many different computers. This technique is powerful in its ability to provide high degrees of parallelism, yet simple in its conceptual implementation. This chapter covers various patterns of high-throughput computing usage and the skills and techniques necessary to take full advantage of them. By utilizing numerous examples and sample codes and scripts, we hope to provide the reader not only with a deeper understanding of the principles behind high-throughput computing, but also with a set of tools and references that will prove invaluable as she explores software parallelism with her own software applications and research.
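
The pattern described in this abstract, running many independent copies of the same program on different inputs, can be sketched on a single machine as follows. This is only an illustration under stated assumptions: the chapter's own examples are not reproduced in the abstract, a real deployment would hand the jobs to a cluster scheduler rather than launch local processes, and `run_batch` and the `SQUARE` stand-in program are hypothetical names introduced here.

```python
import subprocess
import sys


def run_batch(script, inputs):
    """Start one independent copy of `script` per input, then collect stdout."""
    # Launch every job first so that all copies run at the same time.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", script, str(value)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for value in inputs
    ]
    results = []
    for value, proc in zip(inputs, procs):
        out, _ = proc.communicate()  # wait for this copy to finish
        if proc.returncode != 0:
            raise RuntimeError(f"job for input {value!r} failed")
        results.append(out.strip())
    return results


# Hypothetical stand-in for a real analysis program: squares its argument.
SQUARE = "import sys; print(int(sys.argv[1]) ** 2)"

if __name__ == "__main__":
    print(run_batch(SQUARE, [1, 2, 3]))
```

Because the jobs share no state and never communicate, the same batch can be spread over many computers without changing the program itself, which is what makes the approach conceptually simple.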


Subjects
Computational Biology , Computers , Science , Software , Algorithms , Computational Biology/instrumentation , Computational Biology/methods , Database Management Systems , Humans , Monte Carlo Method , User-Computer Interface
19.
Conf Proc IEEE Eng Med Biol Soc ; 2006: 5330-3, 2006.
Article in English | MEDLINE | ID: mdl-17946298

ABSTRACT

We propose a new computational approach for protein docking that exploits energy funnels in the 6-dimensional space of translations and rotations of the ligand with respect to the receptor. Our approach consists of a series of translational and orientational moves of the ligand towards the receptor. Each move is performed using a global optimization method we have developed, the semi-definite underestimation (SDU) method, which can exploit a funnel-like energy function. We compared our approach with Monte Carlo on a set of 10 protein complexes using two residue-level potentials. To achieve the same level of performance (producing a near-native complex with RMSD ≤ 3 Å), our approach reduces the number of energy evaluations by more than a factor of two, on average.
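
The near-native criterion quoted in this abstract can be made concrete with a minimal sketch of a coordinate RMSD check. The abstract does not say which atoms enter the RMSD or how the structures are superposed, so this sketch assumes two equal-length lists of matched (x, y, z) ligand coordinates in ångströms, already expressed in the receptor's frame, with no further superposition; `rmsd` and `is_near_native` are illustrative names.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched 3D points (no superposition)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_near_native(model, native, cutoff=3.0):
    """True when the model lies within `cutoff` ångströms RMSD of the native pose."""
    return rmsd(model, native) <= cutoff


# Hypothetical two-atom ligand, rigidly shifted along the x axis.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(rmsd(shifted, native), is_near_native(shifted, native))
```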


Subjects
Computational Biology/instrumentation , Computational Biology/methods , Protein Interaction Mapping , Proteins/chemistry , Algorithms , Cell Physiological Phenomena , Computer Simulation , Databases, Protein , Ligands , Models, Molecular , Models, Theoretical , Monte Carlo Method , Protein Binding , Protein Conformation
20.
Conf Proc IEEE Eng Med Biol Soc ; 2006: 5826-9, 2006.
Article in English | MEDLINE | ID: mdl-17946724

ABSTRACT

We propose a new approach to protein tertiary structure prediction based on the concept of mini-threading. The method identifies useful fragments of variable length in the Protein Data Bank (PDB) and retrieves spatial restraints from them. Multidimensional scaling and least-squares minimization are then used to build coarse-grained structural models. Our method uses the information in the PDB efficiently, and prediction takes minutes, compared with the hours or days required by existing methods.


Subjects
Computational Biology/methods , Databases, Protein , Protein Structure, Tertiary , Proteins/chemistry , Sequence Analysis, Protein , Algorithms , Computational Biology/instrumentation , Computer Simulation , Models, Statistical , Monte Carlo Method , Protein Conformation , Protein Structure, Secondary , Time Factors