ABSTRACT
BACKGROUND: About forty-five years ago the advent of Sanger sequencing (Sanger and Coulson 1975) was revolutionary as it allowed deciphering of complete genome sequences. A second revolution came when next-generation sequencing (NGS) technologies accelerated and cheapened genome sequencing. Recently, third-generation (long-read) sequencing methods have appeared, which can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. Nanopore sequencing is one of these third-generation approaches, enabling a single molecule of DNA or RNA to be sequenced in real time without the need for PCR amplification or chemical labelling of the sample. It works by monitoring changes to an electrical current as nucleic acids are passed through protein or synthetic nanopores. METHODS: A literature search was performed to collect and summarize current information about the methodological aspects of nanopore sequencing as well as some application examples. RESULTS: The review concisely describes the technical aspects of nanopore sequencing and stresses the advantages and disadvantages of the technique, giving examples of its potential applications in the routine clinical laboratory, such as rapid identification of viral pathogens, Ebola monitoring, environmental and food safety monitoring, human and plant genome sequencing, monitoring of antibiotic resistance, and other applications. CONCLUSIONS: The review is a useful prompt for those who are continually looking to upgrade their laboratory.
Subjects
Nanopore Sequencing/methods; Clinical Laboratory Services/trends; Diagnostic Tests, Routine; Humans; Sequence Analysis/instrumentation; Sequence Analysis/methods; Sequence Analysis/trends
ABSTRACT
The coming deluge of genome data poses significant challenges for storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and sharing and retrieving data efficiently. Because data volume varies, so do computing and storage requirements; biomedical researchers are therefore pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases and a performance evaluation are presented to validate the feasibility of the proposed approach.
Subjects
Computational Biology; Information Storage and Retrieval; Sequence Analysis/instrumentation
ABSTRACT
Massively parallel sequencing (MPS) is now a clinical reality, promising improved diagnosis, targeted therapies, and population-based screening. To realize the potential of genomics, we must learn how to apply this technology optimally. The NCGENES project is designed to address several challenges that must be overcome in order to integrate MPS into clinical care.
Subjects
Genetic Services; Sequence Analysis; Humans; North Carolina; Sequence Analysis/instrumentation; Sequence Analysis/methods
ABSTRACT
Next-generation sequencing (NGS) has pushed back the limitations of prior sequencing technologies and greatly advanced genomic knowledge by making cost-effective, rapid sequencing a reality. Genome-wide transcriptional profiling can be achieved using NGS with either Tag-Seq, in which short tags of cDNA represent a gene, or RNA-Seq, in which the entire transcriptome is sequenced. Furthermore, the level and diversity of miRNA within different tissues or cell types can be monitored by specifically sequencing small RNA. The biological mechanisms underlying differential gene regulation can also be explored by coupling chromatin immunoprecipitation with NGS (ChIP-Seq). Using this methodology, genome-wide binding sites for transcription factors, RNAP II and epigenetic modifiers, as well as the distribution of modified histones, can be assessed. The high-resolution data generated by this sequencing technology allow researchers to distinguish the precise genomic location bound by a protein and correlate this with observed gene expression patterns. Additional methods have also been established to examine other factors influencing gene regulation, such as DNA methylation or chromatin conformation, on a genome-wide scale. Within any research setting, these techniques can provide relevant data and answer numerous questions about gene expression and regulation. The advances made by pairing NGS with strategic experimental protocols will continue to impact the research community.
Subjects
Gene Expression Profiling/methods; Gene Expression Regulation; Sequence Analysis/methods; Base Sequence; Chromatin/genetics; Chromatin Immunoprecipitation/instrumentation; Chromatin Immunoprecipitation/methods; DNA Methylation/genetics; Gene Expression Profiling/instrumentation; Histones/chemistry; Histones/genetics; Humans; RNA Polymerase II/genetics; RNA, Small Interfering/genetics; Sequence Analysis/instrumentation; Transcription Factors/genetics
ABSTRACT
We report an ab initio density functional theory study of the interaction of four nucleobases, cytosine, thymine, adenine, and guanine, with a novel graphene nanopore device for detecting the base sequence of a single-stranded nucleic acid (ssDNA or RNA). The nucleobases were inserted into a pore in a graphene nanoribbon, and the electrical current and conductance spectra were calculated as functions of voltage applied across the nanoribbon. The conductance spectra and charge densities were analyzed in the presence of each nucleobase in the graphene nanopore. The results indicate that due to significant differences in the conductance spectra the proposed device has adequate sensitivity to discriminate between different nucleotides. Moreover, we show that the nucleotide conductance spectrum is affected little by its orientation inside the graphene nanopore. The proposed technique may be extremely useful for real applications in developing ultrafast, low-cost DNA sequencing methods.
Subjects
DNA, Single-Stranded/chemistry; Nanopores; Nucleic Acids/analysis; RNA/chemistry; Sequence Analysis/instrumentation; Nucleic Acids/chemistry
ABSTRACT
A high-throughput quantitative analytical method for plant N-glycans has been developed. All steps, including peptide N-glycosidase (PNGase) A treatment, glycan preparation, and exoglycosidase digestion, were optimized for high-throughput applications using 96-well format procedures and automatic analysis on a DNA sequencer. The glycans of horseradish peroxidase with plant-specific core alpha(1,3)-fucose can be distinguished by comparing the glycan profiles obtained via PNGase A and F treatments. The peaks of the glycans with (91%) and without (1.2%) alpha(1,3)-fucose could be readily quantified and shown to harbor bisecting beta(1,2)-xylose via simultaneous treatment with alpha(1,3)-mannosidase and beta(1,2)-xylosidase. This optimized method was successfully applied to analyze N-glycans of a plant-expressed recombinant antibody, which was engineered to contain a minor amount of glycan harboring beta(1,2)-xylose. These results indicate that our DNA sequencer-based method provides quantitative information for plant-specific N-glycan analysis in a high-throughput manner, which has not previously been achieved by glycan profiling based on mass spectrometry.
Subjects
Plants/chemistry; Polysaccharides/chemistry; Recombinant Proteins/biosynthesis; Recombinant Proteins/chemistry; Sequence Analysis/instrumentation; Sequence Analysis/methods; Antibodies, Monoclonal/biosynthesis; Antibodies, Monoclonal/chemistry; Antibodies, Monoclonal/genetics; Antibodies, Viral/biosynthesis; Antibodies, Viral/chemistry; Glycoside Hydrolases/chemistry; Horseradish Peroxidase/chemistry; Mass Spectrometry; Rabies/immunology; Recombinant Proteins/genetics; Sequence Analysis, DNA/instrumentation; Nicotiana/genetics; Nicotiana/metabolism
ABSTRACT
Querying gene function in bacteria has been greatly accelerated by the advent of transposon sequencing (Tn-seq) technologies (related Tn-seq strategies are known as TraDIS, INSeq, RB-TnSeq, and HITS). Pooled populations of transposon mutants are cultured in an environment and next-generation sequencing tools are used to determine areas of the genome that are important for bacterial fitness. In this review we provide an overview of Tn-seq methodologies and discuss how Tn-seq has been applied, or could be applied, to the study of oral microbiology. These applications include studying the essential genome as a means to rationally design therapeutic agents. Tn-seq has also contributed to our understanding of well-studied biological processes in oral bacteria. Other important applications include in vivo pathogenesis studies and use of Tn-seq to probe the molecular basis of microbial interactions. We also highlight recent advancements in techniques that act in synergy with Tn-seq such as clustered regularly interspaced short palindromic repeats (CRISPR) interference and microfluidic chip platforms.
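The comparison of pooled mutant populations described above can be illustrated with a minimal sketch. It assumes per-gene insertion read counts are already available for an input pool and an output (post-selection) pool, and it scores each gene by the log2 ratio of its normalized read frequencies. The function name `log2_fitness`, the pseudocount, and the example counts are hypothetical and are not taken from any of the Tn-seq pipelines named in the abstract.

```python
import math


def log2_fitness(input_counts, output_counts, pseudocount=1.0):
    """Return {gene: log2(output frequency / input frequency)}.

    Both arguments map gene name -> insertion read count. A pseudocount
    (an assumption of this sketch) keeps genes with zero reads finite.
    """
    n_genes = len(input_counts)
    in_total = sum(input_counts.values()) + pseudocount * n_genes
    out_total = sum(output_counts.get(g, 0) for g in input_counts) + pseudocount * n_genes
    scores = {}
    for gene, n_in in input_counts.items():
        n_out = output_counts.get(gene, 0)
        f_in = (n_in + pseudocount) / in_total
        f_out = (n_out + pseudocount) / out_total
        # Negative score: mutants in this gene were depleted during growth,
        # suggesting the gene contributes to fitness in that environment.
        scores[gene] = math.log2(f_out / f_in)
    return scores


# Hypothetical counts: mutants in geneB disappear after selection.
example = log2_fitness({"geneA": 100, "geneB": 100}, {"geneA": 100, "geneB": 0})
```

Real analyses typically add replicate handling and statistical testing; this sketch only shows the depletion/enrichment idea.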
Subjects
Bacteria/genetics; DNA Transposable Elements/genetics; Genes, Essential/genetics; Mouth/microbiology; Sequence Analysis, DNA/methods; Sequence Analysis/methods; Drug Delivery Systems; Genes, Essential/drug effects; Genome, Bacterial; High-Throughput Nucleotide Sequencing/instrumentation; High-Throughput Nucleotide Sequencing/methods; Microbial Interactions/genetics; Mutagenesis, Insertional; Phenotype; Sequence Analysis/instrumentation; Sequence Analysis, DNA/instrumentation
ABSTRACT
BACKGROUND: Searching for similarities in protein and DNA databases has become a routine procedure in Molecular Biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm impractical for similarity searches in large sets of sequences. For these reasons heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphics cards to develop high-performance solutions for sequence alignment. RESULTS: In this paper we present what we believe is the fastest solution of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVIDIA. CUDA allows direct access to the hardware primitives of the latest-generation G80 Graphics Processing Units (GPUs). Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX cards. Exhaustive tests have been done to compare our implementation to SSEARCH and BLAST, running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs from 2 to 30 times faster than any previous attempt available on commodity hardware.
CONCLUSIONS: The results show that graphics cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large-scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the widely adopted heuristic approaches.
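For readers unfamiliar with the algorithm, the dynamic programming recurrence and its cost proportional to the product of the two sequence lengths can be seen in a minimal CPU sketch. This is not the authors' CUDA implementation; the linear gap penalty and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    Uses a linear gap penalty; every cell of the (len(a)+1) x (len(b)+1)
    matrix is filled once, hence the O(len(a) * len(b)) cost.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("TTACGTTT", "GGACGGG")
```

Each cell depends only on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another, which is one reason the algorithm maps well onto parallel hardware such as GPUs.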
Subjects
Computer Graphics; Sequence Alignment/instrumentation; Sequence Analysis/instrumentation; Signal Processing, Computer-Assisted/instrumentation; Equipment Design; Equipment Failure Analysis; Information Storage and Retrieval/methods; Sequence Alignment/methods; Sequence Analysis/methods
ABSTRACT
Broadly, machine learning comprises many in silico methods that turn the principles underlying natural phenomena into information humans can understand, with the aim of saving human labor, assisting human judgment, and creating knowledge. It should have wide application potential in biological and biomedical studies, especially in the era of big biological data. To survey how machine learning has been applied as biology has developed, this review presents a broad range of cases illustrating the selection of machine learning methods for the different scenarios that arise across the whole biological and biomedical study cycle, and further discusses machine learning strategies for analyzing omics data in some cutting-edge biological studies. Finally, new challenges for machine learning arising from small-sample, high-dimensional data are summarized around three key points: sample imbalance, white-box models, and causality.
Subjects
Big Data; Biomedical Research/methods; Computational Biology/methods; Machine Learning; Precision Medicine/methods; Biomedical Research/instrumentation; Computational Biology/instrumentation; Data Mining/methods; Image Processing, Computer-Assisted/instrumentation; Image Processing, Computer-Assisted/methods; Protein Interaction Mapping/instrumentation; Protein Interaction Mapping/methods; Sequence Analysis/instrumentation; Sequence Analysis/methods; Software
ABSTRACT
At many research institutions, lab space is more valuable than gold. Developers are taking note by designing smaller instruments with enhanced capabilities. Nathan Blow looks inside today's tiny lab.
Subjects
Biomedical Research/instrumentation; Biotechnology/instrumentation; Biotechnology/trends; Laboratories/trends; Microtechnology/instrumentation; Flow Cytometry/instrumentation; Humans; Sequence Analysis/instrumentation
ABSTRACT
Electron capture dissociation (ECD) is a new fragmentation technique used in Fourier transform ion cyclotron resonance mass spectrometry and is complementary to traditional tandem mass spectrometry techniques. Disulfide bonds, normally stable to vibrational excitation, are preferentially cleaved in ECD. Fragmentation is fast and specific, and labile post-translational modifications and non-covalent bonds often remain intact after backbone bond dissociation. ECD provides more extensive sequence coverage in polypeptides, and at higher electron energies even isoleucine and leucine are distinguishable. In biotechnology, the main areas of ECD application are expected to be the top-down verification of DNA-predicted protein sequences, de novo sequencing, disulfide bond analysis and the combined top-down/bottom-up analysis of post-translational modifications.
Subjects
DNA/analysis; DNA/metabolism; Gas Chromatography-Mass Spectrometry/methods; Proteins/analysis; Proteins/metabolism; Proteomics/methods; Sequence Analysis/methods; Animals; DNA/chemistry; Electrons; Gas Chromatography-Mass Spectrometry/instrumentation; Humans; Mass Spectrometry/instrumentation; Mass Spectrometry/methods; Protein Processing, Post-Translational/physiology; Proteins/chemistry; Proteomics/instrumentation; Sequence Analysis/instrumentation
ABSTRACT
Array technology has been applied in environmental research using innovative approaches in gene expression, comparative genomics and mixed community analysis. Greater fundamental understanding of sources of experimental and analytical error in array experiments should facilitate the future application of array technology to environmental analysis.
Subjects
Bacteria/genetics; Bacteria/isolation & purification; Environmental Monitoring/methods; Gene Expression Profiling/methods; Gene Expression Regulation, Bacterial/genetics; Oligonucleotide Array Sequence Analysis/methods; Sequence Analysis/methods; Algorithms; Coculture Techniques/instrumentation; Coculture Techniques/methods; Environmental Monitoring/instrumentation; Gene Expression Profiling/instrumentation; Oligonucleotide Array Sequence Analysis/instrumentation; Oligonucleotide Array Sequence Analysis/trends; Reproducibility of Results; Sensitivity and Specificity; Sequence Analysis/instrumentation
ABSTRACT
DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.
Subjects
Sequence Analysis/instrumentation; Sequence Analysis/methods; Bacterial Toxins/chemistry; Forecasting; Hemolysin Proteins/chemistry; Nucleic Acids/analysis; Nucleic Acids/chemistry; Purines/chemistry; Pyrimidines/chemistry; RNA/analysis; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods
ABSTRACT
Proteins and peptides can be sequenced from the carboxy-terminus with isothiocyanate reagents to produce amino acid thiohydantoin derivatives. Previous studies in our laboratory have focused on solution phase conditions for formation of the peptidylthiohydantoins with trimethylsilylisothiocyanate (TMS-ITC) and for hydrolysis of these peptidylthiohydantoins into an amino acid thiohydantoin derivative and a new shortened peptide capable of continued degradation (Bailey, J. M. & Shively, J. E., 1990, Biochemistry 29, 3145-3156). The current study is a continuation of this work and describes the construction of an instrument for automated C-terminal sequencing, the application of the thiocyanate chemistry to peptides covalently coupled to a novel polyethylene solid support (Shenoy, N. R., Bailey, J. M., & Shively, J. E., 1992, Protein Sci. 1, 58-67), the use of sodium trimethylsilanolate as a novel reagent for the specific cleavage of the derivatized C-terminal amino acid, and the development of methodology to sequence through the difficult amino acid, aspartate. Automated programs are described for the C-terminal sequencing of peptides covalently attached to carboxylic acid-modified polyethylene. The chemistry involves activation with acetic anhydride, derivatization with TMS-ITC, and cleavage of the derivatized C-terminal amino acid with sodium trimethylsilanolate. The thiohydantoin amino acid is identified by on-line high performance liquid chromatography using a Phenomenex Ultracarb 5 ODS(30) column and a triethylamine/phosphoric acid buffer system containing pentanesulfonic acid. The generality of our automated C-terminal sequencing methodology was examined by sequencing model peptides containing all 20 of the common amino acids. All of the amino acids were found to sequence in high yield (90% or greater) except for asparagine and aspartate, which could be only partially removed, and proline, which was found not to be capable of derivatization. In spite of these current limitations, the methodology should be a valuable new tool for the C-terminal sequence analysis of peptides.
Subjects
Peptides/chemistry; Sequence Analysis/instrumentation; Amino Acid Sequence; Amino Acids/chemistry; Automation; Chromatography, High Pressure Liquid; Membranes, Artificial; Models, Chemical; Molecular Sequence Data; Polyethylenes/chemistry; Silanes/chemistry; Thiocyanates/chemistry; Thiohydantoins/chemistry
ABSTRACT
We report on studies leading to refinements of various steps of the protein internal sequencing process. Specifically, the developments comprise (1) higher-sensitivity chemical sequencing through background reduction; (2) improved peptide recovery from rapid in situ digests of nanogram amounts of nitrocellulose-bound proteins; and (3) accurate UV spectroscopic identification of Trp- and Cys-containing peptides. In addition, we describe strategies for 2-dimensional liquid chromatographic peptide isolation from complex mixtures and a multi-analytical approach to peptide sequence analysis (Edman sequencing, matrix-assisted laser desorption mass spectrometry, and UV spectroscopy). Both strategies were applied in tandem to the primary structural analysis of a gel-purified, 250-kDa protein (mammalian target of rapamycin-FKBP12 complex), available in low picomolar quantities only. More than 300 amino acids' worth of sequence was obtained in mostly uninterrupted stretches, several containing Trp, Cys, His, and Ser. That information has allowed the matching of a biological function of a mammalian protein to a yeast gene product with a well-characterized mutant phenotype. The results also demonstrate that extended chemical sequencing analysis (e.g., 26 successive amino acids) is now feasible, starting with initial yields well below 1 pmol.
Subjects
Carrier Proteins/chemistry; DNA-Binding Proteins/chemistry; Fungal Proteins/chemistry; Heat-Shock Proteins/chemistry; Phosphatidylinositol 3-Kinases; Phosphotransferases (Alcohol Group Acceptor)/chemistry; Polyenes/chemistry; Protein Conformation; Saccharomyces cerevisiae Proteins; Amino Acid Sequence; Automation; Carrier Proteins/metabolism; Cell Cycle Proteins; Chromatography, High Pressure Liquid; DNA-Binding Proteins/metabolism; Drug Resistance, Microbial/genetics; Fungal Proteins/genetics; Heat-Shock Proteins/metabolism; Indicators and Reagents; Mass Spectrometry; Molecular Sequence Data; Phosphotransferases (Alcohol Group Acceptor)/genetics; Polyenes/metabolism; Polyenes/pharmacology; Saccharomyces cerevisiae/drug effects; Saccharomyces cerevisiae/genetics; Sequence Analysis/instrumentation; Sequence Analysis/methods; Sirolimus; Spectrophotometry, Ultraviolet; Tacrolimus Binding Proteins
ABSTRACT
PEPMOTIF is a computer program which analyzes protein sequences for the occurrence of peptides up to ten residues in length which contain motifs presented by particular class I major histocompatibility complexes. Any peptide motifs defined by the user can be identified in a protein sequence of interest. PEPMOTIF generates a listing of all motif-containing peptides found in the protein, and two modes of data output are provided: (1) direct printout, or (2) storage in a text file on disk.
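The kind of scan described above can be sketched briefly. The sketch assumes a motif is given as a peptide length plus a set of allowed residues at certain anchor positions. The function name `find_motif_peptides`, the motif representation, and the example motif and sequence are hypothetical and do not reproduce PEPMOTIF's actual input format or code.

```python
def find_motif_peptides(protein, length, anchors):
    """List every peptide of the given length that satisfies the motif.

    protein: amino acid sequence in one-letter code.
    length:  peptide length to scan (the abstract mentions up to ten residues).
    anchors: dict mapping a 1-based position within the peptide to the set
             of residues allowed there (a hypothetical motif representation).
    Returns a list of (1-based start position in protein, peptide).
    """
    hits = []
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        if all(peptide[pos - 1] in allowed for pos, allowed in anchors.items()):
            hits.append((start + 1, peptide))
    return hits


# Illustrative motif only: a 9-mer with L or M at position 2 and V or L at position 9.
motif = {2: {"L", "M"}, 9: {"V", "L"}}
hits = find_motif_peptides("ALAAAAAAVK", 9, motif)

# The two output modes mentioned in the abstract could be mimicked by
# printing `hits` directly or writing one "position<TAB>peptide" line per hit
# to a text file.
```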
Subjects
Histocompatibility Antigens Class I/chemistry; Peptides/chemistry; Sequence Analysis/instrumentation; Software; Histocompatibility Antigens Class I/genetics; Peptides/immunology
ABSTRACT
Exoglycosidase digestion in combination with the catalog-library approach (CLA) is used with matrix-assisted laser desorption/ionization Fourier transform mass spectrometry (MALDI-FTMS) to obtain the complete structure of oligosaccharides. The CLA is a collision-induced dissociation (CID)-based method used to determine the structure of O-linked neutral oligosaccharides. It provides both linkage and stereochemical information. Exoglycosidases are used to confirm independently the validity of the CLA. In some cases, the CLA provides structural information on all but a single residue, and exoglycosidase digestion is used to refine these structures. In this way, exoglycosidase use is targeted, employing only a small number of enzymes. Exoglycosidase arrays, which have been used with N-linked oligosaccharides, are avoided despite the larger variations in structures of O-linked species.
Subjects
Glycoside Hydrolases; Oligosaccharides/analysis; Sequence Analysis/methods; Animals; Carbohydrate Sequence; Chromatography, High Pressure Liquid; Hydrolysis; Magnetic Resonance Spectroscopy; Molecular Sequence Data; Oligosaccharides/chemistry; Sequence Analysis/instrumentation; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization; Spectroscopy, Fourier Transform Infrared; Stereoisomerism; Structure-Activity Relationship; Xenopus
ABSTRACT
Ion mobility spectrometry (IMS) has recently been established as a powerful tool to separate protease digest mixtures and identify their peptide components. As accurate calculation of mobilities is critical for this technique, a new rapid method based on intrinsic size parameters (ISPs) of amino acid residues has been devised. However, those parameters had to be obtained by tedious statistical analysis of a large body of experimental data. Here we demonstrate that they can instead be derived a priori, based on the stoichiometry of a residue. Our main finding is that the ISP of a residue is essentially determined by its density, that is, the average mass/size ratio of its constituent atoms. This is in accordance with an interpretation in which peptides assume compact conformations in the gas phase dominated by the solvation of ionic charge.