1.
Sci Rep ; 12(1): 9101, 2022 06 01.
Article in English | MEDLINE | ID: mdl-35650262

ABSTRACT

Identification of proteins is one of the most computationally intensive steps in genomics studies. It usually relies on aligners that do not accommodate rich information on proteins and require additional pipelining steps for protein identification. We introduce kAAmer, a protein database engine based on amino-acid k-mers that provides efficient identification of proteins while supporting the incorporation of flexible annotations on these proteins. Moreover, the database is built to be used as a microservice, to be hosted and queried remotely.


Subjects
Amino Acids, Software, Algorithms, Protein Databases, DNA Sequence Analysis
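The kAAmer abstract describes identifying proteins through their amino-acid k-mers rather than through alignment. A minimal in-memory sketch of that idea follows, assuming a plain dictionary index and a "count shared k-mers" score; it is not kAAmer's implementation (which is a persistent database served as a microservice), and the names `build_index`/`identify`, the value k = 5, and the toy sequences are all hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of an amino-acid sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(proteins, k):
    """Hypothetical index: map each amino-acid k-mer to the IDs of proteins containing it."""
    index = defaultdict(set)
    for pid, seq in proteins.items():
        for kmer in kmers(seq, k):
            index[kmer].add(pid)
    return index


def identify(query, index, k):
    """Rank database proteins by the number of distinct query k-mers they share."""
    hits = Counter()
    for kmer in set(kmers(query, k)):
        for pid in index.get(kmer, ()):
            hits[pid] += 1
    return hits.most_common()


# Toy database with made-up identifiers and sequences (illustrative only).
proteins = {"P1": "MKTAYIAKQRQISFVKSHFSRQ", "P2": "MSDNGPQNQRNAPRITFGGPSD"}
index = build_index(proteins, k=5)
print(identify("AYIAKQRQISF", index, k=5))  # [('P1', 7)]
```

Because lookups are exact k-mer matches, no alignment step is needed; a real engine would additionally store per-protein annotations next to each ID.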
2.
Genome Biol ; 19(1): 112, 2018 08 17.
Article in English | MEDLINE | ID: mdl-30115128

ABSTRACT

BACKGROUND: Numerous scaffold-level sequences for wheat are now being released and, in this context, we report on a strategy for improving the overall assembly to a level comparable to that of the human genome. RESULTS: Using chromosome 7A of wheat as a model, sequence-finished megabase-scale sections of this chromosome were established by combining a new independent assembly using a bacterial artificial chromosome (BAC)-based physical map, BAC pool paired-end sequencing, chromosome-arm-specific mate-pair sequencing and Bionano optical mapping with the International Wheat Genome Sequencing Consortium RefSeq v1.0 sequence and its underlying raw data. The combined assembly results in 18 super-scaffolds across the chromosome. The value of finished genome regions is demonstrated for two approximately 2.5 Mb regions associated with yield and the grain quality phenotype of fructan carbohydrate grain levels. In addition, the 50 Mb centromere region analysis incorporates cytological data highlighting the importance of non-sequence data in the assembly of this complex genome region. CONCLUSIONS: Sufficient genome sequence information is shown to now be available for the wheat community to produce sequence-finished releases of each chromosome of the reference genome. The high-level completion identified that an array of seven fructosyl transferase genes underpins grain quality and that yield attributes are affected by five F-box-only-protein-ubiquitin ligase domain and four root-specific lipid transfer domain genes. The completed sequence also includes the centromere.


Subjects
Agriculture, Plant Genome, Optical Phenomena, Physical Chromosome Mapping/methods, Triticum/genetics, Centromere/metabolism, Bacterial Artificial Chromosomes/genetics, Plant Chromosomes/genetics, Fructans/analysis, Seeds/genetics
3.
Mol Biol Evol ; 34(10): 2716-2729, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28957508

ABSTRACT

Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need for prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, the strain clustering based on a subset of k-mers allows determination of its similarity with the whole genome clusters. We also applied this methodology to 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former has increased correlation of its population structure with antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets.


Subjects
Computational Biology/methods, Bacterial Genome/genetics, DNA Sequence Analysis/methods, Bacteria/genetics, Biological Evolution, Cluster Analysis, Computer Simulation, Molecular Evolution, Genomics/methods, Metagenomics, Phylogeny, Prokaryotic Cells, Software
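The Ray Surveyor abstract states that comparing genomes by k-mer content is enough to build phenetic trees without a core-genome alignment. The sketch below illustrates that pipeline under explicit assumptions: canonical DNA k-mer sets, Jaccard distance as the similarity measure, and UPGMA (average linkage) as the tree builder. These choices are stand-ins picked for illustration, not necessarily what Ray Surveyor computes, and the function names and toy genomes are hypothetical.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of canonical k-mers (lexicographic minimum of a k-mer and its reverse complement)."""
    result = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        result.add(min(kmer, kmer.translate(COMPLEMENT)[::-1]))
    return result


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def pair_key(x, y):
    return tuple(sorted((x, y)))


def upgma_newick(genomes, k):
    """Cluster genomes (name -> DNA string) by k-mer content; return a Newick topology."""
    sets = {name: canonical_kmers(seq, k) for name, seq in genomes.items()}
    d = {pair_key(x, y): jaccard_distance(sets[x], sets[y]) for x, y in combinations(sets, 2)}
    size = {name: 1 for name in sets}
    while len(size) > 1:
        a, b = min(d, key=lambda p: (d[p], p))  # closest pair; ties broken by label
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # Average linkage: size-weighted mean of the two old distances.
                d[pair_key(merged, c)] = (size[a] * d[pair_key(a, c)]
                                          + size[b] * d[pair_key(b, c)]) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        d = {p: v for p, v in d.items() if a not in p and b not in p}
    return next(iter(size)) + ";"


# Made-up sequences: A and B differ by one base, C shares no k-mers with either.
genomes = {"A": "ACGTTGCAAGGCTT", "B": "ACGTTGCAAGGCTA", "C": "TTTTTTTTTTTT"}
print(upgma_newick(genomes, k=3))  # ((A,B),C);
```

Working on sets of canonical k-mers means strand orientation and gene order are ignored, which is what lets this kind of comparison skip alignment and curation.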
4.
Genome Announc ; 5(9)2017 Mar 02.
Article in English | MEDLINE | ID: mdl-28254974

ABSTRACT

Brucella suis is a Gram-negative, facultative intracellular pathogen that has pigs as its preferred host, but it can also infect humans. Here, we report the draft genome sequences of two B. suis strains that were isolated from the same patient, 8 years apart.

5.
Genome Announc ; 5(8)2017 Feb 23.
Article in English | MEDLINE | ID: mdl-28232424

ABSTRACT

Brucella canis is a facultative intracellular pathogen that preferentially infects members of the Canidae family. Here, we report the genome sequencing of two Brucella canis strains isolated from humans and one isolated from a dog host.

6.
Nucleic Acids Res ; 45(D1): D535-D542, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899627

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.


Subjects
Bacteria/genetics, Computational Biology/methods, Genetic Databases, Bacterial Genome, Genomics/methods, Anti-Bacterial Agents/pharmacology, Bacteria/drug effects, Bacteria/metabolism, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Bacterial Drug Resistance, Molecular Sequence Annotation, Proteome, Proteomics/methods, Software, Web Browser
7.
Mol Ecol Resour ; 17(4): 806-811, 2017 Jul.
Article in English | MEDLINE | ID: mdl-27754597

ABSTRACT

Freshwater eels (Anguilla sp.) have large economic, cultural, ecological and aesthetic importance worldwide, but they have suffered a decline of more than 90% in global stocks over the past few decades. Proper genetic resources, such as sequenced, assembled and annotated genomes, are essential to help plan sustainable recoveries by identifying physiological, biochemical and genetic mechanisms that caused the declines or that may lead to recoveries. Here, we present the first sequenced genome of the American eel. This genome contained 305 043 contigs (N50 = 7397) and 79 209 scaffolds (N50 = 86 641) for a total size of 1.41 Gb, which is in the middle of the range of previous estimations for this species. In addition, protein-coding regions, including introns and flanking regions, are very well represented in the genome, as 95.2% of the 458 core eukaryotic genes and 98.8% of the 248 ultra-conserved subset were represented in the assembly and a total of 26 564 genes were annotated for future functional genomics studies. We performed a candidate gene analysis to compare three genes among all three freshwater eel species and, congruent with the phylogenetic relationships, Japanese eel (A. japonica) exhibited the most divergence. Overall, the sequenced genome presented in this study is a crucial addition to the presently available genetic tools to help guide future conservation efforts of freshwater eels.


Subjects
Anguilla/genetics, Genome, Phylogeny, Animals
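The eel assembly is summarized by contig and scaffold N50 values. As a reminder of what that statistic measures, here is a short sketch of the standard N50 definition; the contig lengths in the example are invented and are not taken from the eel assembly.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of the total assembly length."""
    total = sum(contig_lengths)
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer comparison avoids float rounding
            return length
    return 0  # empty input


# Toy lengths (total 38): 10 + 8 = 18 is below half, 10 + 8 + 6 = 24 reaches it.
print(n50([2, 3, 4, 5, 6, 8, 10]))  # 6
```

N50 rewards long sequences rather than many short ones, which is why a scaffold N50 is typically much larger than the contig N50 of the same assembly.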
8.
Sci Rep ; 6: 27930, 2016 06 14.
Article in English | MEDLINE | ID: mdl-27297683

ABSTRACT

The emergence and spread of antimicrobial resistance (AMR) mechanisms in bacterial pathogens, coupled with the dwindling number of effective antibiotics, has created a global health crisis. Being able to identify the genetic mechanisms of AMR and predict the resistance phenotypes of bacterial pathogens prior to culturing could inform clinical decision-making and improve reaction time. At PATRIC (http://patricbrc.org/), we have been collecting bacterial genomes with AMR metadata for several years. In order to advance phenotype prediction and the identification of genomic regions relating to AMR, we have updated the PATRIC FTP server to enable access to genomes that are binned by their AMR phenotypes, as well as metadata including minimum inhibitory concentrations. Using this infrastructure, we custom built AdaBoost (adaptive boosting) machine learning classifiers for identifying carbapenem resistance in Acinetobacter baumannii, methicillin resistance in Staphylococcus aureus, and beta-lactam and co-trimoxazole resistance in Streptococcus pneumoniae with accuracies ranging from 88-99%. We also did this for isoniazid, kanamycin, ofloxacin, rifampicin, and streptomycin resistance in Mycobacterium tuberculosis, achieving accuracies ranging from 71-88%. This set of classifiers has been used to provide an initial framework for species-specific AMR phenotype and genomic feature prediction in the RAST and PATRIC annotation services.


Subjects
Anti-Bacterial Agents/therapeutic use, Bacterial Infections/drug therapy, Genetic Databases, Microbial Drug Resistance/genetics, Bacterial Genome/genetics, Clinical Decision-Making, Computational Biology, Data Curation, Humans, Machine Learning, Microbial Sensitivity Tests, Molecular Sequence Annotation, National Institutes of Health (U.S.), Prognosis, United States
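The PATRIC abstract mentions AdaBoost classifiers for resistance phenotypes but does not describe their features. The sketch below shows textbook discrete AdaBoost with one-feature decision stumps, assuming binary presence/absence features (for example, hypothetical k-mer presence) and labels of +1 (resistant) or -1 (susceptible). It illustrates the boosting algorithm only; it is not the PATRIC implementation, and the tiny dataset is made up.

```python
import math


def stump(x, feature, polarity):
    """Decision stump: predict `polarity` if the feature is present, else -polarity."""
    return polarity if x[feature] else -polarity


def train_adaboost(X, y, rounds):
    """Discrete AdaBoost. X: binary feature vectors; y: labels in {-1, +1}."""
    n, m = len(X), len(X[0])
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for j in range(m):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump(xi, j, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, polarity))
        # Up-weight misclassified samples, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, polarity)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump(x, j, polarity) for alpha, j, polarity in model)
    return 1 if score >= 0 else -1


# Hypothetical genomes x 3 binary features; feature 0 tracks the resistant label.
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0], [1, 0, 0]]
y = [1, 1, -1, -1, 1]
model = train_adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])  # [1, 1, -1, -1, 1]
```

Each stump tests a single feature, so the learned `(alpha, feature, polarity)` triples double as a ranked list of the features most associated with the phenotype.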
9.
Gigascience ; 2(1): 10, 2013 Jul 22.
Article in English | MEDLINE | ID: mdl-23870653

ABSTRACT

BACKGROUND: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. RESULTS: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. CONCLUSIONS: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

10.
Mol Microbiol ; 88(1): 189-202, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23421749

ABSTRACT

Antimonials are still the mainstay of treatment against leishmaniasis but drug resistance is increasing. We carried out short read next-generation sequencing (NGS) and comparative genomic hybridization (CGH) of three independent Leishmania major antimony-resistant mutants. Copy number variations were consistently detected with both NGS and CGH. A major attribute of antimony resistance was a novel terminal deletion of variable length (67 kb to 204 kb) of the polyploid chromosome 31 in the three mutants. Terminal deletions in two mutants occurred at the level of inverted repeated sequences. The AQP1 gene coding for an aquaglyceroporin was part of the deleted region and its transfection into resistant mutants reverted resistance to SbIII. We also highlighted an intrachromosomal amplification of a subtelomeric locus on chromosome 34 in one mutant. This region encoded ascorbate-dependent peroxidase (APX) and glucose-6-phosphate dehydrogenase (G6PDH). Overexpression of these genes in revertant backgrounds demonstrated resistance to SbIII and protection from reactive oxygen species (ROS). Generation of a G6PDH null mutant in one revertant exhibited SbIII sensitivity and decreased protection against ROS. Our genomic analyses and functional validation highlighted novel genomic rearrangements, functionally important resistant loci and the implication of new genes in antimony resistance in Leishmania.


Subjects
Antimony/pharmacology, Chromosomes/genetics, Drug Resistance/genetics, Gene Deletion, Leishmania/genetics, Telomere/genetics, Aquaporin 1/metabolism, Chromosome Mapping, Comparative Genomic Hybridization, Drug Resistance/drug effects, Genetic Loci/genetics, Glucosephosphate Dehydrogenase/genetics, Glucosephosphate Dehydrogenase/metabolism, Leishmania/drug effects, Phenotype, Reactive Oxygen Species/metabolism, Reproducibility of Results, DNA Sequence Analysis
11.
Big Data ; 1(4): 227-36, 2013 Dec.
Article in English | MEDLINE | ID: mdl-27447255

ABSTRACT

As analysts are expected to process a greater amount of information in a shorter amount of time, creators of big data software are challenged with the need for improved efficiency. Ray, our group's usable, scalable genome assembler, addresses big data problems by using optimal resources and producing a single correct, conservative and timely solution. Only by abstracting the size of the data from both the computers and the humans can the real scientific question, often complex in itself, eventually be solved. To draw a curtain over the specific computational machinery of big data, we developed RayPlatform, a programming framework that allows users to concentrate on their domain-specific problems. RayPlatform is a parallel message-passing software framework that runs on clouds, supercomputers, and desktops alike. Using established technologies such as C++ and MPI (message-passing interface), we handle the genomes of hundreds of species, from viruses to plants, using machines ranging from desktop computers to supercomputers. From this experience, we present insights on making computer time more useful, and user time much more valuable.

12.
Genome Biol ; 13(12): R122, 2012 Dec 22.
Article in English | MEDLINE | ID: mdl-23259615

ABSTRACT

Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three billion read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights for specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net.


Subjects
Bacterial Genome, Metagenomics/methods, Software, Bacteria/classification, Gene Ontology, High-Throughput Nucleotide Sequencing, Metagenome
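Ray Communities is described as profiling microbiomes with "uniquely-colored k-mers". One plausible reading, sketched below in a single process, is: color each reference k-mer with the set of genomes that contain it, then credit a sample k-mer to a genome only when its color set has exactly one member. This is an illustrative assumption, not the distributed Ray Communities code; it also ignores reverse complements and read errors, and the genome names and sequences are hypothetical.

```python
from collections import Counter, defaultdict


def color_kmers(references, k):
    """Map each k-mer to the set of reference genomes ("colors") that contain it."""
    colors = defaultdict(set)
    for genome, seq in references.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(genome)
    return colors


def profile(reads, colors, k):
    """Count sample k-mers that hit a uniquely-colored (single-genome) k-mer."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            genome_set = colors.get(read[i:i + k])
            if genome_set is not None and len(genome_set) == 1:
                counts[next(iter(genome_set))] += 1
    return counts


# Toy references: CCCC occurs in both genomes, so it is shared and never counted.
references = {"G1": "AAAACCCC", "G2": "GGGGCCCC"}
colors = color_kmers(references, k=4)
print(dict(profile(["AAAAC", "GGGGCCCC"], colors, k=4)))  # {'G1': 2, 'G2': 4}
```

Discarding multi-colored k-mers trades sensitivity for specificity: shared sequence cannot be misattributed, at the cost of ignoring reads that fall only in conserved regions.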
13.
PLoS Negl Trop Dis ; 6(2): e1512, 2012.
Article in English | MEDLINE | ID: mdl-22348164

ABSTRACT

BACKGROUND: Miltefosine (MF) is the first oral compound used in the chemotherapy against leishmaniasis. Since the mechanism of action of this drug and the targets of MF in Leishmania are unclear, we generated in a step-by-step manner Leishmania major promastigote mutants highly resistant to MF. Two of the mutants were submitted to a short-read whole genome sequencing for identifying potential genes associated with MF resistance. METHODS/PRINCIPAL FINDINGS: Analysis of the genome assemblies revealed several independent point mutations in a P-type ATPase involved in phospholipid translocation. Mutations in two other proteins, pyridoxal kinase and an α-adaptin-like protein, were also observed in independent mutants. The role of these proteins in the MF resistance was evaluated by gene transfection and gene disruption and both the P-type ATPase and pyridoxal kinase were implicated in MF susceptibility. The study also highlighted that resistance can be highly heterogeneous at the population level, with individual clones derived from this population differing in both genotype and susceptibility phenotype. CONCLUSIONS/SIGNIFICANCE: Whole genome sequencing was used to pinpoint known and new resistance markers associated with MF resistance in the protozoan parasite Leishmania. The study also demonstrated the polyclonal nature of a resistant population with individual cells with varying susceptibilities and genotypes.


Subjects
Antiprotozoal Agents/pharmacology, Drug Resistance, Protozoan Genome, Leishmania major/drug effects, Leishmania major/genetics, Missense Mutation, Phosphorylcholine/analogs & derivatives, Animals, Humans, Leishmania major/isolation & purification, Mutant Proteins/genetics, Phosphorylcholine/pharmacology, Protozoan Proteins/genetics, Genetic Selection, DNA Sequence Analysis
14.
Nucleic Acids Res ; 40(3): 1131-47, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21998295

ABSTRACT

The Leishmania tarentolae Parrot-TarII strain genome sequence was resolved to a mean 16-fold coverage by next-generation DNA sequencing technologies. This is the first genome of a kinetoplastid protozoan that is non-pathogenic to humans to be described, thus providing an opportunity for comparison with the completed genomes of pathogenic Leishmania species. A high synteny was observed between all sequenced Leishmania species. A limited number of chromosomal regions diverged between L. tarentolae and L. infantum, while remaining syntenic to L. major. Globally, >90% of the L. tarentolae gene content was shared with the other Leishmania species. We identified 95 predicted coding sequences unique to L. tarentolae and 250 genes that were absent from L. tarentolae. Interestingly, many of the latter genes were expressed in the intracellular amastigote stage of pathogenic species. In addition, genes coding for products involved in antioxidant defence or participating in vesicular-mediated protein transport were underrepresented in L. tarentolae. In contrast to other Leishmania genomes, two gene families were expanded in L. tarentolae, namely the zinc metallo-peptidase surface glycoprotein GP63 and the promastigote surface antigen PSA31C. Overall, L. tarentolae's gene content appears better adapted to the promastigote insect stage rather than the amastigote mammalian stage.


Subjects
Protozoan Genes, Leishmania/genetics, Animals, Gene Dosage, Developmental Gene Expression Regulation, Protozoan Genome, Genomics, Leishmania/growth & development, Lizards/parasitology, Multigene Family, DNA Sequence Analysis, Synteny
15.
J Gene Med ; 13(10): 522-37, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21954090

ABSTRACT

BACKGROUND: Various endonucleases can be engineered to induce double-strand breaks (DSBs) in chosen DNA sequences. These DSBs are spontaneously repaired by non-homologous end joining, resulting in micro-insertions or micro-deletions (INDELs). We detected, characterized and quantified the frequency of INDELs produced by one meganuclease (MGN) targeting the RAG1 gene, six MGNs targeting three introns of the human dystrophin gene and one pair of zinc finger nucleases (ZFNs) targeting exon 50 of the human dystrophin gene. The experiments were performed in human cells (i.e. 293 T cells, myoblasts and myotubes). METHODS: To analyse the INDELs produced by the endonucleases, the targeted region was amplified by polymerase chain reaction and the amplicons were digested with the Surveyor enzyme, cloned in bacteria or deep sequenced. RESULTS: Endonucleases targeting the dystrophin gene produced INDELs of different sizes but there were clear peaks in the size distributions. The positions of these peaks were similar for MGNs but not for ZFNs in 293 T cells and in myoblasts. The size of the INDELs produced by these endonucleases in the dystrophin gene would have permitted a change in the reading frame. In a subsequent experiment, we observed that the frequency of INDELs was increased by re-exposure of the cells to the same endonuclease. CONCLUSIONS: Endonucleases are able to: (i) restore the normal reading frame of a gene with a frameshift mutation; (ii) delete a nonsense codon; and (iii) knock out a gene. Endonucleases could thus be used to treat Duchenne muscular dystrophy and other hereditary diseases that are the result of a nonsense codon or a frameshift mutation.


Subjects
Dystrophin/genetics, Endonucleases/metabolism, Duchenne Muscular Dystrophy/genetics, Duchenne Muscular Dystrophy/therapy, Cell Line, Nonsense Codon, Endonucleases/genetics, Exons, Frameshift Mutation, RAG-1 Genes, Humans, INDEL Mutation, Lentivirus/genetics, Lentivirus/metabolism, Myoblasts/physiology, Reading Frames, Zinc Fingers/genetics
16.
J Comput Biol ; 17(11): 1519-33, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20958248

ABSTRACT

An accurate genome sequence of a desired species is now a prerequisite for genome research. An important step in obtaining a high-quality genome sequence is to correctly assemble short reads into longer sequences accurately representing contiguous genomic regions. Current sequencing technologies continue to offer increases in throughput, and corresponding reductions in cost and time. Unfortunately, the benefit of obtaining a large number of reads is complicated by sequencing errors, with different biases being observed with each platform. Although software is available to assemble reads for each individual system, no procedure has been proposed for high-quality simultaneous assembly based on reads from a mix of different technologies. In this paper, we describe a parallel short-read assembler, called Ray, which has been developed to assemble reads obtained from a combination of sequencing platforms. We compared its performance to other assemblers on simulated and real datasets. We used a combination of Roche/454 and Illumina reads to assemble three different genomes. We showed that mixing sequencing technologies systematically reduces the number of contigs and the number of errors. Because of its open nature, this new tool will hopefully serve as a basis to develop an assembler that can be of universal use (availability: http://deNovoAssembler.sf.Net/). For online Supplementary Material, see www.liebertonline.com.


Subjects
Algorithms, High-Throughput Nucleotide Sequencing/instrumentation, DNA Sequence Analysis/instrumentation, Base Sequence, Computational Biology/instrumentation, Computational Biology/methods, Contig Mapping, Genome, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software
17.
J Clin Microbiol ; 47(3): 743-50, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19158263

ABSTRACT

Respiratory virus infections are a major health concern and represent the primary cause of testing consultation and hospitalization for young children. We developed and compared two assays that allow the detection of up to 23 different respiratory viruses that frequently infect children. The first method consisted of single TaqMan quantitative real-time PCR assays in a 96-well-plate format. The second consisted of a multiplex PCR followed by primer extension and microarray hybridization in an integrated molecular diagnostic device, the Infiniti analyzer. Both of our assays can detect adenoviruses of groups A, B, C, and E; coronaviruses HKU1, 229E, NL63, and OC43; enteroviruses A, B, C, and D; rhinoviruses of genotypes A and B; influenza viruses A and B; human metapneumoviruses (HMPV) A and B, human respiratory syncytial viruses (HRSV) A and B; and parainfluenza viruses of types 1, 2, and 3. These tests were used to identify viruses in 221 nasopharyngeal aspirates obtained from children hospitalized for respiratory tract infections. Respiratory viruses were detected with at least one of the two methods in 81.4% of the 221 specimens: 10.0% were positive for HRSV A, 38.0% for HRSV B, 13.1% for influenzavirus A, 8.6% for any coronaviruses, 13.1% for rhinoviruses or enteroviruses, 7.2% for adenoviruses, 4.1% for HMPV, and 1.5% for parainfluenzaviruses. Multiple viral infections were found in 13.1% of the specimens. The two methods yielded concordant results for 94.1% of specimens. These tests allowed a thorough etiological assessment of respiratory viruses infecting children in hospital settings and would assist public health interventions.


Subjects
Microarray Analysis/methods, Polymerase Chain Reaction/methods, Respiratory Tract Infections/virology, Virus Diseases/diagnosis, Viruses/classification, Viruses/isolation & purification, Preschool Child, Humans, Infant, Nasopharynx/virology, Sensitivity and Specificity, Virus Diseases/virology, Viruses/genetics
18.
Retrovirology ; 5: 110, 2008 Dec 04.
Article in English | MEDLINE | ID: mdl-19055831

ABSTRACT

BACKGROUND: Human immunodeficiency virus type 1 (HIV-1) infects cells by means of ligand-receptor interactions. This lentivirus uses the CD4 receptor in conjunction with a chemokine coreceptor, either CXCR4 or CCR5, to enter a target cell. HIV-1 is characterized by high sequence variability. Nonetheless, within this extensive variability, certain features must be conserved to define functions and phenotypes. The determination of coreceptor usage of HIV-1, from its protein envelope sequence, falls into a well-studied machine learning problem known as classification. The support vector machine (SVM), with string kernels, has proven to be very efficient for dealing with a wide class of classification problems ranging from text categorization to protein homology detection. In this paper, we investigate how the SVM can predict HIV-1 coreceptor usage when it is equipped with an appropriate string kernel. RESULTS: Three string kernels were compared. Accuracies of 96.35% (CCR5), 94.80% (CXCR4) and 95.15% (CCR5 and CXCR4) were achieved with the SVM equipped with the distant segments kernel on a test set of 1425 examples with a classifier built on a training set of 1425 examples. Our datasets are built with Los Alamos National Laboratory HIV Databases sequences. A web server is available at http://genome.ulaval.ca/hiv-dskernel. CONCLUSION: We examined string kernels that have been used successfully for protein homology detection and propose a new one that we call the distant segments kernel. We also show how to extract the most relevant features for HIV-1 coreceptor usage. The SVM with the distant segments kernel is currently the best method described.


Subjects
Computational Biology/methods, CCR5 Receptors/chemistry, CXCR4 Receptors/chemistry, CXCR4 Receptors/genetics, HIV Receptors/chemistry, Algorithms, HIV Infections/genetics, HIV Infections/metabolism, Humans, Internet, CCR5 Receptors/genetics, CCR5 Receptors/metabolism, CXCR4 Receptors/metabolism, HIV Receptors/genetics, HIV Receptors/metabolism, Amino Acid Sequence Homology, Software, User-Computer Interface
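The HIV-1 abstract relies on string kernels inside an SVM. The distant segments kernel itself is not reproduced here; instead, the sketch below shows the simpler k-spectrum kernel, a different and well-known string kernel swapped in only to illustrate what such a kernel computes: a similarity between two sequences without aligning them. The function names and the toy strings are hypothetical.

```python
import math
from collections import Counter


def spectrum(seq, k):
    """Count every overlapping length-k substring (the k-spectrum) of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k):
    """Inner product of the two k-spectrum count vectors."""
    sx, sy = spectrum(x, k), spectrum(y, k)
    return sum(count * sy[sub] for sub, count in sx.items())


def normalized_kernel(x, y, k):
    """Cosine-normalized kernel value in [0, 1]."""
    denom = math.sqrt(spectrum_kernel(x, x, k) * spectrum_kernel(y, y, k))
    return spectrum_kernel(x, y, k) / denom if denom else 0.0


# Toy strings: "ABAB" has {AB: 2, BA: 1}, "BABA" has {BA: 2, AB: 1}; 2*1 + 1*2 = 4.
print(spectrum_kernel("ABAB", "BABA", 2))    # 4
print(normalized_kernel("ABAB", "BABA", 2))  # 0.8
```

An SVM would consume the matrix of such kernel values between all training sequences; the distant segments kernel described in the abstract is a different feature map from this one.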
19.
Genome Biol ; 9(7): R115, 2008.
Article in English | MEDLINE | ID: mdl-18638379

ABSTRACT

BACKGROUND: Drug resistance can be complex, and several mutations responsible for it can co-exist in a resistant cell. Transcriptional profiling is ideally suited for studying complex resistance genotypes and has the potential to lead to novel discoveries. We generated full genome 70-mer oligonucleotide microarrays for all protein coding genes of the human protozoan parasites Leishmania major and Leishmania infantum. These arrays were used to monitor gene expression in methotrexate resistant parasites. RESULTS: Leishmania is a eukaryotic organism with minimal control at the level of transcription initiation and few genes were differentially expressed without concomitant changes in DNA copy number. One exception was found in Leishmania major, where the expression of whole chromosomes was down-regulated. The microarrays highlighted several mechanisms by which the copy number of genes involved in resistance was altered; these include gene deletion, formation of extrachromosomal circular or linear amplicons, and the presence of supernumerary chromosomes. In the case of gene deletion or gene amplification, the rearrangements have occurred at the sites of repeated (direct or inverted) sequences. These repeats appear highly conserved in both species to facilitate the amplification of key genes during environmental changes. When direct or inverted repeats are absent in the vicinity of a gene conferring a selective advantage, Leishmania will resort to supernumerary chromosomes to increase the levels of a gene product. CONCLUSION: Aneuploidy has been suggested as an important cause of drug resistance in several organisms and additional studies should reveal the potential importance of this phenomenon in drug resistance in Leishmania.


Subjects
Drug Resistance/genetics, Leishmania/drug effects, Leishmania/genetics, Mutation, Protozoan Proteins/genetics, Aneuploidy, Animals, Anion Transport Proteins/genetics, Gene Amplification, Gene Deletion, Gene Expression Profiling, Protozoan Genes, Leishmania infantum/drug effects, Leishmania infantum/genetics, Leishmania infantum/metabolism, Leishmania major/drug effects, Leishmania major/genetics, Leishmania major/metabolism, Methotrexate/pharmacology, Multienzyme Complexes/genetics, Oligonucleotide Array Sequence Analysis, Oxidoreductases/genetics, Tetrahydrofolate Dehydrogenase/genetics, Thymidylate Synthase/genetics
20.
BMC Genomics ; 9: 255, 2008 May 29.
Article in English | MEDLINE | ID: mdl-18510761

ABSTRACT

BACKGROUND: Leishmania parasites cause a diverse spectrum of diseases in humans ranging from spontaneously healing skin lesions (e.g., L. major) to life-threatening visceral diseases (e.g., L. infantum). The high conservation in gene content and genome organization between Leishmania major and Leishmania infantum contrasts with their distinct pathophysiologies, suggesting that highly regulated hierarchical and temporal changes in gene expression may be involved. RESULTS: We used a multispecies DNA oligonucleotide microarray to compare whole-genome expression patterns of promastigote (sandfly vector) and amastigote (mammalian macrophages) developmental stages between L. major and L. infantum. Seven per cent of the total L. infantum genome and 9.3% of the L. major genome were differentially expressed at the RNA level throughout development. The main variations were found in genes involved in metabolism, cellular organization and biogenesis, and transport, and in genes of unknown function. Remarkably, this comparative global interspecies analysis demonstrated that only 10-12% of the differentially expressed genes were common to L. major and L. infantum. Differentially expressed genes are randomly distributed across chromosomes, further supporting a posttranscriptional control, which is likely to involve a variety of 3'UTR elements. CONCLUSION: This study highlighted substantial differences in gene expression patterns between L. major and L. infantum. These important species-specific differences in stage-regulated gene expression may contribute to the disease tropism that distinguishes L. major from L. infantum.


Subjects
Gene Expression Profiling, Protozoan Genome, Leishmania infantum/growth & development, Leishmania infantum/genetics, Leishmania major/growth & development, Leishmania major/genetics, Life Cycle Stages, 3' Untranslated Regions/genetics, Animals, Cell Line, Developmental Gene Expression Regulation, Humans, Mice, Inbred A Mice, Oligonucleotide Array Sequence Analysis, Messenger RNA/isolation & purification, Protozoan RNA/isolation & purification, Retroelements, Reverse Transcriptase Polymerase Chain Reaction, Species Specificity