Search | Portal Regional da BVS

Mass spectra alignment using virtual lock-masses.

Brochu, Francis; Plante, Pier-Luc; Drouin, Alexandre; Gagnon, Dominic; Richard, Dave; Durocher, Francine; Diorio, Caroline; Marchand, Mario; Corbeil, Jacques; Laviolette, François.

Sci Rep ; 9(1): 8469, 2019 06 11.

Article in English | MEDLINE | ID: mdl-31186508

ABSTRACT

Mass spectrometry is a valued method to evaluate the metabolomics content of a biological sample. The recent advent of rapid ionization technologies such as Laser Diode Thermal Desorption (LDTD) and Direct Analysis in Real Time (DART) has rendered high-throughput mass spectrometry possible. It is used for large-scale comparative analysis of populations of samples. In practice, many factors resulting from the environment, the protocol, and even the instrument itself can lead to minor discrepancies between spectra, rendering automated comparative analysis difficult. In this work, a pipeline of algorithms to correct variations between spectra is proposed. The algorithms correct multiple spectra by identifying peaks that are common to all and, from those, compute a spectrum-specific correction. We show that these algorithms increase comparability within large datasets of spectra, facilitating comparative analysis, such as machine learning.
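
The abstract describes the correction only at a high level. As a rough illustration, and not the authors' published algorithm, the following Python sketch finds peaks that occur exactly once within a tolerance window in every spectrum, averages them into reference points, and rescales each spectrum by a correction ratio interpolated between those points. The function names, the 40 ppm window, and the linear interpolation are assumptions made for this example. It also assumes that each spectrum is a sorted list of m/z values and that matching windows of neighbouring reference points do not overlap.

```python
from bisect import bisect_left
from statistics import mean


def peaks_within(spectrum, mz, ppm):
    """Return the peaks of a sorted spectrum lying within +/- ppm of mz."""
    tol = mz * ppm * 1e-6
    start = bisect_left(spectrum, mz - tol)
    hits = []
    for peak in spectrum[start:]:
        if peak > mz + tol:
            break
        hits.append(peak)
    return hits


def common_reference_points(spectra, ppm=40.0):
    """Find peaks present exactly once, within the window, in every spectrum.

    Returns a list of (reference_mz, observed) pairs, where observed holds the
    matched peak of each spectrum and reference_mz is their average.
    """
    points = []
    for candidate in spectra[0]:
        matches = [peaks_within(s, candidate, ppm) for s in spectra]
        if all(len(m) == 1 for m in matches):
            observed = [m[0] for m in matches]
            points.append((mean(observed), observed))
    return points


def correct_spectrum(spectrum, observed, reference):
    """Rescale peaks by a ratio interpolated linearly between reference points.

    observed: this spectrum's matched peaks (strictly increasing, non-empty).
    reference: the corresponding averaged reference positions.
    """
    corrected = []
    for mz in spectrum:
        i = bisect_left(observed, mz)
        if i == 0:
            # At or below the first reference point: reuse its ratio.
            ratio = reference[0] / observed[0]
        elif i == len(observed):
            # Above the last reference point: reuse its ratio.
            ratio = reference[-1] / observed[-1]
        else:
            x0, x1 = observed[i - 1], observed[i]
            r0, r1 = reference[i - 1] / x0, reference[i] / x1
            ratio = r0 + (r1 - r0) * (mz - x0) / (x1 - x0)
        corrected.append(mz * ratio)
    return corrected
```

To correct spectrum j, pass the j-th observed value of every reference point as `observed` and the averaged positions as `reference`. Peaks outside the range covered by reference points simply reuse the nearest ratio, which is a simplification chosen for brevity.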

Predicting Ion Mobility Collision Cross-Sections Using a Deep Neural Network: DeepCCS.

Plante, Pier-Luc; Francovic-Fontaine, Élina; May, Jody C; McLean, John A; Baker, Erin S; Laviolette, François; Marchand, Mario; Corbeil, Jacques.

Anal Chem ; 91(8): 5191-5199, 2019 04 16.

Article in English | MEDLINE | ID: mdl-30932474

ABSTRACT

Untargeted metabolomic measurements using mass spectrometry are a powerful tool for uncovering new small molecules with environmental and biological importance. The small molecule identification step, however, remains an enormous challenge due to fragmentation difficulties or unspecific fragment ion information. Current methods to address this challenge often depend on databases or require nuclear magnetic resonance (NMR), each of which has its own difficulties. The use of gas-phase collision cross section (CCS) values obtained from ion mobility spectrometry (IMS) measurements was recently demonstrated to reduce the number of false positive metabolite identifications. Although this approach is promising, the amount of empirical CCS information currently available is limited, so predictive CCS methods need to be developed. In this article, we expand upon current experimental IMS capabilities by predicting CCS values using a deep learning algorithm. We successfully developed and trained a prediction model for CCS values requiring only information about a compound's SMILES notation and ion type. The use of data from five different laboratories using different instruments allowed the algorithm to be trained and tested on more than 2400 molecules. The resulting CCS predictions were found to achieve a coefficient of determination of 0.97 and a median relative error of 2.7% for a wide range of molecules. Furthermore, the method requires only a small amount of processing power to predict CCS values. Considering the performance, time, and resources necessary, as well as its applicability to a variety of molecules, this model was able to outperform all currently available CCS prediction algorithms.
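
The two figures of merit quoted in the abstract, the coefficient of determination and the median relative error, can be computed from paired measured and predicted values as in the minimal Python sketch below. The function names are hypothetical and the formulas are the standard textbook definitions; the abstract does not state exactly how the authors computed them.

```python
from statistics import mean, median


def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, using the mean of y_true as the baseline."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def median_relative_error(y_true, y_pred):
    """Median of |prediction - truth| / truth, expressed in percent."""
    return 100.0 * median(abs(p - t) / t for t, p in zip(y_true, y_pred))
```

Both functions take plain lists of positive CCS values. The median is used rather than the mean so that a few badly predicted molecules do not dominate the summary.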

Subjects

Deep Learning; Neural Networks, Computer; Algorithms; Ion Mobility Spectrometry; Magnetic Resonance Spectroscopy; Mass Spectrometry; Metabolomics

Interpretable genotype-to-phenotype classifiers with performance guarantees.

Drouin, Alexandre; Letarte, Gaël; Raymond, Frédéric; Marchand, Mario; Corbeil, Jacques; Laviolette, François.

Sci Rep ; 9(1): 4071, 2019 03 11.

Article in English | MEDLINE | ID: mdl-30858411

ABSTRACT

Understanding the relationship between the genome of a cell and its phenotype is a central problem in precision medicine. Nonetheless, genotype-to-phenotype prediction comes with great challenges for machine learning algorithms that limit their use in this setting. The high dimensionality of the data tends to hinder generalization and challenges the scalability of most learning algorithms. Additionally, most algorithms produce models that are complex and difficult to interpret. We alleviate these limitations by proposing strong performance guarantees, based on sample compression theory, for rule-based learning algorithms that produce highly interpretable models. We show that these guarantees can be leveraged to accelerate learning and improve model interpretability. Our approach is validated through an application to the genomic prediction of antimicrobial resistance, an important public health concern. Highly accurate models were obtained for 12 species and 56 antibiotics, and their interpretation revealed known resistance mechanisms, as well as some potentially new ones. An open-source, disk-based implementation that is both memory-efficient and computationally efficient is provided with this work. The implementation is turnkey, requires no prior knowledge of machine learning, and is complemented by comprehensive tutorials.

Subjects

Genetic Association Studies; Genome/genetics; Machine Learning; Precision Medicine; Algorithms; Artificial Intelligence; Genomics; Humans; Software

Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons.

Drouin, Alexandre; Giguère, Sébastien; Déraspe, Maxime; Marchand, Mario; Tyers, Michael; Loo, Vivian G; Bourgault, Anne-Marie; Laviolette, François; Corbeil, Jacques.

BMC Genomics ; 17(1): 754, 2016 Sep 26.

Article in English | MEDLINE | ID: mdl-27671088

ABSTRACT

BACKGROUND: The identification of genomic biomarkers is a key step towards improving diagnostic tests and therapies. We present a reference-free method for this task that relies on a k-mer representation of genomes and a machine learning algorithm that produces intelligible models. The method is computationally scalable and well-suited for whole genome sequencing studies. RESULTS: The method was validated by generating models that predict the antibiotic resistance of C. difficile, M. tuberculosis, P. aeruginosa, and S. pneumoniae for 17 antibiotics. The obtained models are accurate, faithful to the biological pathways targeted by the antibiotics, and they provide insight into the process of resistance acquisition. Moreover, a theoretical analysis of the method revealed tight statistical guarantees on the accuracy of the obtained models, supporting its relevance for genomic biomarker discovery. CONCLUSIONS: Our method allows the generation of accurate and interpretable predictive models of phenotypes, which rely on a small set of genomic variations. The method is not limited to predicting antibiotic resistance in bacteria and is applicable to a variety of organisms and phenotypes. Kover, an efficient implementation of our method, is open-source and should guide biological efforts to understand a plethora of phenotypes (http://github.com/aldro61/kover/).
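
A k-mer representation needs no reference genome because each genome is described only by the short substrings it contains. The Python sketch below illustrates one such encoding, a binary presence/absence vector over every k-mer observed in the collection. The binary encoding and the function names are assumptions made for this example; the abstract does not specify the exact representation, and this is not Kover's API.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_absence_matrix(genomes, k):
    """Encode each genome as a binary vector over the k-mers seen in any genome.

    Returns (vocabulary, rows): vocabulary is the sorted list of k-mers and
    rows[i][j] is 1 if genome i contains vocabulary[j], else 0.
    """
    per_genome = [kmers(g, k) for g in genomes]
    vocabulary = sorted(set().union(*per_genome))
    rows = [[int(kmer in present) for kmer in vocabulary] for present in per_genome]
    return vocabulary, rows
```

Each column then corresponds to a single k-mer, so a rule learned on these features can be read directly as "this sequence is present" or "this sequence is absent". Real genomes produce far too many k-mers for an in-memory list of lists, so this form is suitable only for toy inputs.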

Machine learning assisted design of highly active peptides for drug discovery.

Giguère, Sébastien; Laviolette, François; Marchand, Mario; Tremblay, Denise; Moineau, Sylvain; Liang, Xinxia; Biron, Éric; Corbeil, Jacques.

PLoS Comput Biol ; 11(4): e1004074, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25849257

ABSTRACT

The discovery of peptides possessing high biological activity is very challenging because, among the enormous diversity of possible peptides, only a minority have the desired properties. To lower the cost and reduce the time needed to obtain promising peptides, machine learning approaches can greatly assist in the process and even partly replace expensive laboratory experiments by learning a predictor with existing data or with a smaller amount of data generation. Unfortunately, once the model is learned, selecting peptides having the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm, based on graph theory, that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and the validation of peptide leads. Moreover, the proposed approach does not require the use of known ligands for the target protein since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code is freely available at http://graal.ift.ulaval.ca/peptide-design/.

Subjects

Antimicrobial Cationic Peptides/chemistry; Antimicrobial Cationic Peptides/pharmacokinetics; Bacterial Physiological Phenomena/drug effects; Drug Discovery/methods; Machine Learning; Sequence Analysis, Protein/methods; Amino Acid Sequence; Molecular Sequence Data; Pattern Recognition, Automated/methods; Peptides; Protein Interaction Mapping/methods; Structure-Activity Relationship

MHC-NP: predicting peptides naturally processed by the MHC.

Giguère, Sébastien; Drouin, Alexandre; Lacoste, Alexandre; Marchand, Mario; Corbeil, Jacques; Laviolette, François.

J Immunol Methods ; 400-401: 30-6, 2013 Dec 31.

Article in English | MEDLINE | ID: mdl-24144535

ABSTRACT

We present MHC-NP, a tool for predicting peptides naturally processed by the MHC pathway. The method was part of the 2nd Machine Learning Competition in Immunology and yielded state-of-the-art accuracy for the prediction of peptides eluted from human HLA-A*02:01, HLA-B*07:02, HLA-B*35:01, HLA-B*44:03, HLA-B*53:01, HLA-B*57:01 and mouse H2-D(b) and H2-K(b) MHC molecules. We briefly explain the theory and motivations that have led to developing this tool. General applicability in the field of immunology, and specifically to epitope-based vaccines, is expected. Our tool is freely available online and hosted by the Immune Epitope Database at http://tools.immuneepitope.org/mhcnp/.

Subjects

Artificial Intelligence; Epitope Mapping/methods; Major Histocompatibility Complex/immunology; Peptides/chemistry; Software; Algorithms; Animals; Antigen Presentation; H-2 Antigens/chemistry; H-2 Antigens/immunology; HLA-A2 Antigen/chemistry; HLA-A2 Antigen/immunology; HLA-B Antigens/chemistry; HLA-B Antigens/immunology; Histocompatibility Antigen H-2D/chemistry; Histocompatibility Antigen H-2D/immunology; Humans; Mice; Peptides/immunology; Protein Binding; Vaccines

Learning a peptide-protein binding affinity predictor with kernel ridge regression.

Giguère, Sébastien; Marchand, Mario; Laviolette, François; Drouin, Alexandre; Corbeil, Jacques.

BMC Bioinformatics ; 14: 82, 2013 Mar 05.

Article in English | MEDLINE | ID: mdl-23497081

ABSTRACT

BACKGROUND: The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interaction. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to specific MHC alleles would be of tremendous benefit to improve vaccine-based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. RESULTS: We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, including the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant predictions with good accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. CONCLUSION: On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve systems biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/.
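
To show how a string kernel plugs into kernel ridge regression, the Python sketch below uses a plain k-mer spectrum kernel as a stand-in. It is not the kernel proposed in the article, which also incorporates physico-chemical properties and is not reproduced here. The function names, the regularization parameter `lam`, and the small linear solver are assumptions made for this example. Training solves (K + lam I) alpha = y, and a new peptide x is scored as the sum over training peptides of alpha_i k(x_i, x).

```python
from collections import Counter


def spectrum_kernel(a, b, k=2):
    """Count matching k-mer pairs between two strings (a basic spectrum kernel)."""
    counts_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    counts_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return float(sum(n * counts_b[kmer] for kmer, n in counts_a.items()))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_kernel_ridge(peptides, affinities, lam=1.0, k=2):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(peptides)
    K = [[spectrum_kernel(p, q, k) for q in peptides] for p in peptides]
    for i in range(n):
        K[i][i] += lam  # ridge term on the diagonal
    return solve(K, affinities)


def predict(train_peptides, alpha, peptide, k=2):
    """Predict f(x) = sum_i alpha_i * k(x_i, x) for one new peptide."""
    return sum(a * spectrum_kernel(p, peptide, k) for a, p in zip(alpha, train_peptides))
```

Swapping in a different kernel only requires replacing `spectrum_kernel`; the regression step itself never looks inside the strings. A larger `lam` shrinks the coefficients and trades training fit for smoothness.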

Subjects

Artificial Intelligence; Peptides/chemistry; Protein Interaction Domains and Motifs; Protein Interaction Mapping/methods; Algorithms; Alleles; Binding Sites; Computer Simulation; Histocompatibility Antigens Class II/chemistry; Histocompatibility Antigens Class II/genetics; Histocompatibility Antigens Class II/metabolism; Peptides/immunology; Peptides/metabolism

HIV-1 coreceptor usage prediction without multiple alignments: an application of string kernels.

Boisvert, Sébastien; Marchand, Mario; Laviolette, François; Corbeil, Jacques.

Retrovirology ; 5: 110, 2008 Dec 04.

Article in English | MEDLINE | ID: mdl-19055831

ABSTRACT

BACKGROUND: Human immunodeficiency virus type 1 (HIV-1) infects cells by means of ligand-receptor interactions. This lentivirus uses the CD4 receptor in conjunction with a chemokine coreceptor, either CXCR4 or CCR5, to enter a target cell. HIV-1 is characterized by high sequence variability. Nonetheless, within this extensive variability, certain features must be conserved to define functions and phenotypes. The determination of coreceptor usage of HIV-1, from its protein envelope sequence, falls into a well-studied machine learning problem known as classification. The support vector machine (SVM), with string kernels, has proven to be very efficient for dealing with a wide class of classification problems ranging from text categorization to protein homology detection. In this paper, we investigate how the SVM can predict HIV-1 coreceptor usage when it is equipped with an appropriate string kernel. RESULTS: Three string kernels were compared. Accuracies of 96.35% (CCR5), 94.80% (CXCR4), and 95.15% (CCR5 and CXCR4) were achieved with the SVM equipped with the distant segments kernel on a test set of 1425 examples with a classifier built on a training set of 1425 examples. Our datasets are built with Los Alamos National Laboratory HIV Databases sequences. A web server is available at http://genome.ulaval.ca/hiv-dskernel. CONCLUSION: We examined string kernels that have been used successfully for protein homology detection and propose a new one that we call the distant segments kernel. We also show how to extract the most relevant features for HIV-1 coreceptor usage. The SVM with the distant segments kernel is currently the best method described.

Subjects

Computational Biology/methods; Receptors, CCR5/chemistry; Receptors, CXCR4/chemistry; Receptors, CXCR4/genetics; Receptors, HIV/chemistry; Algorithms; HIV Infections/genetics; HIV Infections/metabolism; Humans; Internet; Receptors, CCR5/genetics; Receptors, CCR5/metabolism; Receptors, CXCR4/metabolism; Receptors, HIV/genetics; Receptors, HIV/metabolism; Sequence Homology, Amino Acid; Software; User-Computer Interface

Plasma concentrations of selected organobromine compounds and polychlorinated biphenyls in postmenopausal women of Québec, Canada.

Sandanger, Torkjel M; Sinotte, Marc; Dumas, Pierre; Marchand, Mario; Sandau, Courtney D; Pereg, Daria; Bérubé, Sylvie; Brisson, Jacques; Ayotte, Pierre.

Environ Health Perspect ; 115(10): 1429-34, 2007 Oct.

Article in English | MEDLINE | ID: mdl-17938731

ABSTRACT

BACKGROUND: Brominated flame retardants, especially polybrominated diphenyl ethers (PBDEs), have been widely used in North America, but little is known about the level of exposure of human populations to these compounds. OBJECTIVES: We set out to assess the internal exposure of postmenopausal Canadian women to selected organobromine compounds and to investigate factors associated with this exposure. METHODS: We measured concentrations of four PBDEs, one polybrominated biphenyl, and, for comparative purposes, 41 polychlorinated biphenyl (PCB) congeners in plasma samples from 110 healthy postmenopausal women who were recruited at a mammography clinic in 2003-2004. RESULTS: PBDE-47 was the major PBDE congener, with a mean (geometric) concentration of 8.1 ng/g lipids and extreme values reaching 1,780 ng/g. By comparison, the mean concentration of the major PCB congener (PCB-153) was 41.7 ng/g and the highest value was 177 ng/g. PBDEs 47, 99, and 100 were strongly intercorrelated, but weaker correlations were noted with PBDE-153. As the sum of PBDEs (ΣPBDEs) increased, the relative contribution of PBDE-47 to the ΣPBDEs increased, whereas that of PBDE-153 decreased. PBDE-153 was the only brominated compound correlated with PCB-153. PBDE levels were not linked to any sociodemographic, anthropometric, reproductive, or lifestyle variables documented in the present study. Age and body mass index gain since the age of 18 years were significant predictors of PCB-153 plasma levels. CONCLUSION: Our results suggest that exposure to PBDE-47 likely occurs through direct contact with the penta-PBDE formulation, whereas exposure to PBDE-153 may originate in part from the food chain.

Subjects

Bromine Compounds/blood; Environmental Exposure/adverse effects; Hydrocarbons, Brominated/blood; Phenyl Ethers/blood; Polybrominated Biphenyls/blood; Environmental Monitoring; Epidemiological Monitoring; Female; Food Chain; Halogenated Diphenyl Ethers; Humans; Middle Aged; Polychlorinated Biphenyls/blood; Postmenopause; Quebec/epidemiology
