ABSTRACT
BACKGROUND: The identification of genomic biomarkers is a key step towards improving diagnostic tests and therapies. We present a reference-free method for this task that relies on a k-mer representation of genomes and a machine learning algorithm that produces intelligible models. The method is computationally scalable and well-suited for whole genome sequencing studies. RESULTS: The method was validated by generating models that predict the antibiotic resistance of C. difficile, M. tuberculosis, P. aeruginosa, and S. pneumoniae for 17 antibiotics. The obtained models are accurate, faithful to the biological pathways targeted by the antibiotics, and they provide insight into the process of resistance acquisition. Moreover, a theoretical analysis of the method revealed tight statistical guarantees on the accuracy of the obtained models, supporting its relevance for genomic biomarker discovery. CONCLUSIONS: Our method allows the generation of accurate and interpretable predictive models of phenotypes, which rely on a small set of genomic variations. The method is not limited to predicting antibiotic resistance in bacteria and is applicable to a variety of organisms and phenotypes. Kover, an efficient implementation of our method, is open-source and should guide biological efforts to understand a plethora of phenotypes ( http://github.com/aldro61/kover/ ).
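As an illustration of the k-mer representation mentioned above, the following minimal sketch encodes each genome by the presence or absence of every k-mer observed in a collection. The function names and the toy sequences are hypothetical and are not taken from Kover; the presence/absence encoding is an assumption, and a real whole-genome pipeline would rely on a dedicated k-mer counter and disk-based storage rather than in-memory Python sets.

```python
def kmers(genome, k):
    """Return the set of all length-k substrings of a genome string."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def presence_absence(genomes, k):
    """Build a binary matrix: one row per genome, one column per k-mer.

    Hypothetical helper for illustration only.
    """
    per_genome = [kmers(g, k) for g in genomes]
    # Sorted union of all k-mers seen in any genome (the column order).
    vocabulary = sorted(set().union(*per_genome))
    matrix = [[int(km in seen) for km in vocabulary] for seen in per_genome]
    return vocabulary, matrix


# Toy sequences, not real genomes.
vocabulary, matrix = presence_absence(["ACGT", "ACGA"], k=3)
print(vocabulary)
print(matrix)
```

Each column of such a matrix can serve as a candidate presence/absence rule, which is one way a model built on this representation can remain readable.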
ABSTRACT
BACKGROUND: The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interactions. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to specific MHC alleles would be of tremendous benefit for improving vaccine-based therapy and possibly generating antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. RESULTS: We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, among them the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function kernels. We provide a low-complexity dynamic programming algorithm for the exact computation of the kernel and a linear-time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant predictions with good accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets.
CONCLUSION: On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve systems biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community, with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/.
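To make the string-kernel idea in the results above concrete, here is a minimal sketch of a blended spectrum kernel, one of the kernels the proposed kernel is said to generalize. It is assumed here to be the unweighted sum of p-spectrum kernels for p = 1 to k, where each p-spectrum kernel counts the substrings of length p shared by two sequences. The function names and the toy peptides are hypothetical, and the sketch ignores the physico-chemical properties of amino acids that the proposed kernel incorporates.

```python
from collections import Counter


def spectrum(seq, p):
    """Count every substring of length p in seq."""
    return Counter(seq[i:i + p] for i in range(len(seq) - p + 1))


def blended_spectrum_kernel(x, y, k):
    """Sum of p-spectrum kernels for p = 1..k (unweighted; an assumption)."""
    total = 0
    for p in range(1, k + 1):
        cx, cy = spectrum(x, p), spectrum(y, p)
        # Inner product of the two substring-count vectors.
        total += sum(count * cy[sub] for sub, count in cx.items())
    return total


# Toy peptide strings, for illustration only.
print(blended_spectrum_kernel("ACA", "AC", k=2))
```

A kernel value like this can be plugged into any kernel method, such as the kernel ridge regression mentioned in the abstract.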
Subjects
Artificial Intelligence , Peptides/chemistry , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Algorithms , Alleles , Binding Sites , Computer Simulation , Histocompatibility Antigens Class II/chemistry , Histocompatibility Antigens Class II/genetics , Histocompatibility Antigens Class II/metabolism , Peptides/immunology , Peptides/metabolism
ABSTRACT
N-Mesyloxylactams undergo an efficient ring contraction to N-heterocycles of various ring sizes. Yields increase with the degree of substitution α to the carbonyl. The stereochemical information of a chiral migrating carbon is conserved, making this reaction a synthetically useful complement to the well-known Hofmann, Curtius, Lossen, and Schmidt rearrangements.
Subjects
Heterocyclic Compounds/chemistry , Heterocyclic Compounds/chemical synthesis , Lactams/chemistry , Mesylates/chemistry , Molecular Structure , Photochemical Processes , Stereoisomerism
ABSTRACT
We report a novel ring contraction that directly converts N-chlorolactams to the corresponding ring-contracted N-heterocycles upon photolysis. Results show that the rearrangement occurs with a variety of N-chlorolactams and that the greater the substitution at the migrating carbon, the greater the yield of product. Importantly, stereochemistry at the migrating carbon is conserved in the product. Rearranged products were isolated as their methyl carbamates in yields varying from 17% to 58%, with the major side product being the recyclable parent lactam.
Subjects
Lactams/chemistry , Nitrogen/chemistry , Photochemical Processes
ABSTRACT
Understanding the relationship between the genome of a cell and its phenotype is a central problem in precision medicine. Nonetheless, genotype-to-phenotype prediction comes with great challenges for machine learning algorithms that limit their use in this setting. The high dimensionality of the data tends to hinder generalization and challenges the scalability of most learning algorithms. Additionally, most algorithms produce models that are complex and difficult to interpret. We alleviate these limitations by proposing strong performance guarantees, based on sample compression theory, for rule-based learning algorithms that produce highly interpretable models. We show that these guarantees can be leveraged to accelerate learning and improve model interpretability. Our approach is validated through an application to the genomic prediction of antimicrobial resistance, an important public health concern. Highly accurate models were obtained for 12 species and 56 antibiotics, and their interpretation revealed known resistance mechanisms, as well as some potentially new ones. An open-source disk-based implementation that is both memory and computationally efficient is provided with this work. The implementation is turnkey, requires no prior knowledge of machine learning, and is complemented by comprehensive tutorials.
Subjects
Genetic Association Studies , Genome/genetics , Machine Learning , Precision Medicine , Algorithms , Artificial Intelligence , Genomics , Humans , Software
ABSTRACT
In recent years, deep learning algorithms have become increasingly prominent for their unparalleled ability to automatically learn discriminant features from large amounts of data. However, within the field of electromyography-based gesture recognition, deep learning algorithms are seldom employed, as they require an unreasonable amount of effort from a single person to generate tens of thousands of examples. This paper's hypothesis is that general, informative features can be learned from the large amounts of data generated by aggregating the signals of multiple users, thus reducing the recording burden while enhancing gesture recognition. Consequently, this paper proposes applying transfer learning on aggregated data from multiple users while leveraging the capacity of deep learning algorithms to learn discriminant features from large datasets. Two datasets, comprising 19 and 17 able-bodied participants respectively (the first is used for pre-training), were recorded for this work using the Myo armband. A third Myo armband dataset was taken from the NinaPro database and comprises ten able-bodied participants. Three different deep learning networks employing three different modalities as input (raw EMG, spectrograms, and continuous wavelet transform (CWT)) are tested on the second and third datasets. The proposed transfer learning scheme is shown to systematically and significantly enhance the performance of all three networks on the two datasets, achieving an offline accuracy of 98.31% for 7 gestures over 17 participants for the CWT-based ConvNet and 68.98% for 18 gestures over 10 participants for the raw EMG-based ConvNet. Finally, a use-case study employing eight able-bodied participants suggests that real-time feedback allows users to adapt their muscle activation strategy, which reduces the degradation in accuracy normally experienced over time.
Subjects
Deep Learning , Electromyography/methods , Gestures , Algorithms , Databases, Factual , Humans , Neural Networks, Computer , Transfer, Psychology , Wavelet Analysis , Wearable Electronic Devices
ABSTRACT
Mass spectrometry is a valued method to evaluate the metabolomic content of a biological sample. The recent advent of rapid ionization technologies such as Laser Diode Thermal Desorption (LDTD) and Direct Analysis in Real Time (DART) has rendered high-throughput mass spectrometry possible, and it is now used for large-scale comparative analysis of populations of samples. In practice, many factors resulting from the environment, the protocol, and even the instrument itself can lead to minor discrepancies between spectra, rendering automated comparative analysis difficult. In this work, a pipeline of algorithms to correct variations between spectra is proposed. The algorithms correct multiple spectra by identifying peaks that are common to all of them and, from those, computing a spectrum-specific correction. We show that these algorithms increase comparability within large datasets of spectra, facilitating comparative analyses such as machine learning.
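The correction idea described above (find peaks common to all spectra, then derive one correction per spectrum) can be sketched as follows. This is a simplified illustration under stated assumptions, not the published pipeline: spectra are plain lists of m/z values, a peak counts as common when every spectrum has a peak within a tolerance `tol` of the first spectrum's peak, and the correction is a single constant shift per spectrum. All names are hypothetical.

```python
from statistics import mean


def common_peaks(spectra, tol):
    """For each peak of the first spectrum found in every other spectrum
    (within tol), return the matched m/z values, one per spectrum.
    Assumes every spectrum is a non-empty list of m/z values."""
    matches = []
    for ref_mz in spectra[0]:
        row = [ref_mz]
        for other in spectra[1:]:
            nearest = min(other, key=lambda mz: abs(mz - ref_mz))
            if abs(nearest - ref_mz) > tol:
                break  # this peak is missing from one spectrum
            row.append(nearest)
        else:
            matches.append(row)  # matched in every spectrum
    return matches


def correct_spectra(spectra, tol):
    """Shift each spectrum so its common peaks align on their mean position."""
    matches = common_peaks(spectra, tol)
    if not matches:
        return [list(s) for s in spectra]  # nothing to align on
    targets = [mean(row) for row in matches]
    corrected = []
    for j, spectrum in enumerate(spectra):
        # Constant shift (an assumption): mean deviation from the targets.
        shift = mean(row[j] - t for row, t in zip(matches, targets))
        corrected.append([mz - shift for mz in spectrum])
    return corrected


# Toy peak lists, for illustration only.
aligned = correct_spectra([[100.0, 200.0, 300.0], [100.2, 200.2, 350.0]], tol=0.5)
print(aligned)
```

In this toy case the two spectra share the peaks near 100 and 200, so both are shifted until those peaks coincide, while the unmatched peaks simply move with their spectrum.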
ABSTRACT
A new method for solid-phase parallel synthesis of chemically and conformationally diverse macrocyclic peptidomimetics is reported. A key feature of the method is access to broad chemical and conformational diversity. Synthesis and mechanistic studies on the macrocyclization step are reported.
Subjects
Chemistry, Pharmaceutical/methods , Peptides, Cyclic/chemistry , Combinatorial Chemistry Techniques , Dimerization , Dipeptides , Models, Chemical , Models, Molecular , Molecular Conformation , Molecular Mimicry , Molecular Structure , Peptides/chemistry , Silver/chemistry , Stereoisomerism , Structure-Activity Relationship
ABSTRACT
The HCV NS5B RNA-dependent RNA polymerase plays an essential role in viral replication. The discovery of a novel class of inhibitors based on an N,N-disubstituted phenylalanine scaffold and structure-activity relationship studies to improve potency are described.
Subjects
Antiviral Agents/chemical synthesis , Enzyme Inhibitors/chemical synthesis , Hepacivirus/enzymology , Phenylalanine/analogs & derivatives , Phenylalanine/chemical synthesis , RNA-Dependent RNA Polymerase/antagonists & inhibitors , Viral Nonstructural Proteins/antagonists & inhibitors , Antiviral Agents/chemistry , Enzyme Inhibitors/chemistry , Phenylalanine/chemistry , Structure-Activity Relationship
ABSTRACT
We present MHC-NP, a tool for predicting peptides naturally processed by the MHC pathway. The method was part of the 2nd Machine Learning Competition in Immunology and yielded state-of-the-art accuracy for the prediction of peptides eluted from human HLA-A*02:01, HLA-B*07:02, HLA-B*35:01, HLA-B*44:03, HLA-B*53:01, HLA-B*57:01 and mouse H2-D(b) and H2-K(b) MHC molecules. We briefly explain the theory and motivations that led to the development of this tool. The tool is expected to be generally applicable in immunology, and specifically in epitope-based vaccine design. Our tool is freely available online and hosted by the Immune Epitope Database at http://tools.immuneepitope.org/mhcnp/.