1.
Methods ; 224: 1-9, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38295891

ABSTRACT

The Major Histocompatibility Complex (MHC) is a critical element of the vertebrate cellular immune system, responsible for presenting peptides derived from intracellular proteins. MHC-I presentation is pivotal in the immune response and holds considerable potential for vaccine development and cancer immunotherapy. This study examines the limitations of current methods and benchmarks for MHC-I presentation. We introduce a novel benchmark designed to assess the generalization and reliability of models on unseen MHC molecules and peptides, focusing on the Human Leukocyte Antigen (HLA), a specific subset of MHC genes present in humans. Finally, we introduce HLABERT, a pretrained language model that significantly outperforms previous methods on our benchmark and establishes a new state of the art on existing benchmarks.


Subjects
Peptides , Proteins , Humans , Reproducibility of Results , Peptides/chemistry , Proteins/metabolism , Major Histocompatibility Complex/genetics , Protein Binding
2.
Anal Chem ; 96(9): 3763-3771, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38373058

ABSTRACT

This study introduces a simplified purification method for analyzing ⁸²Se/⁷⁸Se isotope ratios in diverse natural samples using hydride generation MC-ICP-MS. Unlike the thiol resin method, which is time-consuming and sensitive to the concentrations of reagents used at individual stages, our proposed alternative is quicker, simpler, and robust. The procedure involves coprecipitation of selenium with iron hydroxide and dissolution in hydrochloric acid. Combining hydride generation with a second cleanup stage achieves sufficient purification for Se isotope ratio measurements. The method is efficient, taking 3-4 h after sample decomposition, and uses common reagents [HCl, Fe(NO₃)₃, NH₄Cl] without evaporation or clean-lab conditions. Results for ⁸²Se/⁷⁸Se isotope ratios in various matrices are presented and compared with literature data. All isotopic results were subjected to a newly proposed state-of-the-art approach to uncertainty estimation dedicated to isotope ratio measurements. The approach applies Monte Carlo simulations to the results of different samples normalized by the expected value. In this way, we obtained an estimated uncertainty for any Se sample that includes the influence of individual measurements on the final estimate.
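The abstract does not spell out the simulation scheme. As a rough illustration of Monte Carlo uncertainty propagation in general (not the authors' procedure, which additionally pools results of different samples normalized by the expected value), the sketch below draws the two isotope signal intensities from normal distributions with assumed standard uncertainties and summarizes the resulting ratio distribution. The function name `mc_ratio_uncertainty` and the independence and normality assumptions are hypothetical.

```python
import numpy as np


def mc_ratio_uncertainty(i82, u82, i78, u78, n_draws=100_000, seed=0):
    """Monte Carlo propagation of uncertainty for the ratio i82 / i78.

    i82, i78: measured (e.g. blank-corrected) signal intensities.
    u82, u78: their standard uncertainties, assumed independent and normal.
    Returns the mean ratio, its standard uncertainty and a 95% coverage interval.
    """
    rng = np.random.default_rng(seed)
    # Draw plausible intensities and form the ratio for every draw.
    ratios = rng.normal(i82, u82, n_draws) / rng.normal(i78, u78, n_draws)
    low, high = np.percentile(ratios, [2.5, 97.5])
    return ratios.mean(), ratios.std(ddof=1), (low, high)


if __name__ == "__main__":
    # Hypothetical intensities with 0.1% relative standard uncertainty each.
    mean, u, (low, high) = mc_ratio_uncertainty(9.0, 0.009, 10.0, 0.01)
    print(f"ratio = {mean:.5f}, u = {u:.5f}, 95% interval = [{low:.5f}, {high:.5f}]")
```

For small relative uncertainties like these, the simulated standard uncertainty should approach the first-order estimate 0.9 × √(0.001² + 0.001²) ≈ 0.00127; the benefit of simulation is that it needs no linearization and yields coverage intervals directly.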

3.
Anal Chem ; 96(1): 188-196, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38117933

ABSTRACT

¹H NMR spectroscopy is a powerful tool for analyzing mixtures, including determining the concentrations of individual components. When signals from multiple compounds overlap, this task requires computational solutions. These are typically based on peak picking and comparison of the obtained peak lists with libraries of individual components. This can fail if peaks are not sufficiently resolved or when peak positions differ between the library and the mixture. In this paper, we present Magnetstein, a quantification algorithm rooted in optimal transport theory, which makes it robust to unexpected frequency shifts and overlapping signals. As a result, Magnetstein can quantitatively analyze difficult spectra with an estimation trueness an order of magnitude higher than that of commercial tools. Furthermore, the method is easier to use than other approaches: it has only two parameters, whose default values are applicable to a broad range of experiments, and it requires little to no preprocessing of the spectra.

4.
Anal Chem ; 96(23): 9343-9352, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38804718

ABSTRACT

Oligonucleotide therapeutics have emerged as an important class of drugs offering targeted therapeutic strategies that complement traditional modalities such as monoclonal antibodies and small molecules. Their unique ability to precisely modulate gene expression makes them vital for addressing previously undruggable targets. A critical aspect of developing these therapies is accurately characterizing their molecular composition. This includes determining the monoisotopic mass of oligonucleotides, which is essential for identifying impurities, degradants, and modifications that can affect drug efficacy and safety. Mass spectrometry (MS) plays a pivotal role in this process, yet the accurate interpretation of complex mass spectra remains challenging, especially for large molecules, where the monoisotopic peak is often undetectable. To address this issue, we have adapted the MIND algorithm, originally developed for top-down proteomics, for use with oligonucleotide data. This adaptation allows the monoisotopic mass to be predicted from the more readily detectable most-abundant peak mass, improving the annotation of complex oligonucleotide spectra. Our comprehensive validation of this modified algorithm on both in silico and real-world oligonucleotide data sets demonstrated its effectiveness and reliability. To facilitate wider adoption, we have packaged the enhanced MIND algorithm in a user-friendly Shiny application. This online platform simplifies the annotation of complex oligonucleotide spectra, making advanced mass spectrometry analysis accessible to researchers and drug developers. The application is available at https://valkenborg-lab.shinyapps.io/mind4oligos/.


Subjects
Algorithms , Mass Spectrometry , Oligonucleotides , Oligonucleotides/analysis , Mass Spectrometry/methods , Molecular Weight
5.
Nucleic Acids Res ; 50(W1): W744-W752, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35524567

ABSTRACT

In recent years, great progress has been made in the identification of structural variants (SVs) in the human genome. However, the interpretation of SVs, especially those located in non-coding DNA, remains challenging. One reason is the lack of tools designed specifically for clinical SV evaluation that take the 3D chromatin architecture into account. Therefore, we present TADeus2, a web server dedicated to the quick investigation of chromatin conformation changes, providing a visual framework for the interpretation of SVs affecting topologically associating domains (TADs). This tool provides convenient visual inspection of SVs, both in a continuous genome view and from the perspective of a rearrangement's breakpoints. Additionally, TADeus2 allows the user to assess the influence of the analyzed SVs within flanking coding/non-coding regions based on the Hi-C matrix. Importantly, SV pathogenicity is quantified and ranked using the TADA and ClassifyCNV tools and a sampling-based P-value. TADeus2 is publicly available at https://tadeus2.mimuw.edu.pl.


Subjects
Chromatin , DNA , Humans , Chromatin/genetics , Chromosomes , Genome, Human
6.
Methods ; 203: 584-593, 2022 07.
Article in English | MEDLINE | ID: mdl-35085741

ABSTRACT

More than a year and a half after the outbreak of the COVID-19 pandemic, the scientific world is still trying to understand its dynamics. In this paper on case fatality rates (CFR) for COVID-19, we study historical mortality data from Poland during the first six months of the pandemic, when no SARS-CoV-2 variants of concern were present among the infected. To this end, we apply competing risk models to perform both univariate and multivariate analyses on specific subpopulations selected by different factors, including the key indicators age, sex, and hospitalization. The study explores the case fatality rate and reveals its decreasing trend over time. Furthermore, we describe the differences in mortality between hospitalized and other cases, showing a sudden increase in mortality among hospitalized cases at the end of spring 2020. Exploratory and multivariate analyses revealed the real impact of each variable: besides the expected factors associated with increased mortality (age, comorbidities), we track less obvious indicators. Recent medical care, as well as identification of the source contact, significantly affects an individual's mortality risk, independently of comorbidities. As a result, the study provides a twofold insight into COVID-19 mortality in Poland. On the one hand, we explore mortality in different groups with respect to different variables; on the other, we indicate novel factors that may be crucial in reducing mortality. The latter can be addressed, e.g., by more efficient contact tracing and by organizing and managing the health care system so that it supports those who need medical care, independently of comorbidities or COVID-19 infection.
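The abstract names competing risk models without giving their form. As one standard illustration (not necessarily the models used in the paper), the Aalen-Johansen estimator below computes the cumulative incidence of one event type, such as death, when another event, such as recovery, precludes it. Treating the competing event as ordinary censoring would overstate the incidence, which is why this estimator is used instead. The event coding and function name are assumptions for illustration.

```python
from collections import Counter


def cumulative_incidence(times, events, cause):
    """Nonparametric (Aalen-Johansen) cumulative incidence for one cause.

    times:  follow-up time of each individual.
    events: 0 for censored, otherwise an integer code of the event type
            (e.g. 1 = death, 2 = recovery, treated as competing events).
    Returns (time, cumulative incidence of `cause`) pairs, one per
    distinct time at which any event occurred.
    """
    any_event = Counter(t for t, e in zip(times, events) if e != 0)
    this_cause = Counter(t for t, e in zip(times, events) if e == cause)
    survival = 1.0  # overall event-free probability just before time t
    cif = 0.0
    curve = []
    for t in sorted(any_event):
        # Individuals censored at t are still counted as at risk at t.
        at_risk = sum(1 for s in times if s >= t)
        # P(event-free just before t) times the cause-specific hazard at t.
        cif += survival * this_cause[t] / at_risk
        survival *= 1.0 - any_event[t] / at_risk
        curve.append((t, cif))
    return curve
```

With four hypothetical patients, `cumulative_incidence([1, 2, 3, 4], [1, 2, 1, 0], cause=1)` gives `[(1, 0.25), (2, 0.25), (3, 0.5)]`; at every time point the incidences of all causes plus the event-free probability sum to one. The quadratic at-risk count keeps the sketch short; a real analysis would use a survival library.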


Subjects
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Contact Tracing , Humans , Pandemics , Poland/epidemiology
7.
BMC Genomics ; 23(Suppl 6): 616, 2022 Aug 25.
Article in English | MEDLINE | ID: mdl-36008753

ABSTRACT

BACKGROUND: The reduction of the chromosome number from 48 in the Great Apes to 46 in modern humans is thought to result from the end-to-end fusion of two ancestral non-human primate chromosomes forming the human chromosome 2 (HSA2). Genomic signatures of this event are the presence of inverted telomeric repeats at the HSA2 fusion site and a block of degenerate satellite sequences that mark the remnants of the ancestral centromere. It has been estimated that this fusion arose up to 4.5 million years ago (Mya). RESULTS: We have developed an enhanced algorithm for the detection and efficient counting of the locally over-represented weak-to-strong (AT to GC) substitutions. By analyzing the enrichment of these substitutions around the fusion site of HSA2 we estimated its formation time at 0.9 Mya with a 95% confidence interval of 0.4-1.5 Mya. Additionally, based on the statistics derived from our algorithm, we have reconstructed the evolutionary distances among the Great Apes (Hominoidea). CONCLUSIONS: Our results shed light on the HSA2 fusion formation and provide a novel computational alternative for the estimation of the speciation chronology.
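The enhanced algorithm itself is not described in the abstract. The sketch below only illustrates the basic quantity involved: given two aligned sequences, assumed here to be an inferred ancestral sequence and a derived one, it flags weak-to-strong (A/T to G/C) substitutions and counts them in every window of fixed length using prefix sums, so all windows are scored in linear time. The function name and window semantics are hypothetical.

```python
WEAK = frozenset("AT")
STRONG = frozenset("GC")


def ws_window_counts(ancestral, derived, window):
    """Count weak-to-strong (A/T -> G/C) substitutions in each sliding window.

    `ancestral` and `derived` must be aligned and of equal length; gap
    characters such as '-' are neither weak nor strong, so they are never
    counted. Returns one count per window start position.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    # prefix[i] = number of W->S substitutions among the first i positions.
    prefix = [0]
    for a, d in zip(ancestral.upper(), derived.upper()):
        prefix.append(prefix[-1] + (a in WEAK and d in STRONG))
    # Each window count is a difference of two prefix sums.
    return [prefix[i + window] - prefix[i] for i in range(len(prefix) - window)]
```

For example, `ws_window_counts("AATTAA", "AGTCAA", 3)` returns `[1, 2, 1, 1]`. Windows whose counts clearly exceed what the background substitution rate predicts would be candidates for local over-representation.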


Subjects
Evolution, Molecular , Hominidae , Animals , Centromere/genetics , Chromosomes, Human , Genome , Hominidae/genetics , Humans
8.
Bioinformatics ; 36(3): 953-955, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31504154

ABSTRACT

SUMMARY: The biggest hurdle in studying topology in biopolymers is the steep learning curve for actually seeing the knots in structure visualization. Knot_pull is a command-line utility designed to simplify this process: it presents the user with a smoothing trajectory for the provided structures (any number and length of protein, RNA or chromatin chains in PDB, CIF or XYZ format) and calculates the knot type (including the presence of any links, and slipknots when a subchain is specified). AVAILABILITY AND IMPLEMENTATION: Knot_pull works under Python >=2.7 and is system independent. Source code and documentation are available at http://github.com/dzarmola/knot_pull under the GNU GPL license and also include a wrapper script for PyMOL for easier visualization. Examples of smoothing trajectories can be found at: https://www.youtube.com/watch?v=IzSGDfc1vAY. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins , Software , Biopolymers
9.
Rapid Commun Mass Spectrom ; : e8956, 2020 Sep 30.
Article in English | MEDLINE | ID: mdl-32996651

ABSTRACT

RATIONALE: The linear regression of mass spectra is a computational problem defined as fitting a linear combination of reference spectra to an experimental one. It is typically used to estimate the relative quantities of selected ions. In this work, we study this problem in an abstract setting to develop new approaches applicable to a diverse range of experiments. METHODS: To overcome the sensitivity of the ordinary least-squares regression to measurement inaccuracies, we base our methods on a non-conventional spectral dissimilarity measure, known as the Wasserstein or the Earth Mover's distance. This distance is based on the notion of the cost of transporting signal between mass spectra, which renders it naturally robust to measurement inaccuracies in the mass domain. RESULTS: Using a data set of 200 mass spectra, we show that our approach is capable of estimating ion proportions accurately without extensive preprocessing of spectra required by other methods. The conclusions are further substantiated using data sets simulated in a way that mimics most of the measurement inaccuracies occurring in real experiments. CONCLUSIONS: We have developed a linear regression algorithm based on the notion of the cost of transporting signal between spectra. Our implementation is available in a Python 3 package called masserstein, which is freely available at https://github.com/mciach/masserstein.
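To make the transport-cost idea concrete, the sketch below computes the first Wasserstein (Earth Mover's) distance between two centroided spectra. In one dimension it equals the area between the two cumulative intensity distributions. This is a minimal illustration written for this listing, not code from the masserstein package; the function name is hypothetical.

```python
import numpy as np


def wasserstein_1d(mz_a, int_a, mz_b, int_b):
    """First Wasserstein (Earth Mover's) distance between two centroided spectra.

    Intensities are normalized to sum to one, so the result is the minimal
    cost (in m/z units) of moving signal from spectrum A to spectrum B.
    In one dimension it equals the area between the two cumulative
    distribution functions.
    """
    mz_a, mz_b = np.asarray(mz_a, float), np.asarray(mz_b, float)
    p = np.asarray(int_a, float) / np.sum(int_a)
    q = np.asarray(int_b, float) / np.sum(int_b)
    support = np.union1d(mz_a, mz_b)  # sorted, unique m/z positions
    # CDF of each spectrum evaluated at every support point.
    cdf_p = np.array([p[mz_a <= x].sum() for x in support])
    cdf_q = np.array([q[mz_b <= x].sum() for x in support])
    # Both CDFs are constant between consecutive support points.
    return float(np.sum(np.abs(cdf_p - cdf_q)[:-1] * np.diff(support)))
```

A single peak shifted by 0.5 m/z yields a distance of exactly 0.5, so a small mass inaccuracy produces a proportionally small penalty rather than a complete mismatch, as it would when comparing binned intensities. The value should agree with `scipy.stats.wasserstein_distance(mz_a, mz_b, int_a, int_b)`. Estimating ion proportions by fitting a mixture of reference spectra under this distance requires an additional optimization step that the sketch omits.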

10.
J Med Genet ; 56(2): 104-112, 2019 02.
Article in English | MEDLINE | ID: mdl-30352868

ABSTRACT

BACKGROUND: Mapping the breakpoints in de novo balanced chromosomal translocations (BCTs) in symptomatic individuals provides a unique opportunity to identify, in an unbiased way, the likely causative genetic defect and thus find novel human disease candidate genes. Our aim was to fine-map the breakpoints of de novo BCTs in a case series of nine patients. METHODS: We used shallow whole-genome mate pair sequencing (SGMPS) together with long-range PCR and Sanger sequencing. In one case (a BCT disrupting BAHD1 and RET), cDNA analysis was used to verify expression of a fusion transcript in cultured fibroblasts. RESULTS: In the nine probands, 11 disrupted genes were found: EFNA5, EBF3, LARGE, PPP2R5E, TXNDC5, ZNF423, NIPBL, BAHD1, RET, TRPS1 and SLC4A10. Five subjects had translocations that disrupted genes with a so far unknown (EFNA5, BAHD1, PPP2R5E, TXNDC5) or poorly delineated impact on the phenotype (SLC4A10, with two previous reports of a BCT disrupting the gene). When compared with all human genes by a bootstrap test, the four genes with no previous disease associations (EFNA5, BAHD1, PPP2R5E, TXNDC5) had significantly higher pLI (p<0.017) and DOMINO (p<0.02) scores, indicating enrichment in genes likely to be intolerant to single-copy damage. Inspection of the individual pLI and DOMINO scores and of the local topologically associating domain structure suggested that EFNA5, BAHD1 and PPP2R5E were particularly good candidates for novel disease loci. The pathomechanism for BAHD1 may involve deregulation of expression due to fusion with the RET promoter. CONCLUSION: SGMPS in symptomatic carriers of BCTs is a powerful approach to delineate novel human gene-disease associations.
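The abstract reports a bootstrap comparison of pLI and DOMINO scores between four candidate genes and all human genes, but not the exact resampling scheme. The sketch below assumes a common version: gene sets of the same size are drawn at random (with replacement) from the background, and the candidates' mean score is compared with the resulting null distribution of means. The function name and inputs are placeholders.

```python
import numpy as np


def bootstrap_mean_test(candidate_scores, background_scores, n_boot=10_000, seed=0):
    """One-sided resampling test: is the mean score of the candidate genes
    higher than the mean of equally sized random sets of genes?

    Random sets are drawn with replacement from `background_scores`
    (e.g. pLI values of all human genes). Returns an add-one-corrected
    empirical p-value.
    """
    rng = np.random.default_rng(seed)
    background = np.asarray(background_scores, float)
    k = len(candidate_scores)
    observed = float(np.mean(candidate_scores))
    # Each row is one random gene set of size k.
    draws = rng.choice(background, size=(n_boot, k), replace=True)
    null_means = draws.mean(axis=1)
    return (np.count_nonzero(null_means >= observed) + 1) / (n_boot + 1)
```

The add-one correction keeps the p-value strictly positive, so with 10 000 resamples the smallest reportable value is about 1e-4.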


Subjects
Chromosomal Proteins, Non-Histone/genetics , Chromosome Breakpoints , Chromosome Disorders/genetics , Ephrin-A5/genetics , Protein Phosphatase 2/genetics , Translocation, Genetic , Whole Genome Sequencing/methods , Adolescent , Adult , Child , Child, Preschool , Female , Humans , Infant , Male , Young Adult
11.
Entropy (Basel) ; 22(11)2020 Oct 31.
Article in English | MEDLINE | ID: mdl-33287006

ABSTRACT

The constantly and rapidly increasing amount of biological data gained from many different high-throughput experiments opens up new possibilities for data- and model-driven inference. At the same time, however, risks related to data integration techniques emerge, and these are not widely taken into account. In particular, approaches based on flux balance analysis (FBA) are sensitive to the structure of the metabolic network, in which low-entropy clusters can prevent inference of the activity of metabolic reactions. In this article, we set out problems that may arise during the integration of metabolomic data with gene expression datasets. We analyze common pitfalls, provide possible solutions, and illustrate them with a case study of renal cell carcinoma (RCC). Using the proposed approach, we provide a metabolic description of the known morphological RCC subtypes and suggest the possible existence of a poor-prognosis cluster of patients, commonly characterized by low activity of the drug-transporting enzymes crucial in chemotherapy. This finding agrees with and extends the already known poor-prognosis characteristics of RCC. Finally, this work also aims to point out the problem that arises from integrating high-throughput data with inherently nonuniform, manually curated low-throughput data. In such cases, the over-represented information may overshadow non-trivial discoveries.

12.
BMC Bioinformatics ; 20(Suppl 15): 644, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874610

ABSTRACT

BACKGROUND: Surveys of the presence and absence of specific species across multiple biogeographic units (or bioregions) are used in a broad range of biological studies, from ecology to microbiology. Using binary presence-absence data, we evaluate species co-occurrences that help elucidate relationships among organisms and environments. To summarize the similarity between occurrences of species, we routinely use the Jaccard/Tanimoto coefficient, which is the ratio of their intersection to their union. It is natural, then, to identify statistically significant Jaccard/Tanimoto coefficients, which suggest non-random co-occurrences of species. However, statistical hypothesis testing using this similarity coefficient has seldom been used or studied. RESULTS: We introduce a hypothesis test for similarity in biological presence-absence data using the Jaccard/Tanimoto coefficient. Several key improvements are presented, including unbiased estimation of the expectation and centered Jaccard/Tanimoto coefficients, which account for occurrence probabilities. The exact and asymptotic solutions are derived. To overcome the computational burden due to high dimensionality, we propose the bootstrap and measurement concentration algorithms to efficiently estimate the statistical significance of binary similarity. Comprehensive simulation studies demonstrate that our proposed methods produce accurate p-values and false discovery rates. The proposed estimation methods are orders of magnitude faster than the exact solution, particularly as dimensionality increases. We showcase their applications in evaluating co-occurrences of bird species on 28 islands of Vanuatu and fish species in 3347 freshwater habitats in France. The proposed methods are implemented in an open source R package called jaccard (https://cran.r-project.org/package=jaccard).
CONCLUSION: We introduce a suite of statistical methods for the Jaccard/Tanimoto similarity coefficient for binary data, which enable straightforward incorporation of probabilistic measures into the analysis of species co-occurrences. Due to their generality, the proposed methods and implementations are applicable to a wide range of binary data arising from genomics, biochemistry, and other areas of science.
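The sketch below illustrates the idea of centering the Jaccard/Tanimoto coefficient by its expectation under independence and estimating significance by simulation. It is a simple parametric bootstrap written for illustration, assuming independent Bernoulli occurrences with the observed occurrence probabilities. The expectation is estimated by Monte Carlo rather than derived analytically, so this is not the paper's exact, asymptotic, or measurement concentration method, nor the interface of the jaccard R package. Function names are hypothetical.

```python
import numpy as np


def jaccard(x, y):
    """Jaccard/Tanimoto coefficient of two binary presence-absence vectors."""
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    union = np.count_nonzero(x | y)
    # Two all-absent vectors have no defined similarity; 0 is an assumed convention.
    return np.count_nonzero(x & y) / union if union else 0.0


def jaccard_bootstrap_test(x, y, n_boot=10_000, seed=0):
    """Centered Jaccard coefficient and a two-sided p-value under independence.

    Null vectors are simulated as independent Bernoulli draws with the
    observed occurrence probabilities of x and y (a parametric bootstrap).
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    m = x.size
    xb = rng.random((n_boot, m)) < x.mean()
    yb = rng.random((n_boot, m)) < y.mean()
    inter = np.count_nonzero(xb & yb, axis=1)
    union = np.count_nonzero(xb | yb, axis=1)
    null = np.divide(inter, union, out=np.zeros(n_boot), where=union > 0)
    expected = null.mean()  # Monte Carlo estimate of E[J] under independence
    centered = jaccard(x, y) - expected
    extreme = np.count_nonzero(np.abs(null - expected) >= abs(centered))
    return centered, (extreme + 1) / (n_boot + 1)
```

Centering matters because two common species share many sites by chance alone: for two independent vectors each present at half of the sites, the expected coefficient is about 1/3, not 0. The simulation holds an n_boot × m boolean matrix in memory, which is acceptable for moderate numbers of sites.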


Subjects
Freshwater Biology/methods , Algorithms , Animals , Biometry , Fishes , Probability
13.
Anal Chem ; 91(3): 1801-1807, 2019 02 05.
Article in English | MEDLINE | ID: mdl-30608646

ABSTRACT

Top-down mass spectrometry methods are becoming increasingly popular in the effort to describe the proteome. They rely on the fragmentation of intact protein ions inside the mass spectrometer. Among the existing fragmentation methods, electron transfer dissociation (ETD) is known for its precision and wide coverage of different cleavage sites. However, several side reactions can occur under ETD conditions, including nondissociative electron transfer and proton transfer reactions. Evaluating their extent can provide more insight into reaction kinetics as well as instrument operation. Furthermore, preferential formation of certain reaction products can reveal important structural information. To the best of our knowledge, there are currently no tools capable of tracing and analyzing the products of these reactions in a systematic way. In this Article, we present in detail masstodon: a computer program for assigning peaks and interpreting mass spectra. Besides being a general-purpose tool, masstodon also makes it possible to trace the products of reactions occurring under ETD conditions and provides insights into the parameters driving them. It is available free of charge under the GNU AGPL V3 public license.


Subjects
Apolipoprotein A-I/analysis , Mass Spectrometry/statistics & numerical data , Software , Substance P/analysis , Ubiquitin/analysis , Algorithms , Electrons
14.
Anal Chem ; 91(15): 10310-10319, 2019 08 06.
Article in English | MEDLINE | ID: mdl-31283196

ABSTRACT

Top-down proteomics approaches are becoming ever more popular, owing to the advantages offered by knowledge of the intact protein mass in correctly identifying the various proteoforms that potentially arise from point mutations, alternative splicing, post-translational modifications, etc. Usually, the average mass is used in this context; however, it is known that this can fluctuate significantly due to both natural and technical causes. Ideally, one would prefer to use the monoisotopic precursor mass, but this falls below the detection limit for all but the smallest proteins. Methods that predict the monoisotopic mass from the average mass are potentially affected by imprecisions associated with the average mass. To address this issue, we have developed a framework based on simple linear models that allows prediction of the monoisotopic mass from the exact mass of the most-abundant (aggregated) isotope peak, which is a robust measure of mass, insensitive to the aforementioned natural and technical causes. This linear model was tested experimentally as well as in silico, and it typically predicts monoisotopic masses with an accuracy of only a few parts per million. A confidence measure is associated with the predicted monoisotopic mass to handle off-by-one-Da prediction errors. Furthermore, we introduce a correction function to extract the "true" (i.e., theoretical) most-abundant isotope peak from a spectrum, even if the observed isotope distribution is distorted by noise or poor ion statistics. The method is available online as an R Shiny app: https://valkenborg-lab.shinyapps.io/mind/.


Subjects
Algorithms , Chromatography, Liquid/methods , Models, Statistical , Proteins/analysis , Proteome/analysis , Tandem Mass Spectrometry/methods , Humans , Protein Processing, Post-Translational , Proteins/metabolism
15.
J Theor Biol ; 478: 74-101, 2019 10 07.
Article in English | MEDLINE | ID: mdl-31181241

ABSTRACT

A proper response to rapid environmental changes is essential for cell survival and requires efficient modifications in the pattern of gene expression. In this respect, a prominent example is Hsp70, a chaperone protein whose synthesis is dynamically regulated in stress conditions. In this paper, we expand a formal model of Hsp70 heat induction originally proposed in previous articles. To accurately capture various modes of heat shock effects, we not only introduce temperature dependencies in transcription to Hsp70 mRNA and in dissociation of transcriptional complexes, but we also derive a new formal expression for the temperature dependence in protein denaturation. We calibrate our model using comprehensive sets of both previously published experimental data and also biologically justified constraints. Interestingly, we obtain a biologically plausible temperature dependence of the transcriptional complex dissociation, despite the lack of biological constraints imposed in the calibration process. Finally, based on a sensitivity analysis of the model carried out in both deterministic and stochastic settings, we suggest that the regulation of the binding of transcriptional complexes plays a key role in Hsp70 induction upon heat shock. In conclusion, we provide a model that is able to capture the essential dynamics of the Hsp70 heat induction whilst being biologically sound in terms of temperature dependencies, description of protein denaturation and imposed calibration constraints.


Subjects
HSP70 Heat-Shock Proteins/metabolism , Heat-Shock Response , Models, Biological , Kinetics , Protein Denaturation , RNA, Messenger/genetics , RNA, Messenger/metabolism , Temperature
16.
Hum Mutat ; 39(12): 1916-1925, 2018 12.
Article in English | MEDLINE | ID: mdl-30084155

ABSTRACT

Transposable elements modify the human genome by inserting into new loci or by mediating homology-, microhomology-, or homeology-driven DNA recombination or repair, resulting in genomic structural variation. Alveolar capillary dysplasia with misalignment of pulmonary veins (ACDMPV) is a rare lethal neonatal developmental lung disorder caused by point mutations or copy-number variant (CNV) deletions of FOXF1 or its distant tissue-specific enhancer. Of the 45 ACDMPV-causative CNV deletions whose junctions have been sequenced, 85% had at least one of their two breakpoints located in a retrotransposon, with more than half of them being Alu elements. We describe a novel ∼35 kb genomic instability hotspot at 16q24.1, involving two evolutionarily young LINE-1 (L1) elements, L1PA2 and L1PA3, flanking AluY, two AluSx, AluSx1, and AluJr elements. The occurrence of L1s at this location coincided with the branching out of the Homo-Pan-Gorilla clade and was preceded by the insertion of AluSx, AluSx1, and AluJr. Our data show that, in addition to mediating recurrent CNVs, L1 and Alu retrotransposons can predispose the human genome to the formation of variably sized CNVs of both clinical and evolutionary relevance. Nonetheless, epigenetic or other genomic features of this locus might also contribute to its increased instability.


Subjects
Chromosomes, Human, Pair 16/genetics , DNA Copy Number Variations , Genomic Instability , Persistent Fetal Circulation Syndrome/genetics , Alu Elements , Evolution, Molecular , Forkhead Transcription Factors/genetics , Genetic Predisposition to Disease , Humans , Long Interspersed Nucleotide Elements , Pedigree , Point Mutation
17.
Plant Physiol ; 175(4): 1634-1648, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29018097

ABSTRACT

In this work, we studied the changes in high-light tolerance and photosynthetic activity in leaves of the Arabidopsis (Arabidopsis thaliana) rosette throughout the vegetative stage of growth. We implemented an image-analysis work flow to analyze the capacity of both the whole plant and individual leaves to cope with excess excitation energy by following the changes in absorbed light energy partitioning. The data show that leaf and plant age are both important factors influencing the fate of excitation energy. During the dark-to-light transition, the age of the plant affects mostly steady-state levels of photochemical and nonphotochemical quenching, leading to an increased photosynthetic performance of its leaves. The age of the leaf affects the induction kinetics of nonphotochemical quenching. These observations were confirmed using model selection procedures. We further investigated how different leaves on a rosette acclimate to high light and show that younger leaves are less prone to photoinhibition than older leaves. Our results stress that both plant and leaf age should be taken into consideration during the quantification of photosynthetic and photoprotective traits to produce repeatable and reliable results.


Subjects
Arabidopsis/physiology , Light , Photosynthesis/physiology , Plant Leaves/physiology , Acclimatization , Arabidopsis/growth & development , Chlorophyll , Energy Metabolism , Models, Biological , Time Factors
18.
BMC Bioinformatics ; 18(Suppl 12): 422, 2017 Oct 16.
Article in English | MEDLINE | ID: mdl-29072141

ABSTRACT

BACKGROUND: The constant progress in sequencing technology leads to ever-increasing amounts of genomic data. In light of current evidence, transposable elements (TEs) are becoming useful tools for learning about the evolution of host genomes. Therefore, software for genome-wide detection and analysis of TEs is of great interest. RESULTS: Here we describe a computational tool for mining, classifying and storing TEs from newly sequenced genomes. It is an online, web-based, user-friendly service that enables users to upload their own genomic data and perform de novo searches for TEs. The detected TEs are automatically analyzed, compared to reference databases, annotated, clustered into families, and stored in a TE repository. In addition, the genome-wide nesting structure of the identified elements is detected and analyzed by a new method for inferring the evolutionary history of TEs. We illustrate the functionality of our tool by performing a full-scale analysis of the TE landscape in the Medicago truncatula genome. CONCLUSIONS: TRANScendence is an effective tool for the de novo annotation and classification of transposable elements in newly acquired genomes. Its streamlined interface makes it well suited for evolutionary studies.


Subjects
DNA Transposable Elements/genetics , Data Mining , Databases, Genetic , Software , Algorithms , Animals , Drosophila melanogaster/genetics , Genome, Human , Humans , Models, Theoretical , Reproducibility of Results
19.
Anal Chem ; 89(6): 3272-3277, 2017 03 21.
Article in English | MEDLINE | ID: mdl-28234451

ABSTRACT

As high-resolution mass spectrometry (HRMS) becomes increasingly available, the need for software tools capable of handling more complex data is surging. The complexity of HRMS data stems partly from the presence of isotopes, which give rise to more peaks to interpret than on lower-resolution instruments. A new generation of fine isotope calculators is on the rise; they calculate the smallest possible sets of isotopologues. However, none of these calculators lets the user specify the joint probability of the revealed envelope in advance. Instead, the user must provide a lower limit on the probability of the isotopologues of interest, that is, a minimal peak height. The choice of such a threshold is far from obvious. In particular, it is impossible to balance a priori the trade-off between the algorithm's speed and the portion of the theoretical spectrum revealed. We show that this leads to considerable inefficiencies. Here, we present IsoSpec: an algorithm for fast computation of isotopologues of chemical substances that can work with either a joint probability threshold or a peak height threshold. We prove that IsoSpec is optimal in terms of time complexity. Its implementation is freely available under a 2-clause BSD license, with bindings for C++, C, R, and Python.
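IsoSpec's optimal algorithm is not reproduced here. The brute-force sketch below only illustrates what a joint probability threshold means: it enumerates every isotopologue (isotope-count configuration) of a small molecule, sorts them by probability, and keeps the most probable ones until their summed probability reaches the requested target. The restriction to carbon and hydrogen, the isotope table values (standard natural abundances and masses), and the function names are assumptions of the sketch.

```python
from itertools import product
from math import comb, prod

# (mass in Da, natural abundance) of the two stable isotopes of each element.
# Values follow standard isotope tables; only C and H are included here.
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.0033548378, 0.0107)],
    "H": [(1.00782503207, 0.999885), (2.0141017778, 0.000115)],
}


def isotopologues(formula):
    """All isotopologues (isotope-count configurations) of a C/H formula.

    `formula` maps element symbols to atom counts, e.g. {"C": 2, "H": 6}.
    Returns (mass, probability) pairs; with two isotopes per element the
    per-element probabilities are binomial.
    """
    per_element = []
    for element, n in formula.items():
        (m_light, p_light), (m_heavy, p_heavy) = ISOTOPES[element]
        per_element.append([
            ((n - k) * m_light + k * m_heavy,
             comb(n, k) * p_heavy**k * p_light**(n - k))
            for k in range(n + 1)  # k = number of heavy atoms
        ])
    # Elements are independent: combine their configurations.
    return [
        (sum(mass for mass, _ in combo), prod(p for _, p in combo))
        for combo in product(*per_element)
    ]


def smallest_set_with_probability(formula, target):
    """Most probable isotopologues whose joint probability reaches `target`."""
    peaks = sorted(isotopologues(formula), key=lambda mp: mp[1], reverse=True)
    selected, total = [], 0.0
    for mass, prob in peaks:
        if total >= target:
            break
        selected.append((mass, prob))
        total += prob
    return selected, total
```

For ethane (`{"C": 2, "H": 6}`), a target of 0.99 is reached with two isotopologues: the monoisotopic one and a single ¹³C substitution. A peak-height threshold would instead require guessing in advance how small the last retained peak must be. Exhaustive enumeration grows combinatorially with molecule size, which is why dedicated algorithms such as IsoSpec are designed to avoid it.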

20.
Nucleic Acids Res ; 43(4): 2188-98, 2015 Feb 27.
Article in English | MEDLINE | ID: mdl-25613453

ABSTRACT

Nonallelic homologous recombination (NAHR), occurring between low-copy repeats (LCRs) >10 kb in size and sharing >97% DNA sequence identity, is responsible for the majority of recurrent genomic rearrangements in the human genome. Recent studies have shown that transposable elements (TEs) can also mediate recurrent deletions and translocations, indicating that the features of substrates that mediate NAHR may be significantly less stringent than previously believed. Using >4 kb length and >95% sequence identity criteria, we analyzed the genome-wide distribution of long interspersed element (LINE) retrotransposons and their potential to mediate NAHR. We identified 17 005 directly oriented LINE pairs located <10 Mbp from each other as potential NAHR substrates, placing 82.8% of the human genome at risk of LINE-LINE-mediated instability. Cross-referencing these regions with CNVs in the Baylor College of Medicine clinical chromosomal microarray database of 36 285 patients, we identified 516 CNVs potentially mediated by LINEs. Using long-range PCR of five different genomic regions in a total of 44 patients, we confirmed that the CNV breakpoints in each patient map within the LINE elements. To further assess the scale of the LINE-LINE/NAHR phenomenon in the human genome, we tested DNA samples from six healthy individuals on a custom aCGH microarray targeting LINE elements predicted to mediate CNVs and identified 25 LINE-LINE rearrangements. Our data indicate that LINE-LINE-mediated NAHR is widespread and under-recognized, and is an important mechanism of structural rearrangement contributing to human genomic variability.
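The screening criteria stated in this abstract (length >4 kb, sequence identity >95%, direct orientation, separation <10 Mbp) can be expressed as a simple filter. The sketch below is illustrative only: the `Element` record, the `identity` callback standing in for a real pairwise alignment, and the way distance is measured (between the inner ends of the two elements) are assumptions. The quadratic pair enumeration would need a windowed scan for genome-wide use.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Element:
    """A LINE annotation: chromosome, 0-based half-open interval, strand."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def length(self) -> int:
        return self.end - self.start


def candidate_nahr_pairs(
    elements: Iterable[Element],
    identity: Callable[[Element, Element], float],
    min_length: int = 4_000,
    min_identity: float = 0.95,
    max_distance: int = 10_000_000,
) -> List[Tuple[Element, Element]]:
    """Directly oriented LINE pairs passing the length, identity and distance criteria.

    `identity(a, b)` must return the pairwise sequence identity (0-1); it is a
    placeholder for an alignment step. Distance is measured from the end of
    the upstream element to the start of the downstream one (an assumption).
    """
    long_ones = sorted((e for e in elements if e.length > min_length),
                       key=lambda e: (e.chrom, e.start))
    pairs = []
    for a, b in combinations(long_ones, 2):  # a starts no later than b
        if a.chrom != b.chrom or a.strand != b.strand:
            continue  # keep only directly oriented pairs on the same chromosome
        if b.start - a.end >= max_distance:
            continue
        if identity(a, b) > min_identity:
            pairs.append((a, b))
    return pairs
```

Strict inequalities mirror the ">4 kb", ">95%" and "<10 Mbp" thresholds as written in the abstract.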


Subjects
Genome, Human , Homologous Recombination , Long Interspersed Nucleotide Elements , Algorithms , Chromosome Breakpoints , Comparative Genomic Hybridization , DNA Copy Number Variations , Genomics/methods , Humans , Polymerase Chain Reaction