Results 1 - 20 of 40
1.
Epilepsia ; 63(7): 1693-1703, 2022 07.
Article in English | MEDLINE | ID: mdl-35460272

ABSTRACT

OBJECTIVE: Antiseizure drugs (ASDs) modulate synaptic and ion channel function to prevent abnormal hypersynchronous or excitatory activity arising in neuronal networks, but the relationship between ASDs with respect to their impact on network activity is poorly defined. In this study, we first investigated whether different ASD classes exert differential impact upon network activity, and we then sought to classify ASDs according to their impact on network activity. METHODS: We used multielectrode arrays (MEAs) to record the network activity of cultured cortical neurons after applying ASDs from two classes: sodium channel blockers (SCBs) and γ-aminobutyric acid type A receptor-positive allosteric modulators (GABA PAMs). A two-dimensional representation of changes in network features was then derived, and the ability of this low-dimensional representation to classify ASDs with different molecular targets was assessed. RESULTS: A two-dimensional representation of network features revealed a separation between the SCB and GABA PAM drug classes, and could classify several test compounds known to act through these molecular targets. Interestingly, several ASDs with novel targets, such as cannabidiol and retigabine, had closer similarity to the SCB class with respect to their impact upon network activity. SIGNIFICANCE: These results demonstrate that the molecular target of two common classes of ASDs is reflected through characteristic changes in network activity of cultured neurons. Furthermore, a low-dimensional representation of network features can be used to infer an ASD's molecular target. This approach may allow for drug screening to be performed based on features extracted from MEA recordings.


Subjects
Neurons , Unsupervised Machine Learning , Neurons/physiology , Receptors, GABA , Sodium Channel Blockers , gamma-Aminobutyric Acid
2.
BMC Bioinformatics ; 19(Suppl 13): 377, 2019 Feb 04.
Article in English | MEDLINE | ID: mdl-30717665

ABSTRACT

BACKGROUND: Estimating the parameters that describe the ecology of viruses, particularly those that are novel, can be made possible using metagenomic approaches. However, the best-performing existing methods require databases to first estimate an average genome length of a viral community before being able to estimate other parameters, such as viral richness. Although this approach has been widely used, it can adversely skew results since the majority of viruses are yet to be catalogued in databases. RESULTS: In this paper, we present ENVirT, a method for estimating the richness of novel viral mixtures, and for the first time we also show that it is possible to simultaneously estimate the average genome length without a priori information. This is shown to be a significant improvement over database-dependent methods, since we can now robustly analyze samples that may include novel viral types under-represented in current databases. We demonstrate that the viral richness estimates produced by ENVirT are several orders of magnitude higher in accuracy than the estimates produced by existing methods named PHACCS and CatchAll when benchmarked against simulated data. We repeated the analysis of 20 metavirome samples using ENVirT, which produced results in close agreement with complementary in vitro analyses. CONCLUSIONS: These insights were previously not captured by existing computational methods. As such, ENVirT is shown to be an essential tool for enhancing our understanding of novel viral populations.


Subjects
Algorithms , Ecological and Environmental Phenomena , Metagenomics , Computer Simulation , Fermented Foods , Gastrointestinal Microbiome , Genome, Viral , Humans , Lakes/virology , Time Factors , Viruses/genetics
3.
BMC Bioinformatics ; 19(1): 129, 2018 04 11.
Article in English | MEDLINE | ID: mdl-29642848

ABSTRACT

BACKGROUND: Drug repositioning is the process of identifying new uses for existing drugs. Computational drug repositioning methods can reduce the time, costs and risks of drug development by automating the analysis of the relationships in pharmacology networks. Pharmacology networks are large and heterogeneous. Clustering drugs into small groups can simplify large pharmacology networks, and these subgroups can also be used as a starting point for repositioning drugs. In this paper, we propose a two-tiered drug-centric unsupervised clustering approach for drug repositioning, integrating heterogeneous drug data profiles: drug-chemical, drug-disease, drug-gene, drug-protein and drug-side effect relationships. RESULTS: The proposed drug repositioning approach is threefold: (i) clustering drugs based on their homogeneous profiles using the Growing Self Organizing Map (GSOM); (ii) clustering drugs based on drug-drug relation matrices based on the previous step, considering three state-of-the-art graph clustering methods; and (iii) inferring drug repositioning candidates and assigning a confidence value for each identified candidate. In this paper, we compare our two-tiered clustering approach against two existing heterogeneous data integration approaches with reference to the Anatomical Therapeutic Chemical (ATC) classification, using GSOM. Our approach yields Normalized Mutual Information (NMI) and Standardized Mutual Information (SMI) of 0.66 and 36.11, respectively, while the two existing methods yield NMI of 0.60 and 0.64 and SMI of 22.26 and 33.59. Moreover, the two existing approaches failed to produce useful cluster separations when using graph clustering algorithms while our approach is able to identify useful clusters for drug repositioning. Furthermore, we provide clinical evidence for four predicted results (Chlorthalidone, Indomethacin, Metformin and Thioridazine) to support that our proposed approach can be reliably used to infer ATC code and drug repositioning.
CONCLUSION: The proposed two-tiered unsupervised clustering approach is suitable for drug clustering and enables heterogeneous data integration. It also enables identifying reliable repositioning drug candidates with reference to ATC therapeutic classification. The repositioning drug candidates identified consistently by multiple clustering algorithms and with high confidence have a higher possibility of being effective repositioning candidates.
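
The NMI figures quoted above compare a clustering against the ATC reference classes. As a minimal sketch of how such a score can be computed, the Python below normalises mutual information by the arithmetic mean of the two label entropies. That normalisation is an assumption (the abstract does not say which variant was used), and the function names and toy labels are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
    return sum(
        (c / n) * log((c * n) / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI normalised by the mean entropy (assumed variant)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both labelings are a single cluster
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Hypothetical toy data: predicted clusters vs. reference classes.
predicted = ["c1", "c1", "c2", "c2"]
reference = ["N05", "N05", "C03", "C03"]
score = nmi(predicted, reference)
```

A score near 1 means the clustering reproduces the reference classes; a score near 0 means the two labelings are statistically independent.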


Subjects
Drug Repositioning , Statistics as Topic , Algorithms , Cluster Analysis , Computational Biology , Humans , Pharmaceutical Preparations/classification
4.
BMC Bioinformatics ; 18(Suppl 16): 551, 2017 12 28.
Article in English | MEDLINE | ID: mdl-29297291

ABSTRACT

BACKGROUND: Cancer constitutes a momentous health burden in our society. Critical information on cancer may be hidden in its signaling pathways. However, even though a large amount of money has been spent on cancer research, some critical information on cancer-related signaling pathways still remains elusive. Hence, new work towards a complete understanding of cancer-related signaling pathways will greatly benefit the prevention, diagnosis, and treatment of cancer. RESULTS: We propose the node-weighted Steiner tree approach to identify important elements of cancer-related signaling pathways at the level of proteins. This new approach has advantages over previous approaches since it is fast in processing large protein-protein interaction networks. We apply this new approach to identify important elements of two well-known cancer-related signaling pathways: PI3K/Akt and MAPK. First, we generate a node-weighted protein-protein interaction network using protein and signaling pathway data. Second, we modify and use two preprocessing techniques and a state-of-the-art Steiner tree algorithm to identify a subnetwork in the generated network. Third, we propose two new metrics to select important elements from this subnetwork. On a commonly used personal computer, this new approach takes less than 2 s to identify the important elements of PI3K/Akt and MAPK signaling pathways in a large node-weighted protein-protein interaction network with 16,843 vertices and 1,736,922 edges. We further analyze and demonstrate the significance of these identified elements to cancer signal transduction by exploring previously reported experimental evidence. CONCLUSIONS: Our node-weighted Steiner tree approach is shown to be both fast and effective in identifying important elements of cancer-related signaling pathways. Furthermore, it may provide new perspectives into the identification of signaling pathways for other human diseases.


Subjects
Computational Biology/methods , Neoplasms/genetics , Protein Interaction Maps/genetics , Algorithms , Humans , Signal Transduction
5.
BMC Bioinformatics ; 18(1): 140, 2017 Mar 01.
Article in English | MEDLINE | ID: mdl-28249566

ABSTRACT

BACKGROUND: Investigating and understanding drug-drug interactions (DDIs) is important in improving the effectiveness of clinical care. DDIs can occur when two or more drugs are administered together. Experimentally based DDI detection methods are costly and time-consuming. Hence, there is great interest in developing efficient and useful computational methods for inferring potential DDIs. Standard binary classifiers require both positives and negatives for training. In a DDI context, drug pairs that are known to interact can serve as positives for predictive methods. However, negatives, that is, drug pairs that have been confirmed to have no interaction, are scarce. To address this lack of negatives, we introduce a Positive-Unlabeled Learning method for inferring potential DDIs. RESULTS: The proposed method consists of three steps: i) application of Growing Self Organizing Maps to infer negatives from the unlabeled dataset; ii) using a pairwise similarity function to quantify the overlap between individual features of drugs; and iii) using a support vector machine classifier for inferring DDIs. We obtained 6036 DDIs from the DrugBank database. Using the proposed approach, we inferred 589 drug pairs that are likely to not interact with each other; these drug pairs are used as representative data for the negative class in binary classification for DDI prediction. Moreover, we classify the predicted DDIs as Cytochrome P450 (CYP) enzyme-Dependent and CYP-Independent interactions based on their locations on the Growing Self Organizing Map, due to the particular importance of these enzymes in clinically significant interaction effects. Further, we provide a case study on three predicted CYP-Dependent DDIs to evaluate the clinical relevance of this study. CONCLUSION: Our proposed approach showed an absolute improvement in F1-score of 14% and 38% in comparison to the method that randomly selects unlabeled data points as likely negatives, depending on the choice of similarity function. We inferred 5300 possible CYP-Dependent DDIs and 592 CYP-Independent DDIs with the highest posterior probabilities. Our discoveries can be used to improve clinical care as well as the research outcomes of drug development.
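
Two quantities in this abstract lend themselves to a short illustration: a pairwise similarity that measures feature overlap between two drugs, and the F1-score used for evaluation. The sketch below uses the Jaccard index as a stand-in similarity. This is an assumption, since the abstract does not name the similarity functions that were compared, and the feature sets shown are hypothetical.

```python
def jaccard(features_a, features_b):
    """Overlap between two feature sets: |A and B| / |A or B|."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks an interacting drug pair."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical enzyme feature sets for two drugs.
drug_a = {"CYP3A4", "CYP2D6"}
drug_b = {"CYP3A4"}
similarity = jaccard(drug_a, drug_b)
```

In a pipeline of the kind described, one similarity value per feature type would form the input vector for the classifier; the exact feature construction is not given in the abstract.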


Subjects
Drug Interactions/physiology , Pharmaceutical Preparations/metabolism , Support Vector Machine , Cluster Analysis , Cytochrome P-450 Enzyme System/metabolism , Databases, Factual , Humans , Pharmaceutical Preparations/chemistry , Protein Isoforms/genetics , Protein Isoforms/metabolism
6.
BMC Bioinformatics ; 18(Suppl 16): 571, 2017 12 28.
Article in English | MEDLINE | ID: mdl-29297295

ABSTRACT

BACKGROUND: In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and refinement of bins in tetranucleotide frequency space in a purely unsupervised manner. With CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performance of CoMet was compared against four existing approaches for binning a single metagenomic sample, including MaxBin, Metawatt, MyCC (default) and MyCC (coverage), using multiple datasets including a sample composed of multiple strains. RESULTS: Binning methods based on both compositional features and coverages of contigs performed better than the method based only on compositional features of contigs. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) had the highest ranking score in F1-score. However, the performance of CoMet was higher than that of MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18-39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains. CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome. CONCLUSIONS: The approach proposed with CoMet for binning contigs improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample composed of multiple strains.
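
The two compositional features named in the abstract, GC content and tetranucleotide frequencies, can be computed per contig as in the sketch below. It is an illustration only: it does not merge reverse-complement tetramers or apply any normalisation CoMet may use, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# All 256 tetramers over the DNA alphabet, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a contig."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetranucleotide_frequencies(seq):
    """Relative frequency of each tetramer, as a 256-element vector."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[k] for k in TETRAMERS)
    return [counts[k] / total if total else 0.0 for k in TETRAMERS]


contig = "ACGTACGT"  # toy sequence
gc = gc_content(contig)
freqs = tetranucleotide_frequencies(contig)
```

Each contig then becomes a point in (GC content, coverage) space for the initial grouping and a 256-dimensional vector for the refinement step; a density-based clusterer such as DBSCAN would operate on those points.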


Subjects
Algorithms , Contig Mapping , Metagenomics/methods , Workflow , Base Sequence , Cluster Analysis , Databases, Genetic , Genome , Humans , Metagenome
7.
Bioinformatics ; 31(6): 886-96, 2015 Mar 15.
Article in English | MEDLINE | ID: mdl-25398613

ABSTRACT

MOTIVATION: The combined effect of a high replication rate and the low fidelity of the viral polymerase in most RNA viruses and some DNA viruses results in the formation of a viral quasispecies. Uncovering information about quasispecies populations significantly benefits the study of disease progression, antiviral drug design, vaccine design and viral pathogenesis. We present a new analysis pipeline called ViQuaS for viral quasispecies spectrum reconstruction using short next-generation sequencing reads. ViQuaS is based on a novel reference-assisted de novo assembly algorithm for constructing local haplotypes. A significantly extended version of an existing global strain reconstruction algorithm is also used. RESULTS: Benchmarking results showed that ViQuaS outperformed three other previously published methods named ShoRAH, QuRe and PredictHaplo, with improvements of at least 3.1-53.9% in recall, 0-12.1% in precision and 0-38.2% in F-score in terms of strain sequence assembly and improvements of at least 0.006-0.143 in KL-divergence and 0.001-0.035 in root mean-squared error in terms of strain frequency estimation, over the next-best algorithm under various simulation settings. We also applied ViQuaS on a real read set derived from an in vitro human immunodeficiency virus (HIV)-1 population, two independent datasets of foot-and-mouth-disease virus derived from the same biological sample and a real HIV-1 dataset and demonstrated better results than other methods available.
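
Strain frequency estimation is scored above with KL-divergence and root mean-squared error. A minimal sketch of both metrics follows, assuming the true and estimated frequencies are aligned vectors over the same strains. The direction of the divergence (true against estimated) and the `eps` floor are assumptions, not details from the abstract.

```python
from math import log, sqrt


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; q is floored at eps to avoid division by zero."""
    return sum(pi * log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def rmse(p, q):
    """Root mean-squared error between two equal-length vectors."""
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p))


# Hypothetical frequencies for a two-strain population.
true_freqs = [0.5, 0.5]
estimated_freqs = [0.6, 0.4]
divergence = kl_divergence(true_freqs, estimated_freqs)
error = rmse(true_freqs, estimated_freqs)
```

Both metrics are zero for a perfect estimate and grow as the reconstructed spectrum departs from the true one.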


Subjects
Algorithms , Foot-and-Mouth Disease Virus/genetics , HIV-1/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , Foot-and-Mouth Disease Virus/classification , HIV-1/classification , Humans
8.
Bioinformatics ; 31(19): 3198-206, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26063840

ABSTRACT

MOTIVATION: Matrix Assisted Laser Desorption Ionization-Imaging Mass Spectrometry (MALDI-IMS) in 'omics' data acquisition generates detailed information about the spatial distribution of molecules in a given biological sample. Various data processing methods have been developed for exploring the resultant high volume data. However, most of these methods process data in the spectral domain and do not make the most of the important spatial information available through this technology. Therefore, we propose a novel streamlined data analysis pipeline specifically developed for MALDI-IMS data utilizing significant spatial information for identifying hidden significant molecular distribution patterns in these complex datasets. METHODS: The proposed unsupervised algorithm uses Sliding Window Normalization (SWN) and a new spatial distribution based peak picking method developed based on Gray level Co-Occurrence (GCO) matrices followed by clustering of biomolecules. We also use gist descriptors and an improved version of GCO matrices to extract features from molecular images and minimum medoid distance to automatically estimate the number of possible groups. RESULTS: We evaluated our algorithm using a new MALDI-IMS metabolomics dataset of a plant (Eucalypt) leaf. The algorithm revealed hidden significant molecular distribution patterns in the dataset, which the current Component Analysis and Segmentation Map based approaches failed to extract. We further demonstrate the performance of our peak picking method over other traditional approaches by using a publicly available MALDI-IMS proteomics dataset of a rat brain. Although SWN did not show any significant improvement as compared with using no normalization, the visual assessment showed an improvement as compared to using the median normalization. AVAILABILITY AND IMPLEMENTATION: The source code and sample data are freely available at http://exims.sourceforge.net/. 
CONTACT: awgcdw@student.unimelb.edu.au or chalini_w@live.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Brain/metabolism , Eucalyptus/chemistry , Metabolomics/methods , Plant Leaves/metabolism , Proteomics/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Animals , Rats
9.
J Appl Biomech ; 32(2): 128-39, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26426798

ABSTRACT

Normalization of gait data is performed to reduce the effects of intersubject variations due to physical characteristics. This study reports a multiple regression normalization approach for spatiotemporal gait data that takes into account intersubject variations in self-selected walking speed and physical properties including age, height, body mass, and sex. Spatiotemporal gait data including stride length, cadence, stance time, double support time, and stride time were obtained from healthy subjects including 782 children, 71 adults, 29 elderly subjects, and 28 elderly Parkinson's disease (PD) patients. Data were normalized using standard dimensionless equations, a detrending method, and a multiple regression approach. After normalization using dimensionless equations and the detrending method, weak to moderate correlations between walking speed, physical properties, and spatiotemporal gait features were observed (0.01 < |r| < 0.88), whereas normalization using the multiple regression method reduced these correlations to weak values (|r| <0.29). Data normalization using dimensionless equations and detrending resulted in significant differences in stride length and double support time of PD patients; however the multiple regression approach revealed significant differences in these features as well as in cadence, stance time, and stride time. The proposed multiple regression normalization may be useful in machine learning, gait classification, and clinical evaluation of pathological gait patterns.


Subjects
Data Interpretation, Statistical , Gait Disorders, Neurologic/physiopathology , Gait , Parkinson Disease/physiopathology , Spatio-Temporal Analysis , Walking , Adolescent , Aged , Aged, 80 and over , Algorithms , Child , Child, Preschool , Female , Gait Disorders, Neurologic/diagnosis , Gait Disorders, Neurologic/etiology , Humans , Machine Learning , Male , Middle Aged , Parkinson Disease/complications , Parkinson Disease/diagnosis , Pattern Recognition, Automated , Physical Examination/methods , Regression Analysis , Reproducibility of Results , Sensitivity and Specificity
10.
BMC Bioinformatics ; 16 Suppl 18: S3, 2015.
Article in English | MEDLINE | ID: mdl-26678073

ABSTRACT

BACKGROUND: Estimating the number of different species (richness) in a mixed microbial population has been a main focus in metagenomic research. Existing methods of species richness estimation rely on the assumption that the reads in each assembled contig correspond to only one of the microbial genomes in the population. This assumption and the underlying probabilistic formulations of existing methods are not useful for quasispecies populations where the strains are highly genetically related. RESULTS: On benchmark data sets, our estimation method provided accurate richness estimates (< 0.2 median estimation error) and improved the precision of ViQuaS by 2%-13% and F-score by 1%-9% without compromising the recall rates. We also demonstrate that our estimation method can be used to improve the precision and F-score of ShoRAH by 0%-7% and 0%-5% respectively. CONCLUSIONS: The proposed probabilistic estimation method can be used to estimate the richness of viral populations with a quasispecies behavior and to improve the accuracy of the quasispecies spectra reconstructed by the existing methods ViQuaS and ShoRAH in the presence of a moderate level of technical sequencing errors. AVAILABILITY: http://sourceforge.net/projects/viquas/.


Subjects
Metagenomics , Algorithms , Benchmarking , High-Throughput Nucleotide Sequencing , Internet , User-Computer Interface
11.
BMC Genomics ; 16 Suppl 12: S12, 2015.
Article in English | MEDLINE | ID: mdl-26680279

ABSTRACT

BACKGROUND: Mass Spectrometry (MS) is a ubiquitous analytical tool in biological research and is used to measure the mass-to-charge ratio of bio-molecules. Peak detection is the essential first step in MS data analysis. Precise estimation of peak parameters such as peak summit location and peak area is critical to identify underlying bio-molecules and to estimate their abundances accurately. We propose a new method to detect and quantify peaks in mass spectra. It uses dual-tree complex wavelet transformation along with Stein's unbiased risk estimator for spectra smoothing. Then, a new method, based on the modified Asymmetric Pseudo-Voigt (mAPV) model and hierarchical particle swarm optimization, is used for peak parameter estimation. RESULTS: Using simulated data, we demonstrated the benefit of using the mAPV model over Gaussian, Lorentz and Bi-Gaussian functions for MS peak modelling. The proposed mAPV model achieved the best fitting accuracy for asymmetric peaks, with lower percentage errors in peak summit location estimation, which were 0.17% to 4.46% less than that of the other models. It also outperformed the other models in peak area estimation, delivering lower percentage errors, which were about 0.7% less than its closest competitor, the Bi-Gaussian model. In addition, using data generated from a MALDI-TOF computer model, we showed that the proposed overall algorithm outperformed the existing methods mainly in terms of sensitivity. It achieved a sensitivity of 85%, compared to 77% and 71% for the two benchmark algorithms, the continuous wavelet transformation based method and Cromwell, respectively. CONCLUSIONS: The proposed algorithm is particularly useful for peak detection and parameter estimation in MS data with overlapping peak distributions and asymmetric peaks. The algorithm is implemented using MATLAB and the source code is freely available at http://mapv.sourceforge.net.
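
For orientation, a pseudo-Voigt peak is a weighted sum of a Lorentzian and a Gaussian that share a centre and width. The sketch below is not the mAPV model from the paper, whose exact form the abstract does not give. It is the textbook pseudo-Voigt, made asymmetric by the assumed device of using a different full width at half maximum on each side of the summit. All parameter names are hypothetical.

```python
from math import exp, log


def asymmetric_pseudo_voigt(x, centre, height, fwhm_left, fwhm_right, eta):
    """Peak intensity at x.

    eta in [0, 1] is the Lorentzian fraction; the side-dependent FWHM
    is an illustrative way to obtain asymmetry, not the published model.
    """
    fwhm = fwhm_left if x < centre else fwhm_right
    u = 2.0 * (x - centre) / fwhm          # distance in half-widths
    gaussian = exp(-log(2.0) * u * u)      # equals 0.5 when |u| == 1
    lorentzian = 1.0 / (1.0 + u * u)       # equals 0.5 when |u| == 1
    return height * (eta * lorentzian + (1.0 - eta) * gaussian)


# Hypothetical peak: summit at m/z 1000, wider tail on the right.
summit = asymmetric_pseudo_voigt(1000.0, 1000.0, 10.0, 2.0, 4.0, 0.3)
left_half = asymmetric_pseudo_voigt(999.0, 1000.0, 10.0, 2.0, 4.0, 0.3)
right_half = asymmetric_pseudo_voigt(1002.0, 1000.0, 10.0, 2.0, 4.0, 0.3)
```

Because both component shapes fall to half height at one half-width from the centre, the mixture does too, which makes the width parameters directly interpretable when fitting.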


Subjects
Computational Biology/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Algorithms , Computer Simulation
12.
BMC Genomics ; 16: 219, 2015 Mar 20.
Article in English | MEDLINE | ID: mdl-25879764

ABSTRACT

BACKGROUND: Prokaryotic microbes, the most abundant organisms in the ocean, are remarkably diverse. Despite numerous studies of marine prokaryotes, the zonation of their communities in pelagic zones has been poorly delineated. By exploiting the persistent stratification of the South China Sea (SCS), we performed a 2-year, large spatial scale (10, 100, 1000, and 3000 m) survey, which included a pilot study in 2006 and comprehensive sampling in 2007, to investigate the biological zonation of bacteria and archaea using 16S rRNA tag and shotgun metagenome sequencing. RESULTS: Alphaproteobacteria dominated the bacterial community in the surface SCS, where the abundance of Betaproteobacteria was seemingly associated with climatic activity. Gammaproteobacteria thrived in the deep SCS, where a noticeable amount of Cyanobacteria were also detected. Marine Groups II and III Euryarchaeota were predominant in the archaeal communities in the surface and deep SCS, respectively. Bacterial diversity was higher than archaeal diversity at all sampling depths in the SCS, and peaked at mid-depths, agreeing with the diversity pattern found in global water columns. Metagenomic analysis not only showed differential %GC values and genome sizes between the surface and deep SCS, but also demonstrated depth-dependent metabolic potentials, such as cobalamin biosynthesis at 10 m, osmoregulation at 100 m, signal transduction at 1000 m, and plasmid and phage replication at 3000 m. When compared with other oceans, urease at 10 m and both exonuclease and permease at 3000 m were more abundant in the SCS. Finally, enriched genes associated with nutrient assimilation in the sea surface and transposase in the deep-sea metagenomes exemplified the functional zonation in global oceans. CONCLUSIONS: Prokaryotic communities in the SCS stratified with depth, with maximal bacterial diversity at mid-depth, in accordance with global water columns. The SCS had functional zonation among depths and endemically enriched metabolic potentials at the study site, in contrast to other oceans.


Subjects
Archaea/genetics , Bacteria/genetics , Metagenomics , Seawater/microbiology , Archaea/metabolism , Bacteria/metabolism , China , Cluster Analysis , Computational Biology , Exonucleases/genetics , Exonucleases/metabolism , Genome, Archaeal , Genome, Bacterial , Membrane Transport Proteins/genetics , Membrane Transport Proteins/metabolism , RNA, Ribosomal, 16S/analysis , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA , Urease/genetics , Urease/metabolism , Vitamin B 12/biosynthesis
13.
BMC Genomics ; 15: 732, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25167919

ABSTRACT

BACKGROUND: Using whole exome sequencing to predict aberrations in tumours is a cost-effective alternative to whole genome sequencing; however, it is predominantly used for variant detection and infrequently utilised for detection of somatic copy number variation. RESULTS: We propose a new method to infer copy number and genotypes using whole exome data from paired tumour/normal samples. Our algorithm uses two Hidden Markov Models to predict copy number and genotypes and computationally resolves polyploidy/aneuploidy, normal cell contamination and signal baseline shift. Our method explicitly detects chromosome-arm-level events, which are commonly found in tumour samples. The methods are combined into a package named ADTEx (Aberration Detection in Tumour Exome). We applied our algorithm to a cohort of 17 in-house generated and 18 TCGA paired ovarian cancer/normal exomes and evaluated the performance by comparing against the copy number variations and genotypes predicted using Affymetrix SNP 6.0 data of the same samples. Further, we carried out a comparison study to show that ADTEx outperformed its competitors in terms of precision and F-measure. CONCLUSIONS: Our proposed method, ADTEx, uses both depth of coverage ratios and B allele frequencies calculated from whole exome sequencing data to predict copy number variations along with their genotypes. ADTEx is implemented as a user-friendly software package using Python and the R statistical language. Source code and sample data are freely available under GNU license (GPLv3) at http://adtex.sourceforge.net/.
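
The two input signals named in the conclusions, depth of coverage ratio and B allele frequency, are simple to state. The sketch below shows one common way to compute them per exon and per heterozygous site. The pseudocount and the absence of library-size normalisation are simplifying assumptions, and none of this is taken from the ADTEx source code.

```python
from math import log2


def log2_depth_ratio(tumour_depth, normal_depth, pseudocount=1.0):
    """log2 of tumour over normal mean read depth for one exon.

    The pseudocount (an assumed value) keeps the ratio finite when a
    region has zero coverage in either sample.
    """
    return log2((tumour_depth + pseudocount) / (normal_depth + pseudocount))


def b_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads supporting the B (alternate) allele at a site."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else None


# Hypothetical exon with roughly doubled coverage in the tumour.
ratio = log2_depth_ratio(199, 99)
baf = b_allele_frequency(ref_reads=30, alt_reads=10)
```

A ratio near 0 suggests no copy number change, positive values suggest gain and negative values loss; a B allele frequency far from 0.5 at a germline heterozygous site points to allelic imbalance. A hidden Markov model would then segment these per-region observations into copy number states.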


Subjects
DNA Copy Number Variations , Exome , Genotype , Neoplasms/genetics , Algorithms , Chromosome Aberrations , Computational Biology/methods , Female , Genomics/methods , Genotyping Techniques , High-Throughput Nucleotide Sequencing , Humans , Loss of Heterozygosity , Ovarian Neoplasms/genetics , Polymorphism, Single Nucleotide , Polyploidy , Reproducibility of Results , Sensitivity and Specificity
14.
Nucleic Acids Res ; 40(5): e34, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22180538

ABSTRACT

An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study. The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis.


Subjects
Metagenome , Metagenomics/methods , Animals , Base Composition , Biofilms , Bone and Bones/microbiology , Cluster Analysis , Genomic Library , Geologic Sediments/microbiology , Oligochaeta/microbiology , Sequence Analysis, DNA , Sewage/microbiology
15.
Sci Rep ; 14(1): 13558, 2024 06 12.
Article in English | MEDLINE | ID: mdl-38866809

ABSTRACT

Longitudinal studies that continuously generate data enable the capture of temporal variations in experimentally observed parameters, facilitating the interpretation of results in a time-aware manner. We propose IL-VIS (incrementally learned visualizer), a new machine learning pipeline that incrementally learns and visualizes a progression trajectory representing the longitudinal changes in longitudinal studies. At each sampling time point in an experiment, IL-VIS generates a snapshot of the longitudinal process on the data observed thus far, a new feature that is beyond the reach of classical static models. We first verify the utility and correctness of IL-VIS using simulated data, for which the true progression trajectories are known. We find that it accurately captures and visualizes the trends and (dis)similarities between high-dimensional progression trajectories. We then apply IL-VIS to longitudinal multi-electrode array data from brain cortical organoids when exposed to different levels of quinolinic acid, a metabolite contributing to many neuroinflammatory diseases including Alzheimer's disease, and its blocking antibody. We uncover valuable insights into the organoids' electrophysiological maturation and response patterns over time under these conditions.


Subjects
Machine Learning , Longitudinal Studies , Humans , Organoids , Alzheimer Disease/metabolism , Brain/physiology
16.
BMC Bioinformatics ; 14 Suppl 2: S2, 2013.
Article in English | MEDLINE | ID: mdl-23368785

ABSTRACT

BACKGROUND: Copy number variations (CNVs) are one of the main types of genetic variation in cancer. Whole exome sequencing (WES) is a popular alternative to whole genome sequencing (WGS) for studying disease-specific genomic variations. However, finding CNVs in cancer samples using WES data has not been fully explored. RESULTS: We present a new method, called CoNVEX, to estimate copy number variation in whole exome sequencing data. It uses the ratio of tumour to matched-normal average read depth at each exonic region to predict copy gain or loss. The useful signal in WES data is obscured by intrinsic noise in the data, which limits its reliability as a source for CNV detection. CoNVEX therefore applies a discrete wavelet transform (DWT) to reduce noise, and then identifies copy number gains and losses in each targeted region with a Hidden Markov Model (HMM). CONCLUSION: HMMs are frequently used to identify CNVs in data produced by various technologies, including array comparative genomic hybridization (aCGH) and WGS. Here, we propose an HMM to detect CNVs in cancer exome data. We used modified data from the 1000 Genomes Project to evaluate the performance of the proposed method. Using these data, we have shown that CoNVEX significantly outperforms existing methods in terms of precision. Overall, CoNVEX achieved a sensitivity of more than 92% and a precision of more than 50%.
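
The first two stages described in this abstract (a tumour-to-normal depth ratio per exon, followed by wavelet denoising) can be sketched as follows. This is an illustration under stated assumptions, not CoNVEX itself: the function names, the pseudocount, the single-level Haar transform and the hard threshold are all choices made here for brevity, and the HMM segmentation step is omitted.

```python
import math


def log2_ratios(tumour_depths, normal_depths, pseudocount=1.0):
    """Per-exon log2(tumour / normal) average read depth.

    The pseudocount (an assumption of this sketch) avoids division
    by zero for exons with no coverage.
    """
    return [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumour_depths, normal_depths)
    ]


def haar_denoise(signal, threshold):
    """One-level Haar DWT, hard-threshold the details, then invert.

    Detail coefficients with magnitude at or below `threshold` are
    treated as noise and set to zero.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / root2 for a, b in pairs]
    detail = [(a - b) / root2 for a, b in pairs]
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    smoothed = []
    for a, d in zip(approx, detail):
        smoothed.extend([(a + d) / root2, (a - d) / root2])
    return smoothed


if __name__ == "__main__":
    tumour = [30, 33, 61, 58, 29, 31]   # toy depths, not real data
    normal = [30, 30, 30, 30, 30, 30]
    ratios = log2_ratios(tumour, normal)
    print([round(r, 2) for r in haar_denoise(ratios, threshold=0.2)])
```

Small within-pair fluctuations are averaged away while large jumps survive, which is the property that makes a wavelet step useful before state assignment.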


Subjects
DNA Copy Number Variations , Exome , Neoplasms/genetics , Comparative Genomic Hybridization , Exons , Genomics/methods , Humans , Markov Chains , Models, Statistical
17.
Bioinformatics ; 28(10): 1307-13, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22474122

ABSTRACT

MOTIVATION: In light of the increasing adoption of targeted resequencing (TR) as a cost-effective strategy to identify disease-causing variants, a robust method for copy number variation (CNV) analysis is needed to maximize the value of this promising technology. RESULTS: We present a method for CNV detection for TR data, including whole-exome capture data. Our method calls copy number gains and losses for each target region based on normalized depth of coverage. Our key strategies include the use of base-level log-ratios to remove GC-content bias, correction for an imbalanced library size effect on log-ratios, and the estimation of log-ratio variations via binning and interpolation. Our methods are made available via CONTRA (COpy Number Targeted Resequencing Analysis), a software package that takes standard alignment formats (BAM/SAM) and outputs in variant call format (VCF4.0), for easy integration with other next-generation sequencing analysis packages. We assessed our methods using samples from seven different target enrichment assays, and evaluated our results using simulated data and real germline data with known CNV genotypes.
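
The idea of a base-level log-ratio with a library-size correction can be sketched as below. This is not CONTRA's implementation: the function name, the `min_depth` filter and the simple scaling by total read count are assumptions made for illustration, and the binning and interpolation used to estimate log-ratio variation are not shown.

```python
import math
from statistics import mean


def region_log_ratio(test_depths, control_depths,
                     test_library_size, control_library_size,
                     min_depth=1):
    """Mean of per-base log2(test / control) depth over one target region.

    Test depths are rescaled by the ratio of library sizes so that a
    deeper-sequenced sample does not look like a genome-wide gain.
    Bases below `min_depth` in either sample are ignored.
    Returns None when no base passes the filter.
    """
    scale = control_library_size / test_library_size
    ratios = [
        math.log2(t * scale / c)
        for t, c in zip(test_depths, control_depths)
        if t >= min_depth and c >= min_depth
    ]
    return mean(ratios) if ratios else None


if __name__ == "__main__":
    # Toy per-base depths for one target region, not real data.
    test = [20, 40, 0, 22]
    control = [10, 20, 15, 11]
    print(region_log_ratio(test, control, 1_000_000, 1_000_000))
```

Averaging log-ratios per base, instead of taking the log of a ratio of region means, is one way to keep a few high-coverage bases from dominating a region's estimate.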


Subjects
DNA Copy Number Variations , Exome , Sequence Analysis, DNA , Animals , Computer Simulation , HapMap Project , Humans , Mice , Software
18.
Biosystems ; 220: 104749, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35917953

ABSTRACT

High-throughput technologies used in experimental biological sciences produce data with a vast number of variables at a rapid pace, making large volumes of high-dimensional data available. The exploratory analysis of such high-dimensional data can be aided by human-interpretable low-dimensional visualizations. This work investigates how both discrete and continuous structures in biological data can be captured using the recently proposed dimensionality reduction method SONG, and compares the results with those of the commonly used methods UMAP and PHATE. Using simulated and real-world datasets, we observe that SONG produces insightful visualizations by preserving various patterns, including discrete clusters, continuums, and branching structures, in all considered datasets. More importantly, for datasets containing both discrete and continuous structures, SONG preserves both structures better than UMAP and PHATE. Furthermore, our quantitative evaluation of the three methods using downstream analysis shows that SONG's low-dimensional embeddings are on par in quality with those of the other methods.

19.
Genomics ; 96(2): 92-101, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20417269

ABSTRACT

The second codon of a transcript, besides encoding an amino acid, is now known to have multiple molecular functions and to be involved in translation efficiency and in protein turnover and maturation processing. These multiple purposes make the selection constraints on this codon's composition more complex. To examine the biological significance of various permutations of the second codon, we conducted a systematic survey of second codon composition from 442 selected genomes across three domains. The amino acid bias of the second codon is associated with specific protein functions. The most common amino acids (S, A, K and T) are significantly avoided in Cell Envelope-related genes but preferred in Translation or Energy Metabolism-related genes, suggesting that the function of a gene product is a significant factor influencing the composition of the second codon.
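
A survey of second codon composition comes down to tallying which amino acid follows the start codon in each coding sequence. The sketch below shows that tally in Python. It assumes the standard genetic code for every sequence (genomes using alternative codes would need a different table), and the function name is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def second_codon_usage(coding_sequences):
    """Count the amino acid encoded by the second codon of each CDS.

    Sequences shorter than two codons, or whose second codon contains
    ambiguous bases, are skipped. RNA input (U) is accepted.
    """
    counts = Counter()
    for cds in coding_sequences:
        codon = cds[3:6].upper().replace("U", "T")
        if codon in CODON_TABLE:
            counts[CODON_TABLE[codon]] += 1
    return counts


if __name__ == "__main__":
    toy_genes = ["ATGTCTAAA", "ATGGCTTAA", "ATGAGCGGG"]  # not real genes
    print(second_codon_usage(toy_genes).most_common())
```

Grouping the counts by functional category of the gene, as the abstract describes, would only require keeping one `Counter` per category.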


Subjects
Amino Acids/genetics , Codon/physiology , Genome/genetics , Proteins/physiology , Selection, Genetic , Archaea/genetics , Bacteria/genetics , Base Composition , Codon/genetics , Eukaryota/genetics , Mutation/genetics , Proteins/genetics , Sequence Analysis, Protein
20.
J Biomech ; 125: 110552, 2021 08 26.
Article in English | MEDLINE | ID: mdl-34237661

ABSTRACT

Joint angle quantification from inertial measurement units (IMUs) is commonly performed using kinematic modelling, which depends on anatomical sensor placement and/or functional joint calibration; however, accurate three-dimensional joint motion measurement remains challenging to achieve. The aims of this study were firstly to employ deep neural networks to convert IMU data to ankle joint angles that are indistinguishable from those derived from motion capture-based inverse kinematics (IK) - the reference standard; and secondly, to validate the robustness of this approach across contrasting walking speeds in healthy individuals. Kinematics data were simultaneously calculated using IMUs and IK from 9 subjects during walking on a treadmill at 0.5 m/s, 1.0 m/s and 1.5 m/s. A generative adversarial network was trained using gait data at two of the walking speeds to predict ankle kinematics from IMU data alone for the third walking speed. There were significant differences between IK and IMU joint angle predictions for ankle eversion and internal rotation during walking at 0.5 m/s and 1.0 m/s (p < 0.001); however, no significant differences in joint angles were observed between the generative adversarial network prediction and IK at any speed or plane of joint motion (p < 0.05). The RMS difference in ankle joint kinematics between the generative adversarial network and IK for walking at 1.0 m/s was 3.8°, 2.1° and 3.5° for dorsiflexion, inversion and axial rotation, respectively. The modeling approach presented for real-time IMU to ankle joint angle conversion, which can be readily expanded to other joints, may provide enhanced IMU capability in applications such as telemedicine, remote monitoring and rehabilitation.
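
The RMS difference reported in this abstract compares two time-aligned joint-angle traces. A minimal sketch of that metric follows; the function name is hypothetical, and it assumes both traces are sampled at the same instants and expressed in degrees.

```python
import math


def rms_difference(predicted, reference):
    """Root-mean-square difference between two equal-length angle traces."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("traces must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy dorsiflexion angles in degrees, not data from the study.
    network_prediction = [5.0, 7.5, 10.0, 8.0]
    inverse_kinematics = [4.0, 8.0, 11.5, 8.0]
    print(round(rms_difference(network_prediction, inverse_kinematics), 2))
```

The metric would be computed separately for each plane of motion (dorsiflexion, inversion, axial rotation), which is how the abstract reports three values.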


Subjects
Ankle Joint , Walking , Biomechanical Phenomena , Gait , Humans , Neural Networks, Computer