1.
Bioinformatics ; 39(39 Suppl 1): i242-i251, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387144

ABSTRACT

MOTIVATION: Non-canonical (or non-B) DNA are genomic regions whose three-dimensional conformation deviates from the canonical double helix. Non-B DNA play an important role in basic cellular processes and are associated with genomic instability, gene regulation, and oncogenesis. Experimental methods are low-throughput and can detect only a limited set of non-B DNA structures, while computational methods rely on non-B DNA base motifs, which are necessary but not sufficient indicators of non-B structures. Oxford Nanopore sequencing is an efficient and low-cost platform, but it is currently unknown whether nanopore reads can be used for identifying non-B structures. RESULTS: We build the first computational pipeline to predict non-B DNA structures from nanopore sequencing. We formalize non-B detection as a novelty detection problem and develop the GoFAE-DND, an autoencoder that uses goodness-of-fit (GoF) tests as a regularizer. A discriminative loss encourages non-B DNA to be poorly reconstructed and optimizing Gaussian GoF tests allows for the computation of P-values that indicate non-B structures. Based on whole genome nanopore sequencing of NA12878, we show that there exist significant differences between the timing of DNA translocation for non-B DNA bases compared with B-DNA. We demonstrate the efficacy of our approach through comparisons with novelty detection methods using experimental data and data synthesized from a new translocation time simulator. Experimental validations suggest that reliable detection of non-B DNA from nanopore sequencing is achievable. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/bayesomicslab/ONT-nonb-GoFAE-DND.


Subjects
Nanopore Sequencing , Humans , DNA , Carcinogenesis , Neoplastic Cell Transformation , Genomics
2.
J Magn Reson Imaging ; 57(3): 856-868, 2023 03.
Article in English | MEDLINE | ID: mdl-35808911

ABSTRACT

BACKGROUND: Studies have identified imaging markers of binge drinking. Functional connectivity during both task challenges and resting state was shown to distinguish binge and nonbinge drinkers. However, no studies have compared the efficacy of task and resting data in the classification. HYPOTHESIS: Task outperforms resting-state functional magnetic resonance imaging (fMRI) data in the differentiation of binge and nonbinge drinkers. We tested the hypothesis via multiple deep learning algorithms. STUDY TYPE: Cross-sectional; retrospective. POPULATION: A total of 149 binge (107 men) and 151 demographically matched, nonbinge (92 men) drinkers curated from the Human Connectome Project, with 80% randomly selected for model development and 20% for validation/test. FIELD STRENGTH/SEQUENCE: A 3 T; fMRI with a blood oxygen level-dependent (BOLD) gradient-echo echo-planar sequence. ASSESSMENT: FMRI data of resting state and seven behavioral tasks were acquired. Graph convolutional network (GCN), long short-term memory, convolutional, and recurrent neural network models were built to distinguish bingers and nonbingers using connectivity matrices of 8, 116, and 268 regions of interest (ROI). Nodal metrics including betweenness centrality, degree centrality, clustering coefficient, efficiency, local efficiency, and shortest path length were calculated from the GCN model. STATISTICAL TESTS: Model performance was quantified by the area under the curve (AUC) in receiver operating characteristic analysis. A P value < 0.05 was considered statistically significant. RESULTS: Task outperformed resting data in classification by approximately 8% by AUC in the test set. Across models and ROI sets, the gambling, motor, language and working memory tasks, each with AUC of 0.614, 0.612, 0.605, and 0.603, performed better than resting data (AUC = 0.548). Models with 116 ROIs (AUC = 0.602) consistently outperformed those with 8 ROIs (AUC = 0.569). 
Task data performed best with GCN (AUC = 0.619). Nodal metrics of left supplementary motor area and right cuneus showed a significant group main effect across tasks. CONCLUSION: Neural responses to cognitive challenges relative to resting state better characterize binge drinking. The performance of different network models may depend on behavioral tasks and the number of ROIs. EVIDENCE LEVEL: 3. TECHNICAL EFFICACY: Stage 2.
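The AUC values quoted in this abstract can be read as the probability that a randomly chosen binge drinker receives a higher model score than a randomly chosen nonbinge drinker. A minimal sketch of that computation follows; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: sequence of 0/1 values (1 = positive class).
    scores: model outputs, higher meaning "more likely positive".
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to chance-level ranking, which is why values such as 0.548 for resting data indicate only weak separation.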


Subjects
Binge Drinking , Deep Learning , Male , Humans , Magnetic Resonance Imaging/methods , Binge Drinking/diagnostic imaging , Retrospective Studies , Cross-Sectional Studies , Ethanol , Cognition/physiology , Brain
3.
Neurocomputing (Amst) ; 481: 333-356, 2022 Apr 07.
Article in English | MEDLINE | ID: mdl-35342226

ABSTRACT

Adaptive gradient methods (AGMs) have become popular for optimizing the nonconvex problems that arise in deep learning. We revisit AGMs and identify that the adaptive learning rate (A-LR) used by AGMs varies significantly across the dimensions of the problem over epochs (i.e., anisotropic scale), which may lead to issues in convergence and generalization. All existing modified AGMs actually represent efforts in revising the A-LR. Theoretically, we provide a new way to analyze the convergence of AGMs and prove that the convergence rate of Adam also depends on its hyper-parameter ε, which has been overlooked previously. Based on these two facts, we propose a new AGM by calibrating the A-LR with an activation (softplus) function, resulting in the Sadam and SAMSGrad methods. We further prove that these algorithms enjoy better convergence speed under nonconvex, non-strongly convex, and Polyak-Lojasiewicz conditions compared with Adam. Empirical studies support our observation of the anisotropic A-LR and show that the proposed methods outperform existing AGMs and generalize even better than S-Momentum in multiple deep learning tasks.
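To make the idea of calibrating the adaptive learning rate concrete, here is a rough sketch of one Adam-style update in which the usual denominator, the square root of the second-moment estimate plus ε, can be swapped for a softplus of that square root. This is an illustration under assumptions rather than the authors' exact algorithm: the bias correction, the `beta` value, and the names `softplus` and `adam_like_step` are all choices made for this example.

```python
import math


def softplus(x, beta=50.0):
    """Softplus (1/beta) * log(1 + exp(beta * x)), guarded against overflow."""
    z = beta * x
    if z > 30.0:          # log(1 + exp(z)) is numerically equal to z here
        return x
    return math.log1p(math.exp(z)) / beta


def adam_like_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999,
                   eps=1e-8, calibrate=False, beta=50.0):
    """One update of Adam (calibrate=False) or a softplus-calibrated variant.

    w, grad, m, v are equal-length lists of floats; t is the 1-based step count.
    Returns the updated (w, m, v). Bias correction is an assumption of this sketch.
    """
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, grad, m, v):
        mi = b1 * mi + (1.0 - b1) * gi
        vi = b2 * vi + (1.0 - b2) * gi * gi
        m_hat = mi / (1.0 - b1 ** t)
        v_hat = vi / (1.0 - b2 ** t)
        if calibrate:
            # Bounded below by log(2) / beta, so tiny v_hat cannot blow up the step.
            denom = softplus(math.sqrt(v_hat), beta)
        else:
            denom = math.sqrt(v_hat) + eps   # standard Adam denominator
        new_w.append(wi - lr * m_hat / denom)
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v


# Hypothetical single-parameter comparison with a very small gradient.
w0, g = [0.0], [1e-6]
w_adam, _, _ = adam_like_step(w0, g, [0.0], [0.0], t=1)
w_cal, _, _ = adam_like_step(w0, g, [0.0], [0.0], t=1, calibrate=True)
print(abs(w_adam[0]), abs(w_cal[0]))
```

In this toy case the plain Adam step stays close to the full learning rate even though the gradient is tiny, while the calibrated step shrinks with the gradient, which is one way to see why the choice of ε (or of its softplus replacement) matters.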

4.
Parallel Comput ; 101, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33363295

ABSTRACT

Although first-order stochastic algorithms, such as stochastic gradient descent, have been the main force to scale up machine learning models, such as deep neural nets, the second-order quasi-Newton methods start to draw attention due to their effectiveness in dealing with ill-conditioned optimization problems. The L-BFGS method is one of the most widely used quasi-Newton methods. We propose an asynchronous parallel algorithm for stochastic quasi-Newton (AsySQN) method. Unlike prior attempts, which parallelize only the calculation for gradient or the two-loop recursion of L-BFGS, our algorithm is the first one that truly parallelizes L-BFGS with a convergence guarantee. Adopting the variance reduction technique, a prior stochastic L-BFGS, which has not been designed for parallel computing, reaches a linear convergence rate. We prove that our asynchronous parallel scheme maintains the same linear convergence rate but achieves significant speedup. Empirical evaluations in both simulations and benchmark datasets demonstrate the speedup in comparison with the non-parallel stochastic L-BFGS, as well as the better performance than first-order methods in solving ill-conditioned problems.
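For readers unfamiliar with the two-loop recursion mentioned in this abstract, the following is a minimal serial sketch of the standard L-BFGS procedure that turns a gradient and a short history of curvature pairs into a quasi-Newton direction. It is not the asynchronous AsySQN algorithm; the function names and the toy quadratic are assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: approximates H * grad, where H is the L-BFGS
    inverse-Hessian estimate built from the stored pairs.

    s_hist[i] = x_{i+1} - x_i and y_hist[i] = g_{i+1} - g_i, oldest first.
    The descent direction is the negative of the returned vector.
    """
    q = list(grad)
    alphas = []
    # First loop: newest pair to oldest.
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        alphas.append((rho, alpha))
    # Initial scaling H0 = gamma * I, taken from the most recent pair.
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest.
    for (s, y), (rho, alpha) in zip(zip(s_hist, y_hist), reversed(alphas)):
        b = rho * dot(y, r)
        r = [ri + (alpha - b) * si for ri, si in zip(r, s)]
    return r


# Hypothetical ill-conditioned quadratic f(x) = 0.5 * (x1^2 + 10 * x2^2).
# Its gradient at (1, 1) is (1, 10); the pairs below satisfy y = A s.
s_hist = [[1.0, 0.0], [0.0, 1.0]]
y_hist = [[1.0, 0.0], [0.0, 10.0]]
print(lbfgs_direction([1.0, 10.0], s_hist, y_hist))
```

With curvature pairs that span the space, the returned vector matches the Newton step for this quadratic, which illustrates why quasi-Newton methods cope better with ill-conditioning than plain gradient descent.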

5.
BMC Bioinformatics ; 21(Suppl 1): 192, 2020 Dec 09.
Article in English | MEDLINE | ID: mdl-33297952

ABSTRACT

BACKGROUND: Automatic segmentation and localization of lesions in mammogram (MG) images are challenging even when advanced methods such as deep learning (DL) are employed. We developed a new model based on the architecture of the semantic segmentation U-Net model to precisely segment mass lesions in MG images. The proposed end-to-end convolutional neural network (CNN) based model extracts contextual information by combining low-level and high-level features. We trained the proposed model using large publicly available databases (CBIS-DDSM, BCDR-01, and INbreast) and a private database from the University of Connecticut Health Center (UCHC). RESULTS: We compared the performance of the proposed model with those of the state-of-the-art DL models including the fully convolutional network (FCN), SegNet, Dilated-Net, original U-Net, and Faster R-CNN models and the conventional region growing (RG) method. The proposed Vanilla U-Net model outperforms the Faster R-CNN model significantly in terms of the runtime and the Intersection over Union metric (IOU). Training with digitized film-based and fully digitized MG images, the proposed Vanilla U-Net model achieves a mean test accuracy of 92.6%. The proposed model achieves a mean Dice coefficient index (DI) of 0.951 and a mean IOU of 0.909, which show how close the output segments are to the corresponding lesions in the ground truth maps. Data augmentation was very effective in our experiments, increasing the mean DI from 0.922 to 0.951 and the mean IOU from 0.856 to 0.909. CONCLUSIONS: The proposed Vanilla U-Net based model can be used for precise segmentation of masses in MG images. This is because the segmentation process incorporates more multi-scale spatial context, and captures more local and global context to predict a precise pixel-wise segmentation map of an input full MG image. These detected maps can help radiologists differentiate benign from malignant lesions based on lesion shape. We show that using transfer learning, introducing augmentation, and modifying the architecture of the original model results in better performance in terms of the mean accuracy, the mean DI, and the mean IOU in detecting mass lesions compared with the other DL and the conventional models.
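The Dice coefficient and Intersection over Union reported in this abstract both measure the overlap between a predicted mask and a ground-truth mask. A small sketch of how the two metrics are computed for binary masks follows; the function name `dice_and_iou` and the 3 x 3 masks are hypothetical.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and Intersection over Union for two binary masks.

    pred and truth are equally sized 2-D lists of 0/1 integers.
    """
    inter = pred_sum = truth_sum = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += p & t
            pred_sum += p
            truth_sum += t
    union = pred_sum + truth_sum - inter
    if union == 0:            # both masks empty: treated here as perfect agreement
        return 1.0, 1.0
    dice = 2.0 * inter / (pred_sum + truth_sum)
    iou = inter / union
    return dice, iou


# Hypothetical 3 x 3 masks, not taken from any of the cited databases.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
truth = [[0, 1, 1],
         [0, 1, 0],
         [0, 1, 0]]
print(dice_and_iou(pred, truth))
```

For a single pair of masks the two metrics are monotonically related (Dice = 2 IoU / (1 + IoU)), so Dice is never smaller than IoU, consistent with the ordering of the reported means.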


Subjects
Computer-Assisted Image Processing/methods , Mammography , Computer Neural Networks , Automation , Factual Databases , Humans
6.
J Psychiatry Neurosci ; 45(1): 34-44, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31490055

ABSTRACT

Background: Phenotypic heterogeneity and complicated gene-environment interplay in etiology are among the primary factors that hinder the identification of genetic variants associated with cocaine use disorder. Methods: To detect novel genetic variants associated with cocaine use disorder, we derived disease traits with reduced phenotypic heterogeneity using cluster analysis of a study sample (n = 9965). We then used these traits in genome-wide association tests, performed separately for 2070 African Americans and 1570 European Americans, using a new mixed model that accounted for the moderating effects of 5 childhood environmental factors. We used an independent sample (918 African Americans, 1382 European Americans) for replication. Results: The cluster analysis yielded 5 cocaine use disorder subtypes, of which subtypes 4 (n = 3258) and 5 (n = 1916) comprised heavy cocaine users, had high heritability estimates (h^2 = 0.66 and 0.64, respectively) and were used in association tests. Seven of the 13 identified genetic loci in the discovery phase were available in the replication sample. In African Americans, rs114492924 (discovery p = 1.23 × 10^-8), a single nucleotide polymorphism in LINC01411, was replicated in the replication sample (p = 3.63 × 10^-3). In a meta-analysis that combined the discovery and replication results, 3 loci in African Americans were significant genome-wide: rs10188036 in TRAK2 (p = 2.95 × 10^-8), del-1:15511771 in TMEM51 (p = 9.11 × 10^-10) and rs149843442 near LPHN2 (p = 3.50 × 10^-8). Limitations: Lack of data prevented us from replicating 6 of the 13 identified loci. Conclusion: Our results demonstrate the importance of considering phenotypic heterogeneity and gene-environment interplay in detecting genetic variations that contribute to cocaine use disorder, because new genetic loci have been identified using our novel analytic method.


Subjects
Black or African American/genetics , Cocaine-Related Disorders/genetics , Cocaine-Related Disorders/physiopathology , Gene-Environment Interaction , Genome-Wide Association Study , White People/genetics , Adult , Case-Control Studies , Cluster Analysis , Cocaine-Related Disorders/classification , Family , Female , Genetic Loci , Genetic Variation , Humans , Male , Middle Aged , Phenotype , Single Nucleotide Polymorphism , United States
7.
J Chem Inf Model ; 60(12): 6167-6184, 2020 12 28.
Article in English | MEDLINE | ID: mdl-33095006

ABSTRACT

Structurally similar analogues of given query compounds can be rapidly retrieved from chemical databases by molecular similarity search approaches. However, the computational cost of an exhaustive similarity search over a large compound database is high. Although the latest indexing algorithms can greatly speed up the search process, they are not readily applicable to molecular similarity search problems because they lack an implementation of the Tanimoto similarity metric. In this paper, we first implement Python or C++ code to enable Tanimoto similarity search via several recent indexing algorithms, such as Hnsw and Onng. Moreover, there is increasing interest in the computational community in developing robust benchmarking systems to assess the performance of various computational algorithms. Here, we provide a benchmark to evaluate the molecular similarity searching performance of these recent indexing algorithms. To avoid potential package dependency issues, two separate benchmarks are built on currently popular container technologies, Docker and Singularity. Singularity is a rather new container framework specifically designed for high-performance computing (HPC) platforms and does not need privileged permissions or a separate daemon process. Both benchmarking methods are extensible to incorporate other new indexing algorithms, benchmarking data sets, and different customized parameter settings. Our results demonstrate that graph-based methods, such as Hnsw and Onng, consistently achieve the best trade-off between search effectiveness and search efficiency. The source code of the entire benchmark system can be downloaded from https://github.uconn.edu/mldrugdiscovery/MssBenchmark.
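As a point of reference for the exhaustive search that the indexing algorithms are meant to accelerate, here is a small brute-force sketch of Tanimoto similarity search. It assumes fingerprints are stored as sets of on-bit indices; the function names and the three toy fingerprints are hypothetical and unrelated to the benchmark's own code.

```python
import heapq


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0            # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def top_k(query, database, k=2):
    """Exhaustive search: the k (similarity, name) pairs most similar to the query."""
    scored = ((tanimoto(query, fp), name) for name, fp in database.items())
    return heapq.nlargest(k, scored)


# Hypothetical fingerprints; real ones typically have hundreds or thousands of bits.
database = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 5},
    "mol_c": {7, 8},
}
print(top_k({1, 2, 3}, database))
```

The cost of this scan grows linearly with the database size, which is the motivation for approximate nearest-neighbor indexes such as the graph-based methods evaluated in the paper.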


Subjects
Algorithms , Benchmarking , Computing Methodologies , Factual Databases , Software
8.
Inf Sci (N Y) ; 494: 278-293, 2019 Aug.
Article in English | MEDLINE | ID: mdl-32863420

ABSTRACT

Multi-view cluster analysis, as a popular granular computing method, aims to partition sample subjects into consistent clusters across different views in which the subjects are characterized. Frequently, data entries can be missing from some of the views. The latest multi-view co-clustering methods cannot effectively deal with incomplete data, especially when there are mixed patterns of missing values. We propose an enhanced formulation for a family of multi-view co-clustering methods to cope with the missing data problem by introducing an indicator matrix whose elements indicate which data entries are observed and assessing cluster validity only on observed entries. In comparison with the simple strategy of removing subjects with missing values, our approach can use all available data in cluster analysis. In comparison with common methods that impute missing data in order to use regular multi-view analytics, our approach is less sensitive to imputation uncertainty. In comparison with other state-of-the-art multi-view incomplete clustering methods, our approach handles both the case of any missing value in a view and the case of an entirely missing view, the most common scenario in practice. We first validated the proposed strategy in simulations, and then applied it to a treatment study of heroin dependence, which would have been impossible with previous methods due to a number of missing-data patterns. Patients in a treatment study were naturally assessed in different feature spaces, such as in the pre-, during-, and post-treatment time windows. Our algorithm was able to identify subgroups where patients in each group showed similarities in all three time windows, thus leading to the recognition of pre-treatment (baseline) features predictive of post-treatment outcomes.
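The role of the indicator matrix can be sketched with a simple masked error: a rank-one approximation of one view is scored only on entries flagged as observed, so missing cells never contribute. This is an illustrative fragment under assumptions, not the paper's full co-clustering objective; the function name and the 2 x 2 example are hypothetical.

```python
def masked_squared_error(X, M, u, v):
    """Squared error of the rank-one approximation u v^T, counted only where M == 1.

    X: data matrix as a list of rows; missing cells may hold any placeholder.
    M: indicator matrix of the same shape (1 = observed, 0 = missing).
    u: one value per row (subject); v: one value per column (feature).
    """
    total = 0.0
    for i, (x_row, m_row) in enumerate(zip(X, M)):
        for j, (x, m) in enumerate(zip(x_row, m_row)):
            if m:                         # missing cells are skipped, never read
                total += (x - u[i] * v[j]) ** 2
    return total


# Hypothetical view with one missing cell (stored as None and masked out).
X = [[2.0, 4.0],
     [None, 2.0]]
M = [[1, 1],
     [0, 1]]
print(masked_squared_error(X, M, u=[2.0, 1.0], v=[1.0, 1.5]))
```

Because the missing cell is never read, no imputed value is needed, which is the property the abstract contrasts with imputation-based approaches.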

9.
Inf Process Lett ; 145: 1-5, 2019 May.
Article in English | MEDLINE | ID: mdl-31741499

ABSTRACT

The VC-dimension, which is widely used in learning theory, has recently been used in the analysis and design of graph algorithms. In this paper, we study the problem of bounding the VC-dimension of unique round-trip shortest path set systems (URTSP), which are set systems induced by sets of vertices in unique round-trip shortest paths in directed graphs. We first show that, unlike the VC-dimensions of set systems induced by unique undirected and directed shortest paths in undirected and directed graphs respectively, the VC-dimension of URTSP can be larger than 3. We then prove that the VC-dimension of URTSP is at most 32. Furthermore, we apply the VC-dimension result to the minimum k-round-trip shortest path cover problem (k-RTSPC), which is to find for a directed graph a minimum vertex set to intersect every round-trip shortest path containing at least k vertices, and derive an upper bound on the size of the vertex set. The k-RTSPC problem can be useful in many real-world applications, including optimal placement of facilities.
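For a concrete sense of what the VC-dimension of a set system means, the following brute-force sketch checks which vertex subsets are shattered. The example system, vertex sets of subpaths of a four-vertex path graph, is a hypothetical stand-in chosen for simplicity; it is not the round-trip shortest path system studied in the paper, and the exhaustive search is only practical for very small inputs.

```python
from itertools import combinations


def is_shattered(points, sets):
    """True if every subset of `points` arises as S & points for some S in `sets`."""
    points = frozenset(points)
    traces = {points & s for s in sets}
    return len(traces) == 2 ** len(points)


def vc_dimension(universe, sets):
    """Brute-force VC-dimension of a finite set system (exponential time)."""
    sets = [frozenset(s) for s in sets]
    best = 0
    for d in range(1, len(universe) + 1):
        if any(is_shattered(c, sets) for c in combinations(universe, d)):
            best = d
        else:
            break     # if no d-subset is shattered, no larger subset can be
    return best


# Hypothetical set system: vertex sets of all subpaths of the path 1-2-3-4.
universe = [1, 2, 3, 4]
paths = [set(range(i, j + 1)) for i in universe for j in universe if i <= j]
print(vc_dimension(universe, paths))
```

Three vertices on a line cannot be shattered by subpaths because no subpath contains the two outer vertices without the middle one, which is the kind of structural argument the paper extends to round-trip paths.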

10.
Bioinformatics ; 32(12): i137-i146, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307610

ABSTRACT

MOTIVATION: A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regulated genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective at dealing with the noise in the expression data introduced by the complicated procedures in quantifying gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, making it difficult to detect conserved co-regulated genes. RESULTS: We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes that are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector where the column vector is a consistent indicator across the matrices (species) to identify the same gene cluster and the row vector specifies for each species the developmental stages that the clustered genes co-regulate. An efficient optimization algorithm has been developed with convergence analysis. This approach was first validated on synthetic data and compared to the two-step method and several recent joint clustering methods. We then applied this approach to two real world datasets of gene expression during the pre-implantation embryonic development of the human and mouse. Co-regulated genes consistent between the human and mouse were identified, offering insights into conserved functions, as well as similarities and differences in genome activation timing between the human and mouse embryos. AVAILABILITY AND IMPLEMENTATION: The R package containing the implementation of the proposed method in C++ is available at: https://github.com/JavonSun/mvbc.git and also at the R platform https://www.r-project.org/. CONTACT: jinbo@engr.uconn.edu.


Subjects
Gene Expression , Algorithms , Animals , Cluster Analysis , Female , Gene Expression Profiling , Humans , Mice
11.
BMC Genomics ; 15: 756, 2014 Sep 04.
Article in English | MEDLINE | ID: mdl-25185836

ABSTRACT

BACKGROUND: During mammalian pre-implantation embryonic development dramatic and orchestrated changes occur in gene transcription. The identification of the complete changes has not been possible until the development of the Next Generation Sequencing Technology. RESULTS: Here we report comprehensive transcriptome dynamics of single matured bovine oocytes and pre-implantation embryos developed in vivo. Surprisingly, more than half of the estimated 22,000 bovine genes, 11,488 to 12,729 involved in more than 100 pathways, is expressed in oocytes and early embryos. Despite the similarity in the total numbers of genes expressed across stages, the nature of the expressed genes is dramatically different. A total of 2,845 genes were differentially expressed among different stages, of which the largest change was observed between the 4- and 8-cell stages, demonstrating that the bovine embryonic genome is activated at this transition. Additionally, 774 genes were identified as only expressed/highly enriched in particular stages of development, suggesting their stage-specific roles in embryogenesis. Using weighted gene co-expression network analysis, we found 12 stage-specific modules of co-expressed genes that can be used to represent the corresponding stage of development. Furthermore, we identified conserved key members (or hub genes) of the bovine expressed gene networks. Their vast association with other embryonic genes suggests that they may have important regulatory roles in embryo development; yet, the majority of the hub genes are relatively unknown/under-studied in embryos. We also conducted the first comparison of embryonic expression profiles across three mammalian species, human, mouse and bovine, for which RNA-seq data are available. We found that the three species share more maternally deposited genes than embryonic genome activated genes. 
More importantly, there are more similarities in embryonic transcriptomes between bovine and humans than between humans and mice, demonstrating that bovine embryos are better models for human embryonic development. CONCLUSIONS: This study provides a comprehensive examination of gene activities in bovine embryos and identified little-known potential master regulators of pre-implantation development.


Subjects
Embryonic Development/genetics , Gene Expression Profiling , Transcriptome , Animals , Blastocyst/metabolism , Cattle , Chromosome Mapping , Cluster Analysis , Computational Biology , Female , Developmental Gene Expression Regulation , Gene Regulatory Networks , Humans , Mice , Oocytes/metabolism , Pregnancy , Reproducibility of Results
12.
BMC Genet ; 15: 73, 2014 Jun 17.
Article in English | MEDLINE | ID: mdl-24938865

ABSTRACT

BACKGROUND: Accurate classification of patients with a complex disease into subtypes has important implications for medicine and healthcare. Using more homogeneous disease subtypes in genetic association analysis will facilitate the detection of new genetic variants that are not detectible using the non-differentiated disease phenotype. Subtype differentiation can also improve diagnostic classification, which can in turn inform clinical decision making and treatment matching. Currently, the most sophisticated methods for disease subtyping perform cluster analysis using patients' clinical features. Without guidance from genetic information, the resultant subtypes are likely to be suboptimal and efforts at genetic association may fail. RESULTS: We propose a multi-view matrix decomposition approach that integrates clinical features with genetic markers to detect confirmatory evidence for a disease subtype. This approach groups patients into clusters that are consistent between the clinical and genetic dimensions of data; it simultaneously identifies the clinical features that define the subtype and the genotypes associated with the subtype. A simulation study validated the proposed approach, showing that it identified hypothesized subtypes and associated features. In comparison to the latest biclustering and multi-view data analytics using real-life disease data, the proposed approach identified clinical subtypes of a disease that differed from each other more significantly in the genetic markers, thus demonstrating the superior performance of the proposed approach. CONCLUSIONS: The proposed algorithm is an effective and superior alternative to the disease subtyping methods employed to date. Integration of phenotypic features with genetic markers in the subtyping analysis is a promising approach to identify concurrently disease subtypes and their genetic associations.


Subjects
Algorithms , Cluster Analysis , Disease/classification , Disease/genetics , Genetic Association Studies , Genotype , Humans , Phenotype
13.
Am J Med Genet B Neuropsychiatr Genet ; 165B(2): 148-56, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24339190

ABSTRACT

Because DSM-IV cocaine dependence (CD) is heterogeneous, it is not an optimal phenotype to identify genetic variation contributing to risk for cocaine use and related behaviors (CRBs). We used a cluster analytic method to differentiate homogeneous, highly heritable subtypes of CRBs and to compare their utility with that of the DSM-IV CD as traits for genetic association analysis. Clinical features of CRBs and co-occurring disorders were obtained via a poly-diagnostic interview administered to 9,965 participants in genetic studies of substance dependence. A subsample of subjects (N = 3,443) were genotyped for 1,350 single nucleotide polymorphisms (SNPs) selected from 130 candidate genes related to addiction. Cluster analysis of clinical features of the sample yielded five subgroups, two of which were characterized by heavy cocaine use and high heritability: a heavy cocaine use, infrequent intravenous injection group and an early-onset, heavy cocaine use, high comorbidity group. The utility of these traits was compared with the CD diagnosis through association testing of 2,320 affected subjects and 480 cocaine-exposed controls. Analyses examined both single SNP (main) and SNP-SNP interaction (epistatic) effects, separately for African-Americans and European-Americans. The two derived subtypes showed more significant P values for 6 of 8 main effects and 7 of 8 epistatic effects. Variants in the CLOCK gene were significantly associated with the heavy cocaine use, infrequent intravenous injection group, but not with the DSM-IV diagnosis of CD. These results support the utility of subtypes based on CRBs to detect risk variants for cocaine addiction.


Subjects
Cocaine-Related Disorders/genetics , Gene Frequency/genetics , Genetic Predisposition to Disease , Single Nucleotide Polymorphism/genetics , Adult , Chromosome Mapping/methods , Diagnostic and Statistical Manual of Mental Disorders , Female , Genotype , Humans , Male , Young Adult
14.
Article in English | MEDLINE | ID: mdl-37696489

ABSTRACT

BACKGROUND: Magnetic resonance imaging provides noninvasive tools to investigate alcohol use disorder (AUD) and nicotine use disorder (NUD) and neural phenotypes for genetic studies. A data-driven transdiagnostic approach could provide a new perspective on the neurobiology of AUD and NUD. METHODS: Using samples of individuals with AUD (n = 140), individuals with NUD (n = 249), and healthy control participants (n = 461) from the UK Biobank, we integrated clinical, neuroimaging, and genetic markers to identify biotypes of AUD and NUD. We partitioned participants with AUD and NUD based on resting-state functional connectivity (FC) features associated with clinical metrics. A multitask artificial neural network was trained to evaluate the cluster-defined biotypes and jointly infer AUD and NUD diagnoses. RESULTS: Three biotypes (primary NUD, mixed NUD/AUD with depression and anxiety, and mixed AUD/NUD) were identified. Multitask classifiers incorporating biotype knowledge achieved higher area under the curve (AUD: 0.76, NUD: 0.74) than single-task classifiers without biotype differentiation (AUD: 0.61, NUD: 0.64). Cerebellar FC features were important in distinguishing the 3 biotypes. The biotype of mixed NUD/AUD with depression and anxiety demonstrated the largest number of FC features (n = 5), all related to the visual cortex, that significantly differed from healthy control participants and were validated in a replication sample (p < .05). A polymorphism in TNRC6A was associated with the mixed AUD/NUD biotype in both the discovery (p = 7.3 × 10^-5) and replication (p = 4.2 × 10^-2) sets. CONCLUSIONS: Biotyping and multitask learning using FC features can characterize the clinical and genetic profiles of AUD and NUD and help identify cerebellar and visual circuit markers to differentiate the AUD/NUD group from the healthy control group. These markers support a growing body of literature.


Subjects
Alcoholism , Tobacco Use Disorder , Humans , Alcoholism/diagnostic imaging , Magnetic Resonance Imaging , Anxiety Disorders , Machine Learning
15.
Neuroimage Rep ; 4(1), 2024 Mar.
Article in English | MEDLINE | ID: mdl-38605733

ABSTRACT

Background: Deficient sleep is implicated in nicotine dependence as well as depressive and anxiety disorders. The hypothalamus regulates the sleep-wake cycle and supports motivated behavior, and hypothalamic dysfunction may underpin comorbid nicotine dependence, depression and anxiety. We aimed to investigate whether and how the resting state functional connectivities (rsFCs) of the hypothalamus relate to cigarette smoking, deficient sleep, depression and anxiety. Methods: We used the data of 64 smokers and 198 age- and sex-matched adults who never smoked, curated from the Human Connectome Project. Deficient sleep and psychiatric problems were each assessed with Pittsburgh Sleep Quality Index (PSQI) and Achenbach Adult Self-Report. We processed the imaging data with published routines and evaluated the results at a corrected threshold, all with age, sex, and the severity of alcohol use as covariates. Results: Smokers vs. never smokers showed poorer sleep quality and greater severity of depression and anxiety. In smokers only, the total PSQI score, indicating more sleep deficits, was positively associated with hypothalamic rsFCs with the right inferior frontal/insula/superior temporal and postcentral (rPoCG) gyri. Stronger hypothalamus-rPoCG rsFCs were also associated with greater severity of depression and anxiety in smokers but not never smokers. Additionally, in smokers, the PSQI score completely mediated the relationships of hypothalamus-rPoCG rsFCs with depression and anxiety severity. Conclusions: These findings associate hypothalamic circuit dysfunction to sleep deficiency and severity of depression and anxiety symptoms in adults who smoke. Future studies may investigate the roles of the hypothalamic circuit in motivated behaviors to better characterize the inter-related neural markers of smoking, deficient sleep, depression and anxiety.

16.
BMC Med Inform Decis Mak ; 13: 41, 2013 Apr 04.
Article in English | MEDLINE | ID: mdl-23557276

ABSTRACT

BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are sequence variations found in individuals at some specific points in the genomic sequence. As SNPs are highly conserved throughout evolution and within a population, the map of SNPs serves as an excellent genotypic marker. Conventional SNP analysis methods suffer from long run times, inefficient memory usage, and frequent overestimation. In this paper, we propose efficient, scalable, and reliable algorithms to select a small subset of SNPs from a large set of SNPs which can together be employed to perform phenotypic classification. METHODS: Our algorithms exploit the techniques of gene selection and random projections to identify a meaningful subset of SNPs. To the best of our knowledge, these techniques have not been employed before in the context of genotype-phenotype correlations. Random projections are used to project the input data into a lower dimensional space (closely preserving distances). Gene selection is then applied on the projected data to identify a subset of the most relevant SNPs. RESULTS: We have compared the performance of our algorithms with one of the best currently known algorithms, Multifactor Dimensionality Reduction (MDR), and with the Principal Component Analysis (PCA) technique. Experimental results demonstrate that our algorithms are superior in terms of accuracy as well as run time. CONCLUSIONS: In our proposed techniques, random projection is used to map data from a high dimensional space to a lower dimensional space, and thus overcomes the curse of dimensionality problem. From this space of reduced dimension, we select the best subset of attributes. It is a unique mechanism in the domain of SNP analysis, and to the best of our knowledge it has not been employed before. As revealed by our experimental results, our proposed techniques offer the potential of high accuracies while keeping the run times low.
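The random projection step described in this abstract can be illustrated with a Gaussian projection matrix, which approximately preserves pairwise distances when mapping to a lower dimension. The sketch below covers only that step, not the subsequent gene selection; the function name, the seed handling, and the simulated genotype matrix are assumptions made for this example.

```python
import math
import random


def random_projection(rows, k, seed=0):
    """Project each row from d dimensions down to k with a Gaussian random matrix.

    Matrix entries are drawn from N(0, 1/k), so squared Euclidean distances
    between rows are preserved in expectation.
    """
    rng = random.Random(seed)
    d = len(rows[0])
    R = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)] for _ in range(d)]
    return [[sum(x[i] * R[i][j] for i in range(d)) for j in range(k)] for x in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical genotype matrix: 3 samples x 500 SNPs, coded 0/1/2.
data_rng = random.Random(1)
genotypes = [[data_rng.choice((0, 1, 2)) for _ in range(500)] for _ in range(3)]
projected = random_projection(genotypes, k=200)

before = squared_distance(genotypes[0], genotypes[1])
after = squared_distance(projected[0], projected[1])
print(len(projected[0]), after / before)
```

The printed ratio should be close to 1, and it concentrates more tightly around 1 as the target dimension k grows, which is the distance-preserving property the method relies on.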


Subjects
Genetic Association Studies , Multifactor Dimensionality Reduction/methods , Polymorphism, Single Nucleotide , Algorithms , Humans , Principal Component Analysis
17.
Mol Inform ; 42(5): e2200215, 2023 05.
Article in English | MEDLINE | ID: mdl-36764926

ABSTRACT

Graph generative models have recently emerged as an interesting approach to construct molecular structures atom-by-atom or fragment-by-fragment. In this study, we adopt the fragment-based strategy and decompose each input molecule into a set of small chemical fragments. In drug discovery, a few drug molecules are designed by replacing certain chemical substituents with their bioisosteres or alternative chemical moieties. This inspires us to group decomposed fragments into different fragment clusters according to their local structural environment around bond-breaking positions. In this way, an input structure can be transformed into an equivalent three-layer graph, in which individual atoms, decomposed fragments, or obtained fragment clusters act as graph nodes at each corresponding layer. We further implement a prototype model, named multi-resolution graph variational autoencoder (MRGVAE), to learn embeddings of the constituent nodes at each layer in a fine-to-coarse order. Our decoder adopts a similar but reversed hierarchical structure. It first predicts the next possible fragment cluster, then samples an exact fragment structure from the determined fragment cluster, and sequentially attaches it to the preceding chemical moiety. Our proposed approach performs comparatively well on molecular evaluation metrics against several other graph-based molecular generative models. The introduction of the additional fragment cluster graph layer will hopefully increase the odds of assembling new chemical moieties absent from the original training set and enhance their structural diversity. We hope that our prototyping work will inspire more creative research to explore the possibility of incorporating different kinds of chemical domain knowledge into a similar multi-resolution neural network architecture.


Subjects
Benchmarking , Drug Discovery , Models, Molecular , Neural Networks, Computer
18.
Transl Psychiatry ; 12(1): 253, 2022 06 16.
Article in English | MEDLINE | ID: mdl-35710901

ABSTRACT

Alcohol use behaviors are highly heterogeneous, posing significant challenges to etiologic research of alcohol use disorder (AUD). Magnetic resonance imaging (MRI) provides intermediate endophenotypes in characterizing problem alcohol use and assessing the genetic architecture of addictive behavior. We used connectivity features derived from resting state functional MRI to subtype alcohol misuse (AM) behavior. With a machine learning pipeline of feature selection, dimension reduction, clustering, and classification, we identified three AM biotypes: mild, comorbid, and moderate (MIA, COA, and MOA), from a Human Connectome Project (HCP) discovery sample (194 drinkers). The three groups and controls (397 non-drinkers) demonstrated significant differences in alcohol use frequency during the heaviest 12-month drinking period (MOA > MIA; COA > non-drinkers) and were distinguished by connectivity features involving the frontal, parietal, subcortical and default mode networks. Further, COA relative to MIA, MOA and controls endorsed significantly higher scores in antisocial personality. A genetic association study found that rs16930842, a variant from LINC01414 related to alcohol use and antisocial behavior, was significantly associated with COA. Using a replication HCP sample (28 drinkers and 46 non-drinkers), we found that subtyping helped in classifying AM from controls (area under the curve or AUC = 0.70, P < 0.005) in comparison to classifiers without subtyping (AUC = 0.60, not significant) and successfully reproduced the genetic association. Together, the results suggest functional connectivities as important features in classifying AM subgroups and the utility of reducing the heterogeneity in connectivity features among AM subgroups in advancing the research of etiological neural markers of AUD.
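
For readers unfamiliar with the AUC figures quoted in the abstract above, the following minimal sketch computes AUC as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. The scores are invented for illustration and are not from the study; `auc` is a hypothetical helper, not the authors' code.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Made-up classifier scores, for illustration only.
drinkers = [0.9, 0.6, 0.4]
non_drinkers = [0.7, 0.3, 0.2, 0.1]

value = auc(drinkers, non_drinkers)
print(round(value, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.70 and 0.60 should be read.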


Subjects
Alcoholism , Connectome , Alcohol Drinking , Alcoholism/diagnostic imaging , Alcoholism/genetics , Brain/diagnostic imaging , Coenzyme A , Connectome/methods , Humans , Magnetic Resonance Imaging/methods
19.
IEEE Trans Artif Intell ; 2(2): 146-168, 2021 Apr.
Article in English | MEDLINE | ID: mdl-35308425

ABSTRACT

Clustering is a machine learning paradigm that divides sample subjects into a number of groups such that subjects in the same group are more similar to one another than to those in other groups. With advances in information acquisition technologies, samples can frequently be viewed from different angles or in different modalities, generating multi-view data. Multi-view clustering (MVC), which clusters subjects into subgroups using multi-view data, has attracted increasing attention. Although MVC methods have developed rapidly, few surveys summarize and analyze the current progress. Therefore, we propose a novel taxonomy of MVC approaches. As with other machine learning methods, we categorize them into generative and discriminative classes. Within the discriminative class, we further split the methods into five groups according to how the views are integrated: Common Eigenvector Matrix, Common Coefficient Matrix, Common Indicator Matrix, Direct Combination and Combination After Projection. Furthermore, we relate MVC to other topics: multi-view representation, ensemble clustering, multi-task clustering, multi-view supervised and semi-supervised learning. Several representative real-world applications are elaborated for practitioners. Some benchmark multi-view datasets are introduced, and representative MVC algorithms from each group are empirically evaluated to analyze how they perform on benchmark datasets. To promote future development of MVC approaches, we point out several open problems that may require further investigation and thorough examination.
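
To make the idea of multi-view data concrete, here is a minimal sketch of a naive baseline: concatenate each subject's features from two views and run plain k-means on the result. This is an assumption-laden illustration, not one of the surveyed methods or any of the five taxonomy groups named in the abstract; the data, the `kmeans` helper, and the choice of initial centroids are all hypothetical.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = [list(c) for c in initial_centroids]  # copy so the caller's data is untouched
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Two hypothetical views of the same four subjects (e.g. two measurement modalities).
view_a = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
view_b = [[1.0], [1.1], [9.0], [9.2]]

# Naive integration: concatenate the per-subject feature vectors from both views.
combined = [a + b for a, b in zip(view_a, view_b)]

labels = kmeans(combined, [combined[0], combined[2]])
print(labels)
```

Concatenation ignores differences in scale and reliability between views, which is one reason dedicated multi-view methods exist.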

20.
Proc AAAI Conf Artif Intell ; 35(12): 11193-11201, 2021 Feb.
Article in English | MEDLINE | ID: mdl-34745766

ABSTRACT

In stochastic contextual bandit (SCB) problems, an agent selects an action based on an observed context to maximize the cumulative reward over iterations. Recently, a few studies have used a deep neural network (DNN) to predict the expected reward for an action, with the DNN trained by a stochastic gradient-based method. However, analysis of whether and where these methods converge has been largely neglected. In this work, we formulate the SCB that uses a DNN reward function as a non-convex stochastic optimization problem, and design a stage-wise stochastic gradient descent algorithm to optimize the problem and determine the action policy. We prove that, with high probability, the action sequence chosen by this algorithm converges to a greedy action policy with respect to a locally optimal reward function. Extensive experiments have been performed to demonstrate the effectiveness and efficiency of the proposed algorithm on multiple real-world datasets.
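
The contextual bandit loop described in the abstract above can be sketched in a few lines. This is a simplified stand-in, not the paper's algorithm: it swaps the DNN for one linear reward model per action, uses plain SGD with epsilon-greedy exploration instead of the stage-wise scheme, and runs in a toy noise-free environment. All names (`predict`, `greedy_action`, `sgd_step`, `true_reward`) and all constants are hypothetical.

```python
import random

def predict(weights, context):
    """Predicted reward of one action: a linear model standing in for the DNN."""
    return sum(w * x for w, x in zip(weights, context))

def greedy_action(model, context):
    """Pick the action with the highest predicted reward (first action wins ties)."""
    return max(range(len(model)), key=lambda a: predict(model[a], context))

def sgd_step(weights, context, reward, lr):
    """One stochastic gradient step on the squared prediction error."""
    error = reward - predict(weights, context)
    return [w + lr * error * x for w, x in zip(weights, context)]

def true_reward(context, action):
    # Hypothetical environment: action 0 pays off in context [1, 0], action 1 in [0, 1].
    return 1.0 if context[action] == 1.0 else 0.0

rng = random.Random(0)
model = [[0.0, 0.0], [0.0, 0.0]]  # one weight vector per action
contexts = [[1.0, 0.0], [0.0, 1.0]]

for _ in range(500):
    context = rng.choice(contexts)
    if rng.random() < 0.2:
        action = rng.randrange(2)  # explore
    else:
        action = greedy_action(model, context)  # exploit
    reward = true_reward(context, action)
    # Only the chosen action's reward is observed, so only its model is updated.
    model[action] = sgd_step(model[action], context, reward, lr=0.1)

policy = [greedy_action(model, c) for c in contexts]
print(policy)
```

In this toy setting the learned greedy policy matches the best action in each context; the paper's contribution concerns proving convergence when the reward model is a non-convex DNN, which this sketch does not address.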
