Results 1 - 17 of 17
1.
Microscopy (Oxf) ; 2024 May 17.
Article in English | MEDLINE | ID: mdl-38757783

ABSTRACT

Spectral image (SI) measurement techniques, such as X-ray absorption fine structure (XAFS) imaging and scanning transmission electron microscopy (STEM) with energy-dispersive X-ray spectroscopy (EDS) or electron energy loss spectroscopy (EELS), are useful for identifying chemical structures in composite materials. Machine-learning techniques have been developed for automatic analysis of SI data, and their usefulness has been proven. Recently, an extended measurement technique combining SI with a computed tomography (CT) technique (CT-SI), such as CT-XAFS and STEM-EDS/EELS tomography, was developed to identify the three-dimensional (3D) structures of chemical components. CT-SI analysis can be conducted by combining CT reconstruction algorithms and chemical component analysis based on machine learning techniques. However, this analysis incurs high computational costs owing to the size of the CT-SI datasets. To address this problem, this study proposed a fast computational approach for 3D chemical component analysis in an unsupervised learning setting. The primary idea for reducing the computational cost involved compressing the CT-SI data prior to CT computation and performing 3D reconstruction and chemical component analysis on the compressed data. The proposed approach significantly reduced the computational cost without losing information about the 3D structure and chemical components. We experimentally evaluated the proposed approach using synthetic and real CT-XAFS data, which demonstrated that our approach achieved a significantly faster computational speed than the conventional approach while maintaining analysis performance. As the proposed procedure can be implemented with any CT algorithm, it is expected to accelerate 3D analyses with sparse regularized CT algorithms in noisy and sparse CT-SI datasets.
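
The compress-before-reconstruct idea can be illustrated with a minimal sketch. The abstract does not say which compression or CT algorithm is used, so everything here is an assumption: a truncated SVD along the energy axis stands in for the compression, a dense least-squares solve stands in for the CT solver, and the projection matrix `A` and all sizes are hypothetical. Because the stand-in reconstruction is linear, reconstructing a few compressed channels and expanding them afterwards gives the same volume as reconstructing every energy channel.

```python
import numpy as np

def compress_spectra(Y, k):
    """Truncated SVD along the energy axis: Y (n_meas x n_energy) ~ Z @ Vk.T."""
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    Vk = Vt[:k].T            # (n_energy, k) spectral basis
    return Y @ Vk, Vk        # compressed measurements Z, shape (n_meas, k)

def reconstruct(A, Y):
    """Stand-in linear CT solver: least squares for A @ X = Y, one column per channel."""
    return np.linalg.lstsq(A, Y, rcond=None)[0]

rng = np.random.default_rng(0)
n_vox, n_meas, n_energy, n_comp = 64, 96, 200, 3
A = rng.random((n_meas, n_vox))            # hypothetical projection matrix
abund = rng.random((n_vox, n_comp))        # voxel-wise component abundances
spectra = rng.random((n_comp, n_energy))   # component spectra
Y = A @ abund @ spectra                    # noise-free synthetic CT-SI measurements

X_full = reconstruct(A, Y)                 # n_energy reconstructions
Z, Vk = compress_spectra(Y, n_comp)
X_fast = reconstruct(A, Z) @ Vk.T          # only n_comp reconstructions, then expand
print(np.allclose(X_full, X_fast, atol=1e-8))
```

With noisy data the equality becomes an approximation whose quality depends on how many spectral components are kept.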

2.
J Phys Chem A ; 128(4): 716-726, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38236195

ABSTRACT

Understanding disordered structure is difficult due to insufficient information in experimental data. Here, we overcome this issue by using a combination of diffraction and simulation to investigate oxygen packing and network topology in glassy (g-) and liquid (l-) MgO-SiO2 based on a comparison with the crystalline topology. We find that packing of oxygen atoms in Mg2SiO4 is larger than that in MgSiO3, and that of the glasses is larger than that of the liquids. Moreover, topological analysis suggests that topological similarity between crystalline (c)- and g-(l-) Mg2SiO4 is the signature of low glass-forming ability (GFA), and high GFA g-(l-) MgSiO3 shows a unique glass topology, which is different from c-MgSiO3. We also find that the lowest unoccupied molecular orbital (LUMO) is a free electron-like state at a void site of magnesium atom arising from decreased oxygen coordination, which is far away from crystalline oxides in which LUMO is occupied by oxygen's 3s orbital state in g- and l-MgO-SiO2, suggesting that electronic structure does not play an important role to determine GFA. We finally concluded the GFA of MgO-SiO2 binary is dominated by the atomic structure in terms of network topology.

3.
Neural Comput ; 34(10): 2145-2203, 2022 Sep 12.
Article in English | MEDLINE | ID: mdl-36027725

ABSTRACT

Bayesian optimization (BO) is a popular method for expensive black-box optimization problems; however, querying the objective function at every iteration can be a bottleneck that hinders efficient search capabilities. In this regard, multifidelity Bayesian optimization (MFBO) aims to accelerate BO by incorporating lower-fidelity observations available with a lower sampling cost. In our previous work, we proposed an information-theoretic approach to MFBO, referred to as multifidelity max-value entropy search (MF-MES), which inherits practical effectiveness and computational simplicity of the well-known max-value entropy search (MES) for the single-fidelity BO. However, the applicability of MF-MES is still limited to the case that a single observation is sequentially obtained. In this letter, we generalize MF-MES so that information gain can be evaluated even when multiple observations are simultaneously obtained. This generalization enables MF-MES to address two practical problem settings: synchronous parallelization and trace-aware querying. We show that the acquisition functions for these extensions inherit the simplicity of MF-MES without introducing additional assumptions. We also provide computational techniques for entropy evaluation and posterior sampling in the acquisition functions, which can be commonly used for all variants of MF-MES. The effectiveness of MF-MES is demonstrated using benchmark functions and real-world applications such as materials science data and hyperparameter tuning of machine-learning algorithms.
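
For orientation, the single-fidelity MES acquisition that MF-MES builds on can be sketched as follows. This is the standard Monte Carlo form from the MES literature, not the multifidelity, parallel, or trace-aware variants described in the abstract. The function and argument names are illustrative, and the posterior mean, standard deviation, and sampled maxima are assumed to come from a Gaussian process fitted elsewhere.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mes_acquisition(mu, sigma, max_samples):
    """Monte Carlo estimate of the MES information gain at one candidate point.

    mu, sigma   : GP posterior mean and standard deviation at the candidate
    max_samples : sampled values of the global maximum f*
    """
    total = 0.0
    for f_star in max_samples:
        gamma = (f_star - mu) / sigma
        cdf = norm_cdf(gamma)   # underflows to 0 for very negative gamma (not handled here)
        total += gamma * norm_pdf(gamma) / (2.0 * cdf) - math.log(cdf)
    return total / len(max_samples)
```

A candidate whose posterior mean sits well below every sampled maximum receives a score close to zero, because observing it would barely change the belief about the maximum.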

4.
Sci Rep ; 11(1): 22180, 2021 Nov 12.
Article in English | MEDLINE | ID: mdl-34772967

ABSTRACT

The network topology in disordered materials is an important structural descriptor for understanding the nature of disorder that is usually hidden in pairwise correlations. Here, we compare the covalent network topology of liquid and solidified silicon (Si) with that of silica (SiO2) on the basis of the analyses of the ring size and cavity distributions and tetrahedral order. We discover that the ring size distributions in amorphous (a)-Si are narrower and the cavity volume ratio is smaller than those in a-SiO2, which is a signature of poor amorphous-forming ability in a-Si. Moreover, a significant difference is found between the liquid topology of Si and that of SiO2. These topological features, which are reflected in diffraction patterns, explain why silica is an amorphous former, whereas it is impossible to prepare bulk a-Si. We conclude that the tetrahedral corner-sharing network of AX2, in which A is a fourfold cation and X is a twofold anion, as indicated by the first sharp diffraction peak, is an important motif for the amorphous-forming ability that can rule out a-Si as an amorphous former. This concept is consistent with the fact that an elemental material cannot form a bulk amorphous phase using melt quenching technique.

5.
Microscopy (Oxf) ; 69(2): 110-122, 2020 Apr 08.
Article in English | MEDLINE | ID: mdl-31682260

ABSTRACT

The combination of scanning transmission electron microscopy (STEM) with analytical instruments has become one of the most indispensable analytical tools in materials science. A set of microscopic image/spectral intensities collected from many sampling points in a region of interest, in which multiple physical/chemical components may be spatially and spectrally entangled, could be expected to be a rich source of information about a material. To unfold such an entangled image comprising information and spectral features into its individual pure components would necessitate the use of statistical treatment based on informatics and statistics. These computer-aided schemes or techniques are referred to as multivariate curve resolution, blind source separation or hyperspectral image analysis, depending on their application fields, and are classified as a subset of machine learning. In this review, we introduce non-negative matrix factorization, one of these unfolding techniques, to solve a wide variety of problems associated with the analysis of materials, particularly those related to STEM, electron energy-loss spectroscopy and energy-dispersive X-ray spectroscopy. This review, which commences with the description of the basic concept, the advantages and drawbacks of the technique, presents several additional strategies to overcome existing problems and their extensions to more general tensor decomposition schemes for further flexible applications are described.

6.
Sci Rep ; 9(1): 15794, 2019 Oct 31.
Article in English | MEDLINE | ID: mdl-31673031

ABSTRACT

In this study, an efficient method for estimating material parameters based on the experimental data of precipitate shape is proposed. First, a computational model that predicts the energetically favorable shape of precipitate when a d-dimensional material parameter (x) is given is developed. Second, the discrepancy (y) between the precipitate shape obtained through the experiment and that predicted using the computational model is calculated. Third, the Gaussian process (GP) is used to model the relation between x and y. Finally, for identifying the "low-error region (LER)" in the material parameter space where y is less than a threshold, we introduce an adaptive sampling strategy, wherein the estimated GP model suggests the subsequent candidate x to be sampled/calculated. To evaluate the effectiveness of the proposed method, we apply it to the estimation of interface energy and lattice mismatch between MgZn2 ([Formula: see text]) and α-Mg phases in an Mg-based alloy. The result shows that the number of computational calculations of the precipitate shape required for the LER estimation is significantly decreased by using the proposed method.
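
The adaptive search for a low-error region can be sketched in one dimension. The abstract does not state the acquisition rule, so this sketch assumes a straddle-type score (posterior uncertainty minus distance of the posterior mean from the threshold). The `discrepancy` function, the kernel length scale, the threshold, and the grid are hypothetical stand-ins for the expensive shape computation and its parameter space.

```python
import numpy as np

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, jitter=1e-6):
    """Zero-mean GP posterior mean and standard deviation on x_grid."""
    K = rbf(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    Ks = rbf(x_grid, x_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def discrepancy(x):
    """Hypothetical stand-in for the expensive shape-discrepancy computation."""
    return (x - 0.3) ** 2

threshold = 0.04
grid = np.linspace(0.0, 1.0, 101)
sampled = [0, 50, 100]                       # grid indices of the initial design
for _ in range(15):
    x_obs = grid[sampled]
    mean, std = gp_posterior(x_obs, discrepancy(x_obs), grid)
    score = 1.96 * std - np.abs(mean - threshold)   # straddle-type score (assumed)
    score[sampled] = -np.inf                 # never re-evaluate a sampled point
    sampled.append(int(np.argmax(score)))

x_obs = grid[sampled]
mean, _ = gp_posterior(x_obs, discrepancy(x_obs), grid)
ler_estimate = mean < threshold              # estimated low-error region
ler_truth = discrepancy(grid) < threshold    # known only because this is a toy problem
print("agreement:", np.mean(ler_estimate == ler_truth))
```

The score concentrates evaluations near the boundary of the region, which is where the classification of a parameter value as low-error or not is still uncertain.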

7.
Nat Commun ; 9(1): 4418, 2018 Oct 24.
Article in English | MEDLINE | ID: mdl-30356117

ABSTRACT

The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure. Analysis of predictive gene features reveal little overlap among models; however, in aggregate, these genes are enriched for common pathways. Heme metabolism, the most significantly enriched pathway, is associated with a higher risk of developing symptoms following viral exposure. This study demonstrates that pre-exposure molecular predictors can be identified and improves our understanding of the mechanisms of response to respiratory viruses.


Subject(s)
Gene Expression/genetics; Healthy Volunteers; Heme/metabolism; Humans; Influenza A Virus, H1N2 Subtype/immunology; Influenza A Virus, H1N2 Subtype/pathogenicity; Influenza A Virus, H3N2 Subtype/immunology; Influenza A Virus, H3N2 Subtype/pathogenicity; Respiratory Syncytial Viruses/immunology; Respiratory Syncytial Viruses/pathogenicity; Rhinovirus/immunology; Rhinovirus/pathogenicity
8.
F1000Res ; 5: 2678, 2016.
Article in English | MEDLINE | ID: mdl-27990267

ABSTRACT

Metastatic castrate resistant prostate cancer (mCRPC) is the major cause of death in prostate cancer patients. Even though some options for treatment of mCRPC have been developed, the most effective therapies remain unclear. Thus finding key patient clinical variables related with mCRPC is an important issue for understanding the disease progression mechanism of mCRPC and clinical decision making for these patients. The Prostate Cancer DREAM Challenge is a crowd-based competition to tackle this essential challenge using new large clinical datasets. This paper proposes an effective procedure for predicting global risks and survival times of these patients, aimed at sub-challenge 1a and 1b of the Prostate Cancer DREAM challenge. The procedure implements a two-step feature selection procedure, which first implements sparse feature selection for numerical clinical variables and statistical hypothesis testing of differences between survival curves caused by categorical clinical variables, and then implements a forward feature selection to narrow the list of informative features. Using Cox's proportional hazards model with these selected features, this method predicted global risk and survival time of patients using a linear model whose input is a median time computed from the hazard model. The challenge results demonstrated that the proposed procedure outperforms the state of the art model by correctly selecting more informative features on both the global risk prediction and the survival time prediction.

9.
Ultramicroscopy ; 170: 43-59, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27529804

ABSTRACT

Advances in scanning transmission electron microscopy (STEM) techniques have enabled us to automatically obtain electron energy-loss (EELS)/energy-dispersive X-ray (EDX) spectral datasets from a specified region of interest (ROI) at an arbitrary step width, called spectral imaging (SI). Instead of manually identifying the potential constituent chemical components from the ROI and determining the chemical state of each spectral component from the SI data stored in a huge three-dimensional matrix, it is more effective and efficient to use a statistical approach for the automatic resolution and extraction of the underlying chemical components. Among many different statistical approaches, we adopt a non-negative matrix factorization (NMF) technique, mainly because of the natural assumption of non-negative values in the spectra and cardinalities of chemical components, which are always positive in actual data. This paper proposes a new NMF model with two penalty terms: (i) an automatic relevance determination (ARD) prior, which optimizes the number of components, and (ii) a soft orthogonal constraint, which clearly resolves each spectrum component. For the factorization, we further propose a fast optimization algorithm based on hierarchical alternating least-squares. Numerical experiments using both phantom and real STEM-EDX/EELS SI datasets demonstrate that the ARD prior successfully identifies the correct number of physically meaningful components. The soft orthogonal constraint is also shown to be effective, particularly for STEM-EELS SI data, where neither the spatial nor spectral entries in the matrices are sparse.
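
The hierarchical alternating least-squares (HALS) update underlying the factorization can be sketched as follows. This is plain HALS only: the ARD prior and the soft orthogonal constraint proposed in the paper are deliberately left out, and the function name, the matrix names `C` (abundances) and `S` (spectra), and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def nmf_hals(X, k, n_iter=300, seed=0, eps=1e-12):
    """Plain HALS for X ~ C @ S with C >= 0 and S >= 0 (no ARD, no orthogonality term)."""
    rng = np.random.default_rng(seed)
    C = rng.random((X.shape[0], k))    # spatial abundances, one column per component
    S = rng.random((k, X.shape[1]))    # component spectra, one row per component
    for _ in range(n_iter):
        CtX, CtC = C.T @ X, C.T @ C
        for j in range(k):             # update one spectrum at a time
            S[j] = np.maximum(S[j] + (CtX[j] - CtC[j] @ S) / (CtC[j, j] + eps), 0.0)
        XSt, SSt = X @ S.T, S @ S.T
        for j in range(k):             # update one abundance map at a time
            C[:, j] = np.maximum(C[:, j] + (XSt[:, j] - C @ SSt[:, j]) / (SSt[j, j] + eps), 0.0)
    return C, S

rng = np.random.default_rng(1)
X = rng.random((30, 2)) @ rng.random((2, 40))   # synthetic rank-2 non-negative data
C, S = nmf_hals(X, k=2)
rel_err = np.linalg.norm(X - C @ S) / np.linalg.norm(X)
print(f"relative error: {rel_err:.2e}")
```

Each inner step solves a one-component least-squares problem in closed form and clips the result at zero, which is what keeps an iteration cheap.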

10.
Article in English | MEDLINE | ID: mdl-26355515

ABSTRACT

We review methods for capturing differential coexpression, which can be divided into two cases by the size of gene sets: 1) two paired genes and 2) multiple genes. In the first case, two genes are positively and negatively correlated with each other under one and the other conditions, respectively. In the second case, multiple genes are coexpressed and randomly expressed under one and the other conditions, respectively. We summarize a variety of methods for the first and second cases into four and three approaches, respectively. We describe each of these approaches in detail technically, being followed by thorough comparative experiments with both synthetic and real data sets. Our experimental results imply high possibility of improving the efficiency of the current methods, particularly in the case of multiple genes, because of low performance achieved by the best methods which are relatively simple intuitive ones.


Subject(s)
Computational Biology/methods; Gene Expression Profiling/methods; Gene Regulatory Networks/genetics; Oligonucleotide Array Sequence Analysis
11.
PLoS One ; 6(7): e22281, 2011.
Article in English | MEDLINE | ID: mdl-21829453

ABSTRACT

N-terminal tails of H2A, H2B, H3 and H4 histone families are subjected to posttranslational modifications that take part in transcriptional regulation mechanisms, such as transcription factor binding and gene expression. Regulation mechanisms under control of histone modification are important but remain largely unclear, despite of emerging datasets for comprehensive analysis of histone modification. In this paper, we focus on what we call genetic harmonious units (GHUs), which are co-occurring patterns among transcription factor binding, gene expression and histone modification. We present the first genome-wide approach that captures GHUs by combining ChIP-chip with microarray datasets from Saccharomyces cerevisiae. Our approach employs noise-robust soft clustering to select patterns which share the same preferences in transcription factor-binding, histone modification and gene expression, which are all currently implied to be closely correlated. The detected patterns are a well-studied acetylation of lysine 16 of H4 in glucose depletion as well as co-acetylation of five lysine residues of H3 with H4 Lys12 and H2A Lys7 responsible for ribosome biogenesis. Furthermore, our method further suggested the recognition of acetylated H4 Lys16 being crucial to histone acetyltransferase ESA1, whose essential role is still under controversy, from a microarray dataset on ESA1 and its bypass suppressor mutants. These results demonstrate that our approach allows us to provide clearer principles behind gene regulation mechanisms under histone modifications and detect GHUs further by applying to other microarray and ChIP-chip datasets. The source code of our method, which was implemented in MATLAB (http://www.mathworks.com/), is available from the supporting page for this paper: http://www.bic.kyoto-u.ac.jp/pathway/natsume/hm_detector.htm.


Subject(s)
Gene Expression Regulation, Fungal; Genome, Fungal; Histones/metabolism; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism; Transcription Factors/metabolism; Acetylation; Biomarkers/metabolism; Chromatin Immunoprecipitation; Gene Expression Profiling; Lysine/chemistry; Lysine/genetics; Oligonucleotide Array Sequence Analysis; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/growth & development; Saccharomyces cerevisiae/metabolism; Transcription Factors/genetics; Transcription, Genetic
12.
Nucleic Acids Res ; 39(11): e74, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21459849

ABSTRACT

A switching mechanism in gene expression, where two genes are positively correlated in one condition and negatively correlated in the other condition, is a key to elucidating complex biological systems. There already exist methods for detecting switching mechanisms from microarrays. However, current approaches have problems under three real cases: outliers, expression values with a very small range and a small number of examples. ROS-DET overcomes these three problems, keeping the computational complexity of current approaches. We demonstrated that ROS-DET outperformed existing methods, under that all these three situations are considered. Furthermore, for each of the top 10 pairs ranked by ROS-DET, we attempted to identify a pathway, i.e. consecutive biological phenomena, being related with the corresponding two genes by checking the biological literature. In 8 out of the 10 pairs, we found two parallel pathways, one of the two genes being in each of the two pathways and two pathways coming to (or starting with) the same gene. This indicates that two parallel pathways would be cooperatively used under one experimental condition, corresponding to the positive correlation, and the two pathways might be alternatively used under the other condition, corresponding to the negative correlation. ROS-DET is available from http://www.bic.kyoto-u.ac.jp/pathway/kayano/ros-det.htm.


Subject(s)
Gene Expression Profiling; Gene Expression Regulation; Data Interpretation, Statistical; ROC Curve
13.
Genome Inform ; 22: 95-120, 2010 Jan.
Article in English | MEDLINE | ID: mdl-20238422

ABSTRACT

Annotating genes is a fundamental issue in the post-genomic era. A typical procedure for this issue is first clustering genes by their features and then assigning functions of unknown genes by using known genes in the same cluster. A lot of genomic information are available for this issue, but two major types of data which can be measured for any gene are microarray expressions and sequences, both of which however have their own flaws. Thus a natural and promising approach for gene annotation is to integrate these two data sources, especially in terms of their costs to be optimized in clustering. We develop an efficient gene annotation method with three steps containing spectral clustering over the integrated cost, based on the idea of network modularity. We rigorously examined the performance of our proposed method from three different viewpoints. All experimental results indicate the performance advantage of our method over possible clustering/classification-based approaches of gene function annotation, using expressions and/or sequences.


Subject(s)
Gene Expression Profiling/methods; Gene Expression/physiology; Genes/physiology; Pattern Recognition, Automated; Signal Transduction/physiology; Systems Integration; Algorithms; Humans
14.
Genome Inform ; 24: 69-83, 2010.
Article in English | MEDLINE | ID: mdl-22081590

ABSTRACT

We address an issue of detecting a switching mechanism in gene expression, where two genes are positively correlated for one experimental condition while they are negatively correlated for another. We compare the performance of existing methods for this issue, roughly divided into two types: interaction test (IT) and the difference of correlation coefficients. Interaction test, currently a standard approach for detecting epistasis in genetics, is the log-likelihood ratio test between two logistic regressions with/without an interaction term, resulting in checking the strength of interaction between two genes. On the other hand, two correlation coefficients can be computed for two experimental conditions and the difference of them shows the alteration of expression trends in a more straightforward manner. In our experiments, we tested three different types of correlation coefficients: Pearson, Spearman and a midcorrelation (biweight midcorrelation). The experiment was performed by using ~2.3 × 10^9 combinations selected out of the GEO (Gene Expression Omnibus) database. We sorted all combinations according to the p-values of IT or by the absolute values of the difference of correlation coefficients and then visually evaluated the top ranked combinations in terms of the switching mechanism. The result showed that 1) combinations detected by IT included non-switching combinations and 2) Pearson was affected by outliers easily while Spearman and the midcorrelation seemed likely to avoid them.


Subject(s)
Gene Expression Profiling; Gene Expression; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Computational Biology/methods; Databases, Genetic; Epistasis, Genetic; Likelihood Functions; Logistic Models; Models, Genetic; Regression Analysis; Software
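
The difference-of-correlations score for a gene pair can be sketched with the three coefficients named in the abstract. The function names and toy data are illustrative, the rank step ignores ties, and the biweight midcorrelation follows its usual definition (median-centred values with Tukey biweight weights and a scale of 9 times the median absolute deviation), which is assumed rather than taken from the paper.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Pearson correlation of ranks (ties are not averaged in this sketch)."""
    def rank(v):
        return np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

def bicor(x, y):
    """Biweight midcorrelation: median-centred values with Tukey biweight weights."""
    def weighted(v):
        med = np.median(v)
        mad = np.median(np.abs(v - med))      # assumed to be non-zero
        u = (v - med) / (9.0 * mad)
        w = (1.0 - u ** 2) ** 2 * (np.abs(u) < 1.0)
        a = (v - med) * w
        return a / np.sqrt(np.sum(a ** 2))
    return float(np.sum(weighted(x) * weighted(y)))

def switching_score(x, y, condition, corr=pearson):
    """Absolute difference between the correlations under condition 0 and condition 1."""
    c = np.asarray(condition, dtype=bool)
    return abs(corr(x[~c], y[~c]) - corr(x[c], y[c]))

base = np.linspace(-1.0, 1.0, 20)
x = np.concatenate([base, base])
y = np.concatenate([base, -base])             # sign of the relation flips in condition 1
condition = np.array([0] * 20 + [1] * 20)
for corr in (pearson, spearman, bicor):
    print(corr.__name__, switching_score(x, y, condition, corr))
```

On this noise-free toy pair all three coefficients reach the maximum score of 2; they start to differ once outliers are present.
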
15.
Bioinformatics ; 25(21): 2735-43, 2009 Nov 01.
Article in English | MEDLINE | ID: mdl-19736252

ABSTRACT

MOTIVATION: We address the issue of finding a three-way gene interaction, i.e. two interacting genes in expression under the genotypes of another gene, given a dataset in which expressions and genotypes are measured at once for each individual. This issue can be a general, switching mechanism in expression of two genes, being controlled by categories of another gene, and finding this type of interaction can be a key to elucidating complex biological systems. The most suitable method for this issue is likelihood ratio test using logistic regressions, which we call interaction test, but a serious problem of this test is computational intractability at a genome-wide level. RESULTS: We developed a fast method for this issue which improves the speed of interaction test by around 10 times for any size of datasets, keeping highly interacting genes with an accuracy of approximately 85%. We applied our method to approximately 3 × 10^8 three-way combinations generated from a dataset on human brain samples and detected three-way gene interactions with small P-values. To check the reliability of our results, we first conducted permutations by which we can show that the obtained P-values are significantly smaller than those obtained from permuted null examples. We then used GEO (Gene Expression Omnibus) to generate gene expression datasets with binary classes to confirm the detected three-way interactions by using these datasets and interaction tests. The result showed us some datasets with significantly small P-values, strongly supporting the reliability of the detected three-way interactions. AVAILABILITY: Software is available from http://www.bic.kyoto-u.ac.jp/pathway/kayano/bioinfo_three-way.html CONTACT: kayano@kuicr.kyoto-u.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods; Genome; Genomics/methods; Genotype; Databases, Genetic; Gene Expression; Logistic Models; Software
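
The interaction test itself (not the accelerated method of the paper) can be sketched as a likelihood ratio test between logistic regressions with and without the product term. The sketch assumes a binary class label `t`, two expression vectors `x` and `y`, a plain Newton fit with a small stabilising ridge, and synthetic data. All names are illustrative.

```python
import math
import numpy as np

def logistic_loglik(X, t, n_iter=50):
    """Maximised log-likelihood of a logistic regression fitted by Newton's method."""
    X = np.column_stack([np.ones(len(t)), X])      # add an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1.0 - 1e-12)
    return float(np.sum(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

def interaction_test(x, y, t):
    """LRT statistic and p-value for adding the x*y term (1 degree of freedom)."""
    ll_null = logistic_loglik(np.column_stack([x, y]), t)
    ll_full = logistic_loglik(np.column_stack([x, y, x * y]), t)
    stat = max(2.0 * (ll_full - ll_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))     # chi-square(1) survival function
    return stat, p_value

rng = np.random.default_rng(0)
t = np.repeat([0.0, 1.0], 100)                     # two classes (e.g. two genotypes)
x = rng.normal(size=200)
y = np.where(t == 0, x, -x) + rng.normal(size=200) # correlation sign flips with class
stat, p_value = interaction_test(x, y, t)
print(f"LRT statistic = {stat:.1f}, p = {p_value:.2e}")
```

Fitting two regressions per gene pair and class variable is what makes the exhaustive test expensive at genome scale.
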
16.
Bioinformatics ; 24(16): i167-73, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18689820

ABSTRACT

MOTIVATION: Carbohydrate sugar chains or glycans, the third major class of macromolecules, hold branch shaped tree structures. Glycan motifs are known to be two types: (1) conserved patterns called 'cores' containing the root and (2) ubiquitous motifs which appear in external parts including leaves and are distributed over different glycan classes. Finding these glycan tree motifs is an important issue, but there have been no computational methods to capture these motifs efficiently. RESULTS: We have developed an efficient method for mining motifs or significant subtrees from glycans. The key contribution of this method is: (1) to have proposed a new concept, 'α-closed frequent subtrees', and an efficient method for mining all these subtrees from given trees and (2) to have proposed to apply statistical hypothesis testing to rerank the frequent subtrees in significance. We experimentally verified the effectiveness of the proposed method using real glycans: (1) We examined the top 10 subtrees obtained by our method at some parameter setting and confirmed that all subtrees are significant motifs in glycobiology. (2) We applied the results of our method to a classification problem and found that our method outperformed other competing methods, SVM with three different tree kernels, being all statistically significant. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Carbohydrates/chemistry; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Polysaccharides/chemistry; Carbohydrate Sequence
17.
Bioinformatics ; 23(13): i468-78, 2007 Jul 01.
Article in English | MEDLINE | ID: mdl-17646332

ABSTRACT

MOTIVATION: A promising and reliable approach to annotate gene function is clustering genes not only by using gene expression data but also literature information, especially gene networks. RESULTS: We present a systematic method for gene clustering by combining these totally different two types of data, particularly focusing on network modularity, a global feature of gene networks. Our method is based on learning a probabilistic model, which we call a hidden modular random field in which the relation between hidden variables directly represents a given gene network. Our learning algorithm which minimizes an energy function considering the network modularity is practically time-efficient, regardless of using the global network property. We evaluated our method by using a metabolic network and microarray expression data, changing with microarray datasets, parameters of our model and gold standard clusters. Experimental results showed that our method outperformed other four competing methods, including k-means and existing graph partitioning methods, being statistically significant in all cases. Further detailed analysis showed that our method could group a set of genes into a cluster which corresponds to the folate metabolic pathway while other methods could not. From these results, we can say that our method is highly effective for gene clustering and annotating gene function.


Subject(s)
Databases, Genetic; Gene Expression Profiling/methods; Gene Expression/physiology; Periodicals as Topic; Proteome/classification; Proteome/metabolism; Signal Transduction/physiology; Artificial Intelligence; Information Storage and Retrieval/methods; Natural Language Processing; Pattern Recognition, Automated/methods; Proteome/genetics; Systems Integration
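
Network modularity, the global network property the method relies on, can be computed as in the sketch below. It uses the standard Newman-Girvan definition for an undirected, unweighted graph. The hidden modular random field model and its energy function are not reproduced here, and the toy graph is an assumption made for illustration.

```python
import numpy as np

def modularity(adj, labels):
    """Newman-Girvan modularity Q of a partition of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degree = adj.sum(axis=1)
    two_m = degree.sum()                           # twice the number of edges
    same = labels[:, None] == labels[None, :]      # 1 where two nodes share a cluster
    return float(np.sum((adj - np.outer(degree, degree) / two_m) * same) / two_m)

# Toy gene network: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

print(modularity(adj, [0, 0, 0, 1, 1, 1]))   # one cluster per triangle: 5/14
print(modularity(adj, [0, 0, 0, 0, 0, 0]))   # everything in one cluster: 0
```

A clustering that follows the densely connected parts of the network scores higher than one that ignores them, which is the behaviour a modularity-aware objective rewards.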