Search | BVS Enfermería Search Portal

Unique ion filter: A data reduction tool for chemometric analysis of raw comprehensive two-dimensional gas chromatography-mass spectrometry data.

Adutwum, Lawrence A; Kwao, Joanna Koryo; Harynuk, James J.

J Sep Sci ; 44(14): 2773-2784, 2021 Jul.

Article in English | MEDLINE | ID: mdl-33932270

ABSTRACT

Comprehensive gas chromatography with time-of-flight mass spectrometry is a powerful tool for the analysis of complex samples. For both one- and two-dimensional separations, chemometric analysis of raw chromatographic data is more useful than analysis of peak tables. The data volume from such experiments generally necessitates the use of data reduction tools, which often sacrifice some of the multivariate information in the mass-to-charge ratio dimension. The unique ion filter reduces the redundancy in two-dimensional gas chromatography-mass spectrometry data by limiting the data to a few unique/pseudo-unique ions, sub-peaks/slices in the first dimension, and spectra in the second dimension. We explore the performance of this algorithm through careful inspection of two-dimensional gas chromatography-mass spectrometry data before and after application of the filter. A 99% reduction in the number of variables in a two-dimensional gas chromatography-mass spectrometry chromatogram passed on to subsequent analysis was observed. Feature selection times for model optimization decreased from 229 (±13) to 6.8 (±0.5) min when the filter was applied. An estimate of two unique/pseudo-unique ions, one sub-peak in the first dimension, and five spectra in the second dimension was considered to provide a true representation of each chromatogram and provided enough information to achieve 100% model prediction accuracy.

Solving the Coloring Problem in Half-Heusler Structures: Machine-Learning Predictions and Experimental Validation.

Gzyl, Alexander S; Oliynyk, Anton O; Adutwum, Lawrence A; Mar, Arthur.

Inorg Chem ; 58(14): 9280-9289, 2019 Jul 15.

Article in English | MEDLINE | ID: mdl-31247819

ABSTRACT

The site preferences within the structures of half-Heusler compounds have been evaluated through a machine-learning approach. A support-vector machine algorithm was applied to develop a model which was trained on 179 experimentally reported structures and 23 descriptors based solely on the chemical composition. The model gave excellent performance, with sensitivity of 93%, selectivity of 96%, and accuracy of 95%. As an illustration of data sanitization, two compounds (GdPtSb, HoPdBi) flagged by the model to have potentially incorrect site assignments were resynthesized and structurally characterized. The predictions of the correct site assignments from the machine-learning model were confirmed by single-crystal and powder X-ray diffraction analysis. These site assignments also corresponded to the lowest total energy configurations as revealed from first-principles calculations.
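The figures of merit quoted in this abstract follow from a binary confusion matrix. The sketch below assumes the usual definitions and treats the abstract's "selectivity" as the true-negative rate (often called specificity); that reading is an assumption. The counts are made up for illustration and are not the paper's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of actual positives recovered
    specificity = tn / (tn + fp)   # fraction of actual negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts, for illustration only (not taken from the paper)
sens, spec, acc = classification_metrics(tp=45, fn=5, tn=40, fp=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```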

Disentangling Structural Confusion through Machine Learning: Structure Prediction and Polymorphism of Equiatomic Ternary Phases ABC.

Oliynyk, Anton O; Adutwum, Lawrence A; Rudyk, Brent W; Pisavadia, Harshil; Lotfi, Sogol; Hlukhyy, Viktor; Harynuk, James J; Mar, Arthur; Brgoch, Jakoah.

J Am Chem Soc ; 139(49): 17870-17881, 2017 Dec 13.

Article in English | MEDLINE | ID: mdl-29129069

ABSTRACT

A method to predict the crystal structure of equiatomic ternary compositions based only on the constituent elements was developed using cluster resolution feature selection (CR-FS) and support vector machine (SVM) classification. The supervised machine-learning model was first trained with 1037 individual compounds that adopt the most populated ternary 1:1:1 structure types (TiNiSi-, ZrNiAl-, PbFCl-, LiGaGe-, YPtAs-, UGeTe-, and LaPtSi-type) and then validated using an additional 519 compounds. The CR-FS algorithm improves class discrimination and indicates that 113 variables including size, electronegativity, number of valence electrons, and position on the periodic table (group number) influence the structure preference. The final model prediction sensitivity, specificity, and accuracy were 97.3%, 93.9%, and 96.9%, respectively, establishing that this method is capable of reliably predicting the crystal structure given only its composition. The power of CR-FS and SVM classification is further demonstrated by segregating the crystal structure of polymorphs, specifically to examine polymorphism in TiNiSi- and ZrNiAl-type structures. Analyzing 19 compositions that are experimentally reported in both structure types, this machine-learning model correctly identifies, with high confidence (>0.7), the low-temperature polymorph from its high-temperature form. Interestingly, machine learning also reveals that certain compositions cannot be clearly differentiated and lie in a "confused" region (0.3-0.7 confidence), suggesting that both polymorphs may be observed in a single sample at certain experimental conditions. The ensuing synthesis and characterization of TiFeP adopting both TiNiSi- and ZrNiAl-type structures in a single sample, even after long annealing times (3 months), validate the occurrence of the region of structural uncertainty predicted by machine learning.
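The confidence bands described in this abstract can be expressed as a small decision rule. This is only a sketch: the thresholds 0.3 and 0.7 come from the abstract, but the function name, the labels, and the reading of a confidence below 0.3 as a confident assignment to the other structure type are assumptions.

```python
def confidence_region(p, low=0.3, high=0.7):
    """Label a classifier confidence p (0..1) for one structure type.

    Thresholds follow the abstract: above 0.7 is 'high confidence' and
    0.3-0.7 is the 'confused' region. Treating p < 0.3 as a confident
    assignment to the other structure type is an assumption of this sketch.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if p > high:
        return "confident: this type"
    if p < low:
        return "confident: other type"
    return "confused"

# Hypothetical confidences, for illustration only
labels = [confidence_region(p) for p in (0.92, 0.55, 0.12)]
print(labels)
```

Compositions that land in the "confused" band are the ones the abstract suggests may show both polymorphs in a single sample.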

Estimation of start and stop numbers for cluster resolution feature selection algorithm: an empirical approach using null distribution analysis of Fisher ratios.

Adutwum, Lawrence A; de la Mata, A Paulina; Bean, Heather D; Hill, Jane E; Harynuk, James J.

Anal Bioanal Chem ; 409(28): 6699-6708, 2017 Nov.

Article in English | MEDLINE | ID: mdl-28963623

ABSTRACT

Cluster resolution feature selection (CR-FS) is a hybrid feature selection algorithm that evaluates ranked variables via sequential backward elimination (SBE) and sequential forward selection (SFS). The implementation of CR-FS requires two main inputs, namely the start and stop numbers. The start number is the number of highly ranked variables used for the SBE, while the stop number is the point at which the search for additional features during the SFS stage is halted. The setting of these critical parameters has always relied on trial and error, which introduces subjectivity into the results obtained. The start and stop numbers are known to vary with each dataset. Drawing inspiration from overlapping coefficients, a method for comparing two probability density functions, empirical equations for estimating the start and stop numbers for a dataset were developed. All of the parameters in the empirical equations are obtained from the comparison of the two probability density functions except the constant termed d. The equations were optimized using three real-world datasets. The optimum range of d was determined to be 0.48 to 0.57. An implementation of CR-FS using two new datasets demonstrated the validity of this approach. Partial least squares discriminant analysis (PLS-DA) model prediction accuracies increased from 90 and 96 to 100% for both datasets using start and stop numbers calculated with this approach. Additionally, there was a twofold increase in the explained variance captured in the first two principal components. Graphical abstract: Here, we describe how to determine the start and stop numbers for an automated feature selection routine, ensuring that you get the best model you can for your data with minimal effort.
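The overlapping coefficient mentioned in this abstract measures the area shared by two probability density functions, OVL = ∫ min(f(x), g(x)) dx, so it ranges from 0 (no overlap) to 1 (identical distributions). The sketch below estimates it from two samples with shared histogram bins. It does not reproduce the paper's empirical equations or the constant d, which the abstract does not give; the function name and the bin count are assumptions.

```python
def overlap_coefficient(a, b, bins=20):
    """Histogram estimate of OVL = integral of min(f, g) for samples a and b."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    if hi == lo:
        return 1.0  # all values identical: the distributions coincide
    width = (hi - lo) / bins

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            # clamp the maximum value into the last bin
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(xs) for c in counts]  # per-bin probabilities

    pa, pb = hist(a), hist(b)
    # summing min over bin probabilities equals integrating min over densities
    return sum(min(x, y) for x, y in zip(pa, pb))

# Hypothetical samples, for illustration only
same = overlap_coefficient([1, 2, 3, 4], [1, 2, 3, 4])   # identical samples
apart = overlap_coefficient([0, 1, 2], [10, 11, 12])     # disjoint samples
print(same, apart)
```

A histogram estimate depends on the bin count, so a kernel density estimate may be preferable for small samples.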

Ultraviolet-Visible Spectroscopy and Chemometric Strategy Enable the Classification and Detection of Expired Antimalarial Herbal Medicinal Product in Ghana.

Mensah, Jacob N; Brobbey, Abena A; Addotey, John N; Ayensu, Isaac; Asare-Nkansah, Samuel; Opuni, Kwabena F M; Adutwum, Lawrence A.

Int J Anal Chem ; 2021: 5592217, 2021.

Article in English | MEDLINE | ID: mdl-34257664

ABSTRACT

To meet the growing demand for complementary and alternative treatment for malaria, manufacturers produce several antimalarial herbal medicinal products. Regulation of herbal medicinal products is difficult because of their complex chemical nature, which requires cumbersome, expensive, and time-consuming methods of analysis. The aim of this study was to develop a simple spectroscopic method together with a chemometric model for the classification and identification of expired liquid antimalarial herbal medicinal products. A principal component analysis model was successfully used to distinguish between different herbal medicinal products and identify expired products. Principal component analysis showed a clear class separation between all five herbal medicinal products (HMP) studied, with explained variances of 37.51% and 26.38% for the first and second principal components, respectively, and 18.74% for the third. Support vector machine classification gave specificity and accuracy of 1.00 (100%) for training set data for all the products. In the validation set, HMP1, HMP2, and HMP3 had sensitivity, specificity, and accuracy of 1.00. HMP4 and HMP5 had sensitivity and specificity of 0.90 and 1.00, respectively, and an accuracy of 0.98. The support vector machine classification and principal component analysis models were successfully used to identify expired herbal medicinal products. This strategy can be used for rapid field detection of expired liquid antimalarial herbal medicinal products.

Total Ion Spectra versus Segmented Total Ion Spectra as Preprocessing Tools for Gas Chromatography - Mass Spectrometry Data.

Adutwum, Lawrence A; Abel, Robin J; Harynuk, James.

J Forensic Sci ; 63(4): 1059-1068, 2018 Jul.

Article in English | MEDLINE | ID: mdl-29023723

ABSTRACT

Alignment of fire debris data from GC-MS for chemometric analysis is challenged by highly variable, uncontrolled sample and matrix composition. The total ion spectrum (TIS) obviates the need for alignment but loses all separation information. We introduce the segmented total ion spectrum (STIS), which keeps the advantages of TIS while preserving some retention information. We compare the performance of STIS with TIS for the classification of casework fire debris samples. TIS and STIS achieve good model prediction accuracies of 96% and 98%, respectively. Baseline removal improved model prediction accuracies for TIS and STIS to 97% and 99%, respectively. The importance of maintaining some chromatographic information to aid in deciphering the underlying chemistry of the results and the reasons for false positive/negative results was also examined.
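The difference between the two representations can be sketched on a toy data matrix. The code assumes the GC-MS run is stored as a list of scans in retention-time order, each scan holding one intensity per m/z channel, and that the STIS uses equal-sized contiguous retention-time segments. The abstract does not describe the segmentation scheme, so that choice and the function names are assumptions.

```python
def total_ion_spectrum(scans):
    """Sum intensities over all scans, giving one value per m/z channel."""
    return [sum(channel) for channel in zip(*scans)]

def segmented_total_ion_spectrum(scans, n_segments):
    """Concatenate the TIS of n_segments contiguous retention-time windows."""
    if not 1 <= n_segments <= len(scans):
        raise ValueError("n_segments must be between 1 and the number of scans")
    # integer boundaries that split the scans into near-equal windows
    bounds = [i * len(scans) // n_segments for i in range(n_segments + 1)]
    stis = []
    for start, stop in zip(bounds, bounds[1:]):
        stis.extend(total_ion_spectrum(scans[start:stop]))
    return stis

# Toy chromatogram: 4 scans (rows) x 3 m/z channels (columns), made-up values
scans = [
    [1, 0, 2],
    [3, 1, 0],
    [0, 5, 1],
    [2, 2, 2],
]
tis = total_ion_spectrum(scans)                            # [6, 8, 5]
stis = segmented_total_ion_spectrum(scans, n_segments=2)   # [4, 1, 2, 2, 7, 3]
print(tis, stis)
```

Neither vector depends on exact peak positions within a window, which is why no alignment is needed, while the STIS still records whether a signal eluted early or late.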

How To Optimize Materials and Devices via Design of Experiments and Machine Learning: Demonstration Using Organic Photovoltaics.

Cao, Bing; Adutwum, Lawrence A; Oliynyk, Anton O; Luber, Erik J; Olsen, Brian C; Mar, Arthur; Buriak, Jillian M.

ACS Nano ; 12(8): 7434-7444, 2018 Aug 28.

Article in English | MEDLINE | ID: mdl-30027732

ABSTRACT

Most discoveries in materials science have been made empirically, typically through one-variable-at-a-time (Edisonian) experimentation. The characteristics of materials-based systems are, however, neither simple nor uncorrelated. In a device such as an organic photovoltaic, for example, the level of complexity is high due to the sheer number of components and processing conditions, and thus, changing one variable can have multiple unforeseen effects due to their interconnectivity. Design of Experiments (DoE) is ideally suited for such multivariable analyses: by planning one's experiments as per the principles of DoE, one can test and optimize several variables simultaneously, thus accelerating the process of discovery and optimization while saving time and precious laboratory resources. When combined with machine learning, the consideration of one's data in this manner provides a different perspective for optimization and discovery, akin to climbing out of a narrow valley of serial (one-variable-at-a-time) experimentation, to a mountain ridge with a 360° view in all directions.
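The contrast between one-variable-at-a-time experimentation and a designed experiment can be illustrated by enumerating the runs each approach would perform. The sketch uses a two-level full factorial, which is only the simplest DoE design; the abstract does not say which design the authors used. The factor names and levels are hypothetical.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only (not from the paper)
factors = {
    "donor_conc_mg_ml": [10, 20],
    "spin_speed_rpm": [1000, 3000],
    "anneal_temp_c": [80, 140],
}

def full_factorial(factors):
    """Every combination of factor levels, as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

def one_variable_at_a_time(factors):
    """Baseline run plus runs that change a single factor from the baseline."""
    baseline = {name: levels[0] for name, levels in factors.items()}
    runs = [baseline]
    for name, levels in factors.items():
        for level in levels[1:]:
            runs.append({**baseline, name: level})
    return runs

doe_runs = full_factorial(factors)
ovat_runs = one_variable_at_a_time(factors)
print(len(doe_runs), len(ovat_runs))
```

The one-variable-at-a-time plan needs fewer runs here, but it never changes two factors together, so it cannot reveal interactions between them; the factorial plan covers every combination.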
