Results 1-16 of 16
1.
Sensors (Basel); 24(8), 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38676037

ABSTRACT

The aim of this paper is to discuss the effect of the sensor on the acoustic emission (AE) signature and to develop a methodology to reduce the sensor effect. Pencil leads are broken on PMMA plates at different source-sensor distances, and the resulting waves are detected with different sensors. Several transducers commonly used for acoustic emission measurements are compared with regard to their ability to reproduce the characteristic shapes of plate waves, and the consequences for AE descriptors are discussed. Their differing responses show why similar test specimens and test conditions can yield disparate results. This sensor effect also makes the classification of different AE sources more difficult. In this context, a specific procedure is proposed to reduce the sensor effect and to enable an efficient selection of descriptors for data merging. Principal Component Analysis demonstrated that using the Z-score normalized descriptor data in conjunction with the Kruskal-Wallis test and identifying the outliers can help reduce the sensor effect. This procedure leads to the selection of a common descriptor set with the same distribution for all sensors. These descriptors can be merged to create a library. This result opens new prospects for the generalization of acoustic emission signature libraries, a key point for the development of a database for machine learning.
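
A minimal sketch of the descriptor-screening step described above, in Python with NumPy/SciPy; the descriptor values and sensor behaviors are simulated stand-ins, not data from the study:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated values of one AE descriptor (e.g., rise time) for the same
# pencil-lead-break sources recorded with three different sensors.
sensor_a = rng.normal(10.0, 2.0, 200)
sensor_b = rng.normal(10.2, 2.1, 200)
sensor_c = rng.exponential(2.0, 200) + 8.0  # this sensor distorts the shape

# Step 1: Z-score normalization per sensor removes location/scale offsets.
z_scores = [stats.zscore(x) for x in (sensor_a, sensor_b, sensor_c)]

# Step 2: Kruskal-Wallis test; keep the descriptor for the merged library
# only if its distribution is indistinguishable across sensors.
h, p = stats.kruskal(*z_scores)
print(f"H = {h:.2f}, p = {p:.4f} ->", "keep descriptor" if p > 0.05 else "drop descriptor")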

2.
Sensors (Basel); 23(20), 2023 Oct 14.
Article in English | MEDLINE | ID: mdl-37896564

ABSTRACT

For autonomous driving, perception is a primary and essential element that fundamentally deals with insight into the ego vehicle's environment through sensors. Perception is challenging because it must cope with dynamic objects and continuous environmental changes. The problem is aggravated when adverse weather such as snow, rain, fog, night light, sandstorms, or strong daylight degrades the quality of perception. In this work, we aim to improve the accuracy of camera-based perception, specifically object detection for autonomous driving in adverse weather. We propose improving YOLOv8-based object detection in adverse weather through transfer learning on merged data from several harsh-weather datasets. Two popular open-source datasets (ACDC and DAWN) and their merged dataset were used to detect primary objects on the road in harsh weather. Training weights were collected from training on the individual datasets, their merged versions, and several subsets of those datasets selected according to their characteristics. The training weights were then compared by evaluating detection performance on the datasets mentioned earlier and their subsets. The evaluation revealed that training on custom datasets significantly improved detection performance compared to the YOLOv8 base weights. Furthermore, using more images through the feature-related data merging technique steadily increased object detection performance.
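
A minimal transfer-learning sketch with the ultralytics Python package; "merged_weather.yaml" is a hypothetical dataset configuration pointing at the merged ACDC+DAWN images, not a file shipped with either dataset:

from ultralytics import YOLO

# Start from pretrained YOLOv8 weights instead of training from scratch.
model = YOLO("yolov8m.pt")

# Fine-tune on the merged adverse-weather dataset (hypothetical config).
model.train(data="merged_weather.yaml", epochs=100, imgsz=640)

# Evaluate the fine-tuned weights on the validation split.
metrics = model.val()
print(metrics.box.map50)  # mAP@0.5 for the primary road objects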

3.
Sensors (Basel); 20(5), 2020 Feb 29.
Article in English | MEDLINE | ID: mdl-32121462

ABSTRACT

In the present study, we assessed for the first time the performance of our custom-designed low-cost Particulate Matter (PM) monitoring devices (Atmos) in measuring PM10 concentrations. We examined ambient PM10 levels during an intense measurement campaign at two sites in the Delhi National Capital Region (NCR), India, validating the un-calibrated Atmos at highly polluted monitoring sites. PM10 concentrations from Atmos, which contains a laser-scattering-based Plantower PM sensor, were comparable with those measured by research-grade scanning mobility particle sizers (SMPS) combined with optical particle sizers (OPS) and aerodynamic particle sizers (APS). The un-calibrated sensors often provided accurate PM10 measurements, particularly in capturing real-time hourly concentration variations. Quantile-quantile (QQ) plots for data collected during the selected deployment period showed positively skewed PM10 datasets. Strong Spearman's rank-order correlations (rs = 0.64-0.83) between the studied instruments indicated the utility of low-cost Plantower PM sensors for measuring PM10 in a real-world context. Additionally, the heat map for weekly datasets showed high R2 values, establishing the efficacy of the PM sensor for PM10 measurement in highly polluted environments.
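
A minimal sketch of the validation statistic used above (Spearman's rank-order correlation), with simulated, positively skewed PM10 series standing in for the Atmos and reference-instrument data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated hourly PM10 (ug/m3): a lognormal (positively skewed) reference
# series and a noisy, slightly biased low-cost-sensor reading of it.
reference = rng.lognormal(mean=5.0, sigma=0.5, size=24 * 30)
low_cost = 0.9 * reference * rng.lognormal(mean=0.0, sigma=0.15, size=reference.size)

# Spearman's rank-order correlation is robust to the skew and to
# monotone calibration offsets between the instruments.
rs, p = stats.spearmanr(low_cost, reference)
print(f"Spearman rs = {rs:.2f} (p = {p:.1e})")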

4.
J Synchrotron Radiat; 26(Pt 1): 244-252, 2019 Jan 01.
Article in English | MEDLINE | ID: mdl-30655492

ABSTRACT

At the Swiss Light Source macromolecular crystallography (MX) beamlines, the collection of serial synchrotron crystallography (SSX) diffraction data is facilitated by the recent DA+ data acquisition and analysis software developments. The SSX suite allows easy, efficient and high-throughput measurements on a large number of crystals. The fast, continuous diffraction-based two-dimensional grid scan method allows initial location of microcrystals. The CY+ GUI utility enables efficient assessment of a grid scan's analysis output and subsequent collection of multiple wedges of data (so-called minisets) from automatically selected positions in a serial and automated way. The automated data processing (adp) routines, adapted to the SSX data collection mode, provide near-real-time analysis for data in both CBF and HDF5 formats. The automatic data merging (adm) is the latest extension of the DA+ data analysis software routines. It utilizes the sxdm (SSX data merging) package, which provides automatic online scaling and merging of minisets and allows identification of the subset of minisets that yields the best quality in the final merged data. The results of both adp and adm are sent to the MX MongoDB database and displayed in the web-based tracker, which provides the user with on-the-fly feedback about the experiment.
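
A toy sketch of the subset-identification idea: greedy forward selection of minisets under a CC1/2-like merged-quality score. The data, score, and selection rule are simplified illustrations of the general approach, not the actual sxdm algorithm:

import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for minisets: each row holds intensities for 50 common
# reflections; every fifth miniset is "bad" (much noisier).
true_intensities = rng.uniform(100, 1000, size=50)
minisets = [true_intensities + rng.normal(0, 200 if i % 5 == 0 else 20, size=50)
            for i in range(15)]

def merged_quality(idx):
    """CC1/2-like score: mean correlation of each selected miniset with
    the merge (mean) of the remaining selected minisets."""
    stack = np.array([minisets[i] for i in idx])
    ccs = [np.corrcoef(stack[k], np.delete(stack, k, axis=0).mean(axis=0))[0, 1]
           for k in range(len(idx))]
    return float(np.mean(ccs))

# Greedy forward selection: start from a seed pair, then add minisets
# only while they improve the merged-quality score.
selected = [1, 2]
while True:
    candidates = [j for j in range(len(minisets)) if j not in selected]
    if not candidates:
        break
    gains = {j: merged_quality(selected + [j]) - merged_quality(selected)
             for j in candidates}
    best = max(gains, key=gains.get)
    if gains[best] <= 0:
        break
    selected.append(best)
print("minisets kept for merging:", sorted(selected))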

5.
Adv Exp Med Biol; 922: 119-135, 2016.
Article in English | MEDLINE | ID: mdl-27553239

ABSTRACT

X-ray diffraction from crystals of membrane proteins very often yields incomplete datasets due to, among other things, severe radiation damage. Multiple crystals are thus required to form complete datasets, provided the crystals themselves are isomorphous. Selecting and combining data from multiple crystals is a difficult and tedious task that can be facilitated by purpose-built software. BLEND, part of the CCP4 suite of programs for macromolecular crystallography (MX), was created for exactly this purpose. In this chapter the program is described and its workings are illustrated by means of data from two membrane proteins.


Subjects
Crystallography, X-Ray; Membrane Proteins/chemistry; Software; Bacterial Proteins/chemistry; Computer Graphics; Haemophilus influenzae/chemistry; Humans; Mathematical Computing; Membrane Proteins/radiation effects; Receptors, Histamine H1/chemistry
6.
Brief Bioinform; 14(4): 469-490, 2013 Jul.
Article in English | MEDLINE | ID: mdl-22851511

ABSTRACT

Genomic data integration is a key step towards large-scale genomic data analysis. This process is very challenging due to the diverse sources of information resulting from genomics experiments. In this work, we review methods designed to combine genomic data recorded from microarray gene expression (MAGE) experiments. It has been acknowledged that the main source of variation between different MAGE datasets is the so-called 'batch effects'. The methods reviewed here perform data integration by removing (or, more precisely, attempting to remove) the unwanted variation associated with batch effects. They are presented in a unified framework together with a wide range of evaluation tools, which are mandatory for assessing the efficiency and quality of the data integration process. We provide a systematic description of the MAGE data integration methodology, together with basic recommendations to help users choose the appropriate tools for integrating MAGE data for large-scale analysis and to evaluate them from different perspectives in order to quantify their efficiency. All genomic data used in this study for illustration purposes were retrieved from InSilicoDB http://insilico.ulb.ac.be.


Subjects
Genomics/methods; Oligonucleotide Array Sequence Analysis; Transcriptome; Computer Simulation; Databases, Genetic; Gene Expression; Genetic Variation; Genome
7.
Biometrics; 71(4): 929-940, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26193911

ABSTRACT

Merging multiple datasets collected from studies with identical or similar scientific objectives is often undertaken in practice to increase statistical power. This article concerns the development of an effective statistical method for merging multiple longitudinal datasets subject to various heterogeneous characteristics, such as different follow-up schedules and study-specific missing covariates (e.g., covariates observed in some studies but missing in others). The presence of study-specific missing covariates poses a major methodological challenge for data merging and analysis. We propose a joint estimating function approach to address this challenge, in which a novel nonparametric estimating function constructed via spline-based sieve approximation is used to bridge estimating equations from studies with missing covariates to those with fully observed covariates. Under mild regularity conditions, we show that the proposed estimator is consistent and asymptotically normal. We evaluate the finite-sample performance of the proposed method through simulation studies. In comparison to the conventional multiple imputation approach, our method exhibits smaller estimation bias. We provide an illustrative data analysis using longitudinal cohorts collected in Mexico City to assess the effect of lead exposure on children's somatic growth.


Subjects
Biometry/methods; Longitudinal Studies; Body Weight/drug effects; Child Development/drug effects; Child, Preschool; Computer Simulation; Female; Fetal Blood/metabolism; Humans; Infant; Infant, Newborn; Lead/blood; Lead/toxicity; Male; Models, Statistical; Multivariate Analysis
8.
J Am Stat Assoc; 118(543): 1786-1795, 2023.
Article in English | MEDLINE | ID: mdl-37771512

ABSTRACT

Merging datafiles containing information on overlapping sets of entities is a challenging task in the absence of unique identifiers, and is further complicated when some entities are duplicated in the datafiles. Most approaches to this problem have focused on linking two files assumed to be free of duplicates, or on detecting which records in a single file are duplicates. However, it is common in practice to encounter scenarios that fit somewhere in between or beyond these two settings. We propose a Bayesian approach for the general setting of multifile record linkage and duplicate detection. We use a novel partition representation to propose a structured prior for partitions that can incorporate prior information about the data collection processes of the datafiles in a flexible manner, and extend previous models for comparison data to accommodate the multifile setting. We also introduce a family of loss functions to derive Bayes estimates of partitions that allow uncertain portions of the partitions to be left unresolved. The performance of our proposed methodology is explored through extensive simulations.

9.
JMIR Public Health Surveill; 8(10): e38450, 2022 Oct 20.
Article in English | MEDLINE | ID: mdl-36219835

ABSTRACT

BACKGROUND: COVID-19 was first identified in December 2019 in the city of Wuhan, China. The virus quickly spread and was declared a pandemic on March 11, 2020. After infection, symptoms such as fever, a (dry) cough, nasal congestion, and fatigue can develop. In some cases, the virus causes severe complications such as pneumonia and dyspnea and can result in death. The virus also spread rapidly in the Netherlands, a small and densely populated country with an aging population. Health care in the Netherlands is of a high standard, but there were nevertheless problems with hospital capacity, such as the number of available beds and staff. Some regions and municipalities were also hit harder than others. In the Netherlands, important data sources are available for daily COVID-19 numbers and information about municipalities. OBJECTIVE: We aimed to predict the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants per municipality in the Netherlands, using a dataset with the properties of 355 municipalities and advanced modeling techniques. METHODS: We collected relevant static data per municipality from sources available in the Dutch public domain and merged them with the dynamic daily number of infections from January 1, 2020, to May 9, 2021, resulting in a dataset covering the 355 municipalities in the Netherlands, with variables grouped into 20 topics. Random forests and multiple fractional polynomials were used to construct a model for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants per municipality. RESULTS: The final prediction model had an R2 of 0.63. Important properties for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants in a municipality were exposure to particulate matter with diameters <10 µm (PM10) in the air, the percentage of Labour party voters, and the number of children in a household. CONCLUSIONS: Relating municipality properties to the cumulative number of confirmed infections gives insight into which properties matter most when predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants in a municipality. This insight can provide policy makers with tools to cope with COVID-19 and may also be of value in a future pandemic, so that municipalities are better prepared.


Subjects
COVID-19; Child; Humans; Aged; COVID-19/epidemiology; Netherlands/epidemiology; Cities/epidemiology; Particulate Matter; Cough; Algorithms
10.
BioData Min; 14(1): 43, 2021 Aug 28.
Article in English | MEDLINE | ID: mdl-34454568

ABSTRACT

BACKGROUND: The amount of available and potentially significant data describing study subjects is ever growing with the introduction and integration of different registries and data banks. The individual attributes in these data are not always necessary; more often, membership in a specific group (e.g., diet, social 'bubble', living area) is enough to build a successful machine learning or data mining model without overfitting it. Therefore, in this article we propose an approach to building taxonomies using clustering to replace detailed data from large heterogeneous datasets from different sources, while improving interpretability. We used the GISTAR study database, which holds exhaustive self-assessment questionnaire data, to demonstrate this approach in the task of differentiating between H. pylori positive and negative study participants and assessing their potential risk factors. We compared the results of taxonomy-based classification to the results of classification using raw data. RESULTS: Our approach was evaluated using 6 classification algorithms that induce rule-based or tree-based classifiers. The taxonomy-based classification results show no significant loss of information, with similar and up to 2.5% better classification accuracy. Information held by 10 or more attributes can be replaced by a single attribute indicating membership in a cluster of the hierarchy at a specific cut. The clusters created this way are easily interpreted by researchers (doctors, epidemiologists) and describe the co-occurring features in the group, which is significant for the specific task. CONCLUSIONS: While there are always features and measurements that must be used in data analysis as they are, describing study subjects in parallel through taxonomies allows membership in specific naturally occurring groups, and its impact on an outcome, to be used in the analysis. This can decrease the risk of overfitting (picking attributes and values specific to the training set without explaining the underlying conditions), improve the accuracy of the models, and improve the privacy protection of study participants by decreasing the amount of specific information used to identify the individual.
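
A minimal sketch of replacing an attribute block with taxonomy membership: hierarchical clustering on the block, then a single cluster-id feature at a chosen cut. The questionnaire block is simulated, not GISTAR data:

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(7)

# Simulated block of 12 correlated questionnaire attributes for 500
# participants; four latent lifestyle groups generate the answers.
centers = rng.normal(0.0, 2.0, size=(4, 12))
latent = rng.integers(0, 4, size=500)
answers = centers[latent] + rng.normal(0.0, 0.5, size=(500, 12))

# Build the taxonomy and cut it; the cluster id becomes one interpretable
# feature that replaces the 12 raw attributes in the classifier.
tree = linkage(answers, method="ward")
cluster_id = fcluster(tree, t=4, criterion="maxclust")
print("taxonomy feature for the first 10 participants:", cluster_id[:10])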

11.
Methods Mol Biol; 2228: 419-431, 2021.
Article in English | MEDLINE | ID: mdl-33950507

ABSTRACT

Public databases featuring original, raw data from "Omics" experiments enable researchers to perform meta-analyses by combining either the raw data or the summarized results of several independent studies. In proteomics, high-throughput protein expression data are measured by diverse techniques, such as mass spectrometry, 2-D gel electrophoresis or protein arrays, yielding data on different scales. Direct data merging can therefore be problematic, and combining the summarized data of the individual studies can be advantageous. A special form of meta-analysis is network meta-analysis, in which studies with different settings of experimental groups can be combined. However, all studies must be linked by one experimental group that appears in each study, usually the control group. A study network is then formed, and indirect statistical inferences can be made between study groups that do not appear in every study. In this chapter, we describe the working principle of, and available software for, network meta-analysis. The applicability to high-throughput protein expression data is demonstrated in an example from breast cancer research. We also describe the special challenges of applying this method.


Subjects
Breast Neoplasms/metabolism; Data Mining; Databases, Protein; Neoplasm Proteins/analysis; Network Meta-Analysis; Proteomics; Female; High-Throughput Screening Assays; Humans; Research Design; Software
12.
Water Res; 185: 116227, 2020 Oct 15.
Article in English | MEDLINE | ID: mdl-32736284

ABSTRACT

Long-term, continuous datasets of high quality are important for instrumentation, control, and automation (ICA) efforts at water resource recovery facilities (WRRFs). This study presents a methodology to increase the reliability of measurements from ammonium ion-selective electrodes (ISEs) by correcting corrupted ISE data with a data source that is often available at WRRFs: volume-proportional composite samples. A yearlong measurement campaign showed that the existing standard protocols for sensor maintenance can still produce corrupted datasets, with poor sensor recalibrations responsible for abrupt and unrealistic jumps in the measurements. The proposed automatic correction methodology removes both recalibration jumps and signal drift by using information from composite samples that are already taken for reporting to legal authorities. Results showed that the developed methodology provided a continuous, high-quality time series without the major data quality issues of the original signal; the signal was improved on 87% of the days when a reference sample was available. The effect of correcting the data before use in a data-driven software sensor was also investigated. The corrected dataset led to noticeably smaller day-to-day variations in estimated NH4+ loads and to large improvements in both median estimates and prediction bounds. The long time series allowed an investigation of how much training data is required to fit a software sensor that provides estimates representative of the entire study period: 8 weeks of data allowed a good median estimate, while 16 weeks were required to obtain good 80% prediction bounds. Overall, the proposed method can increase the applicability of relatively cheap ISE sensors for ICA applications within WRRFs.


Subjects
Ammonium Compounds; Waste Water; Ion-Selective Electrodes; Reproducibility of Results
13.
Front Pharmacol; 10: 127, 2019.
Article in English | MEDLINE | ID: mdl-30842738

ABSTRACT

Because of the extended period of clinical data collection and the large number of analyzed samples, long-term, large-scale pharmacometabonomic profiling is frequently encountered in drug/target discovery and the guidance of personalized medicine. Integration of the results (ReIn) from multiple experiments has become a widely used strategy in large-scale metabolomic profiling for enhancing the reliability and robustness of analytical results, and direct data merging (DiMe) across experiments has also been proposed to increase statistical power, reduce experimental bias, enhance reproducibility and improve overall biological understanding. However, compared with ReIn, DiMe has not yet been widely adopted in metabolomics studies, owing to the difficulty of removing unwanted variation and the lack of prior knowledge on the performance of the available merging methods. It is therefore urgent to clarify whether DiMe can enhance the performance of metabolic profiling. Herein, the performance of DiMe on 4 pairs of benchmark datasets was comprehensively assessed using multiple criteria (classification capacity, robustness and false discovery rate). Integration/merging-based strategies (ReIn and DiMe) were found to perform better under all criteria than strategies based on a single experiment. Moreover, DiMe outperformed ReIn in classification capacity and robustness, while ReIn was superior in controlling the false discovery rate. In conclusion, these findings provide valuable guidance for selecting a suitable analytical strategy in metabolomics.
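
A minimal sketch contrasting the two strategies on simulated two-experiment data: ReIn analyzes each experiment separately and combines the p-values (Fisher's method), while DiMe removes the batch offset (here by z-scoring against each experiment's controls, one simple choice among many) and tests the merged samples once:

import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Two experiments measuring one metabolite, each with its own batch level.
ctrl1, case1 = rng.normal(5.0, 1.0, 30), rng.normal(5.7, 1.0, 30)
ctrl2, case2 = rng.normal(7.0, 1.0, 30), rng.normal(7.7, 1.0, 30)

# ReIn: per-experiment tests, then integration of the results.
p1 = stats.ttest_ind(case1, ctrl1).pvalue
p2 = stats.ttest_ind(case2, ctrl2).pvalue
_, p_rein = stats.combine_pvalues([p1, p2], method="fisher")

# DiMe: normalize each experiment to its own controls, then merge and test.
norm = lambda x, ref: (x - ref.mean()) / ref.std(ddof=1)
cases = np.r_[norm(case1, ctrl1), norm(case2, ctrl2)]
ctrls = np.r_[norm(ctrl1, ctrl1), norm(ctrl2, ctrl2)]
p_dime = stats.ttest_ind(cases, ctrls).pvalue

print(f"ReIn p = {p_rein:.2e}, DiMe p = {p_dime:.2e}")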

14.
Acta Crystallogr D Struct Biol; 72(Pt 3): 296-302, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26960117

ABSTRACT

Merging of data from multiple crystals has proven to be useful for determination of the anomalously scattering atomic substructure for crystals with weak anomalous scatterers (e.g. S and P) and/or poor diffraction. Strategies for merging data from many samples, which require assessment of sample isomorphism, rely on metrics of variability in unit-cell parameters, anomalous signal correlation and overall data similarity. Local scaling, anomalous signal optimization and data-set weighting, implemented in phenix.scale_and_merge, provide an efficient protocol for merging data from many samples. The protein NS1 was used in a series of trials with data collected from 28 samples for phasing by single-wavelength anomalous diffraction of the native S atoms. The local-scaling, anomalous-optimization protocol produced merged data sets with higher anomalous signal quality indicators than did standard global-scaling protocols. The local-scaled data were also more successful in substructure determination. Merged data quality was assessed for data sets where the multiplicity was reduced in either of two ways: by excluding data from individual crystals (to reduce errors owing to non-isomorphism) or by excluding the last-recorded segments of data from each crystal (to minimize the effects of radiation damage). The anomalous signal was equivalent at equivalent multiplicity for the two procedures, and structure-determination success correlated with anomalous signal metrics. The quality of the anomalous signal was strongly correlated with data multiplicity over a range of 12-fold to 150-fold multiplicity. For the NS1 data, the local-scaling and anomalous-optimization protocol handled sample non-isomorphism and radiation-induced decay equally well.


Subjects
Crystallography, X-Ray/methods; Flavivirus/chemistry; Sulfur/chemistry; Viral Nonstructural Proteins/chemistry; Protein Conformation
15.
Eur J Cancer; 65: 150-155, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27498140

ABSTRACT

With so many adults and children receiving successful treatment for their cancer, survivorship is now a 'new' and critical issue. It is increasingly recognised that the growing numbers of survivors face new challenges in their bid to return to 'normal' life. What is not yet so widely recognised is the need for a broad response to help them cope: with stigmatisation, misunderstanding, lifelong issues of confidence and social adaptation, and even access to employment and to financial services. As a further stage in its programme of attention to this aspect of cancer, the European Organisation for Research and Treatment of Cancer (EORTC) brought survivors, researchers, carers, authorities and policymakers together at a meeting in Brussels in March/April 2016, to learn at first hand about the post-treatment experience of cancer survivors. The meeting demonstrated that while research into many of the medical consequences of survivorship is well advanced, understanding of many non-clinical, personal and administrative issues is still lacking. The meeting raised the discussion of survivorship research beyond the individual to a population-based approach, exploring the related socioeconomic issues. Its exploration of initiatives across European countries provoked new thinking on the need for effective collaboration, with a new focus on non-clinical issues, including effective dialogue with financial service providers and employers; improvements in collecting, exchanging and accessing data; and, above all, ways of translating research outcomes into action. This will require wider recognition that, as Françoise Meunier, Director Special Projects, EORTC, said, 'It is time for a new mind set'.


Subjects
Neoplasms/psychology; Survivors; Adult; Child; Employment; Europe; Humans; Quality of Life; Social Discrimination; Social Support
16.
Methods Mol Biol; 1326: 177-191, 2015.
Article in English | MEDLINE | ID: mdl-26498621

ABSTRACT

Computational analyses of biological data are becoming increasingly powerful, and researchers intending to carry out their own analyses can often choose from a wide array of tools and resources. However, their application might be obstructed by the wide variety of data formats in use, from standard, commonly used formats to output files from high-throughput analysis platforms. The latter are often too large to be opened, viewed, or edited by standard programs, potentially creating a bottleneck in the analysis. Perl one-liners provide a simple solution for quickly reformatting, filtering, and merging datasets in preparation for downstream analyses. This chapter presents example code that can be easily adjusted to meet individual requirements. An online version is available at http://bioinf.gen.tcd.ie/pol.


Subjects
Datasets as Topic; Microcomputers; Programming Languages; User-Computer Interface