Results 1 - 20 of 26
1.
Sensors (Basel) ; 24(5)2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38475235

ABSTRACT

Algorithms for QRS detection are fundamental in the ECG interpretive processing chain. They must meet several challenges, such as high reliability, high temporal accuracy, high immunity to noise, and low computational complexity. Unfortunately, the accuracy expressed by missed or redundant event statistics is often the only parameter used to evaluate a detector's performance. In this paper, we first observe that statistics of true positive detections rely on the researcher's arbitrary selection of the time tolerance between the QRS detector output and the database reference. Next, we propose a multidimensional algorithm evaluation method and present its use on four example QRS detectors. The dimensions are (a) influence of detection temporal tolerance, tested for values between 8.33 and 164 ms; (b) noise immunity, tested with an ECG signal with an added muscular noise pattern at signal-to-noise ratios of "no added noise", 15, 7, and 3 dB; and (c) influence of QRS morphology, tested on the six most frequently represented morphology types in the MIT-BIH Arrhythmia Database. The multidimensional evaluation, as proposed in this paper, allows an in-depth comparison of QRS detection algorithms, removing the limitations of existing one-dimensional methods. The method enables the assessment of QRS detection algorithms according to the medical device application area and the corresponding requirements of temporal accuracy, immunity to noise, and QRS morphology types. The analysis also shows that, for some algorithms, adding muscular noise to the ECG signal improves the algorithm's accuracy results.
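To make the tolerance dependence concrete, here is a minimal Python sketch (not the paper's code) of how detections are typically matched to reference annotations within a time tolerance; the TP/FN/FP counts change with the tolerance alone, even though the detector output is fixed.

```python
# Illustrative sketch: greedy matching of detected QRS locations (ms) to
# reference annotations within a fixed time tolerance.
def match_detections(reference_ms, detected_ms, tolerance_ms):
    """Match each reference beat to at most one detection within tolerance_ms."""
    detected = sorted(detected_ms)
    used = [False] * len(detected)
    tp = 0
    for ref in sorted(reference_ms):
        # find the closest unused detection within the tolerance window
        best, best_dist = None, None
        for i, det in enumerate(detected):
            if used[i]:
                continue
            dist = abs(det - ref)
            if dist <= tolerance_ms and (best_dist is None or dist < best_dist):
                best, best_dist = i, dist
        if best is not None:
            used[best] = True
            tp += 1
    fn = len(reference_ms) - tp          # missed beats
    fp = used.count(False)               # redundant detections
    return tp, fn, fp

# The same detector output yields different statistics at different tolerances:
ref = [1000, 1800, 2600]
det = [1012, 1950, 2604]
for tol in (8.33, 50, 164):              # tolerance range examined in the paper
    print(tol, match_detections(ref, det, tol))
```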

2.
Entropy (Basel) ; 25(3)2023 Mar 17.
Article in English | MEDLINE | ID: mdl-36981408

ABSTRACT

Recurrent Neural Networks (RNNs) are applied in safety-critical fields such as autonomous driving, aircraft collision detection, and smart credit. They are highly susceptible to input perturbations, yet little research on RNN-oriented testing techniques has been conducted, leaving a large number of sequential application domains exposed. To address these gaps, we propose a test coverage metric for the underlying structure of RNNs that guides the generation of test inputs, with the aims of improving the test adequacy of RNNs, finding more defects, and improving the performance of RNN models and their robustness to input perturbations. Although coverage metrics have been proposed for RNNs, such as the hidden state coverage in RNN-Test, they ignore the fact that the underlying structure of an RNN is still a fully connected neural network, but with an additional "delayer" that records the network state at the time of data input. We use contributions, i.e., the combination of the outputs of neurons and the weights they emit, as the minimum computational unit of RNNs to explore the finer-grained logical structure inside the recurrent cells. Compared to existing coverage metrics, our approach covers the decision mechanism of RNNs in more detail and is more likely to generate adversarial samples and discover flaws in the model. In this paper, we redefine the contribution coverage metric to apply to Stacked LSTMs and Stacked GRUs by considering the joint effect of neurons and weights in the underlying structure of the neural network. We propose a new coverage metric, RNNCon, which can be used to guide the generation of adversarial test inputs, and we design and implement a test framework prototype, RNNCon-Test. Two datasets, four LSTM models, and four GRU models are used to verify the effectiveness of RNNCon-Test. Compared to RNN-Test, the current state-of-the-art study, RNNCon can cover deeper decision logic of RNNs. RNNCon-Test is not only effective in identifying defects in Deep Learning (DL) systems but also in improving the performance of the model when the adversarial inputs it generates are filtered, added to the training set, and used to retrain the model. Even where the accuracy of the model is already high, RNNCon-Test is still able to improve it by up to 0.45%.
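As a rough illustration of the contribution idea described in the abstract (a neuron's output multiplied by the weight it emits), the following numpy sketch computes a toy contribution coverage; the actual RNNCon definition and thresholds in the paper may differ.

```python
# Toy sketch of "contribution coverage": a neuron-weight pair is covered if
# its contribution exceeded a threshold on at least one test input.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))            # weights from 4 neurons to 3 neurons
outputs = rng.normal(size=(10, 4))     # recorded outputs for 10 test inputs

# contribution[t, i, j] = output of neuron i on input t * weight from i to j
contributions = outputs[:, :, None] * W[None, :, :]

threshold = 0.5                        # assumed activation threshold
covered = (np.abs(contributions) > threshold).any(axis=0)  # per (i, j) pair
coverage = covered.mean()
print(f"contribution coverage: {coverage:.2%} of {W.size} neuron-weight pairs")
```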

3.
Softw Qual J ; 31(3): 947-990, 2023.
Article in English | MEDLINE | ID: mdl-37692292

ABSTRACT

Research in software testing often involves the development of software prototypes. Like any piece of software, such tools present challenges in their development, use, and verification. However, some challenges are rather specific to this problem domain. For example, these tools are often developed by PhD students straight out of bachelor's/master's degrees, possibly lacking any industrial experience in software development. Prototype tools are used to carry out empirical studies, possibly studying different parameters of newly designed algorithms. Software scaffolding is needed to run large sets of experiments efficiently. Furthermore, when using AI-based techniques like evolutionary algorithms, care needs to be taken to deal with their randomness, which further complicates verification. These are some of the challenges we have identified for this domain. In this paper, we report on our experience in building the open-source EvoMaster tool, which aims at system-level test case generation for enterprise applications. Many of the challenges we faced would be common to any researcher needing to build software testing tool prototypes. Therefore, one goal is that the experience we share here will benefit the research community by providing concrete solutions to many development challenges in building such research prototypes. Ultimately, this will increase the impact of scientific research on industrial practice.

4.
Empir Softw Eng ; 27(7): 187, 2022.
Article in English | MEDLINE | ID: mdl-36199835

ABSTRACT

Test flakiness is a phenomenon occurring when a test case is non-deterministic and exhibits both passing and failing behavior when run against the same code. Over the last years, the problem has been closely investigated by researchers and practitioners, who have shown its relevance in practice. The software engineering research community has been working toward defining approaches for detecting and addressing test flakiness. Despite being quite accurate, most of these approaches rely on expensive dynamic steps, e.g., the computation of code coverage information. Consequently, they might suffer from scalability issues that preclude their practical use. This limitation has recently been targeted through machine learning solutions that predict the flakiness of tests using various features, like source code vocabulary or a mixture of static and dynamic metrics computed on individual snapshots of the system. In this paper, we take a step forward and predict test flakiness using only static metrics. We conduct a large-scale experiment on 70 Java projects from the iDFlakies and FlakeFlagger datasets. First, we statistically assess the differences between flaky and non-flaky tests in terms of 25 test and production code metrics and smells, analyzing both their individual and combined effects. Based on the results, we experiment with a machine learning approach that predicts test flakiness solely based on static features, comparing it with two state-of-the-art approaches. The key results of the study show that the static approach has performance comparable to that of the baselines. In addition, we found that the characteristics of the production code might impact the performance of flaky test prediction models.
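A hypothetical sketch of the static-only setup: train a classifier on per-test static features and evaluate it with cross-validation. The feature columns and labels below are synthetic placeholders, not the paper's 25 metrics or its datasets.

```python
# Sketch: predict test flakiness from static metrics only (no dynamic step).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_tests = 500
# placeholder columns: lines of code, assertion count, complexity, smell count
X = rng.integers(1, 100, size=(n_tests, 4)).astype(float)
y = rng.integers(0, 2, size=n_tests)   # 1 = flaky (synthetic labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print("F1 per fold:", scores.round(2))
```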

5.
Philos Trans A Math Phys Eng Sci ; 379(2197): 20200069, 2021 May 17.
Article in English | MEDLINE | ID: mdl-33775145

ABSTRACT

We carry out efforts to reproduce computational results for seven published articles and identify barriers to computational reproducibility. We then derive three principles to guide the practice and dissemination of reproducible computational research: (i) provide transparency regarding how computational results are produced; (ii) when writing and releasing research software, aim for ease of (re-)executability; and (iii) make any code upon which the results rely as deterministic as possible. We exemplify these three principles with 12 specific guidelines for their implementation in practice, and illustrate them with a series of vignettes from our experimental reproducibility work. We define a novel Reproduction Package, a formalism that specifies a structured way to share computational research artifacts and implements the guidelines generated from our reproduction efforts, allowing others to build, reproduce, and extend computational science. We make our reproduction efforts in this paper publicly available as exemplar Reproduction Packages. This article is part of the theme issue 'Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico'.
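Principle (iii) often comes down to pinning every source of randomness; a minimal Python sketch, assuming numpy-based analysis code:

```python
# Minimal illustration of principle (iii): pin every source of randomness so
# that re-running the analysis reproduces the published numbers exactly.
import os
import random

import numpy as np

SEED = 2021
os.environ["PYTHONHASHSEED"] = str(SEED)  # must be set before interpreter start in practice
random.seed(SEED)
np.random.seed(SEED)

sample = np.random.normal(size=5)
print(sample)  # identical on every run, given the same library versions
```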

6.
Sensors (Basel) ; 21(16)2021 Aug 09.
Article in English | MEDLINE | ID: mdl-34450820

ABSTRACT

Nowadays, REpresentational State Transfer Application Programming Interfaces (REST APIs) are widely used in web applications, and hence a plethora of test cases are developed to validate API calls. We propose a solution that automates the generation of test cases for REST APIs based on their specifications. In our approach, apart from the automatic generation of test cases, we provide an option for the user to influence the test case generation process. By adding user interaction, we aim to augment the automatic generation of API test cases with human testing expertise and specific context. We use the latest version of OpenAPI 3.x, a wide range of coverage metrics to analyze the functionality and performance of the generated test cases, and non-functional metrics to analyze the performance of the APIs. The experiments demonstrated the effectiveness and practicality of our method.
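The following is a hedged sketch of specification-driven generation: walking a toy OpenAPI 3 document and emitting one nominal test case per operation. The user-interaction and coverage-measurement parts of the proposed solution are omitted, and the spec below is invented for illustration.

```python
# Sketch: derive one nominal test case per operation from an OpenAPI 3 dict.
spec = {
    "paths": {
        "/users": {
            "get": {"parameters": [{"name": "limit", "schema": {"type": "integer"}}]},
            "post": {},
        },
        "/users/{id}": {
            "get": {"parameters": [{"name": "id", "in": "path", "schema": {"type": "integer"}}]},
        },
    }
}

def default_value(schema):
    """Pick a simple representative value for a parameter schema."""
    return {"integer": 1, "string": "a", "boolean": True}.get(schema.get("type"))

test_cases = []
for path, operations in spec["paths"].items():
    for method, op in operations.items():
        params = {p["name"]: default_value(p.get("schema", {}))
                  for p in op.get("parameters", [])}
        test_cases.append((method.upper(), path, params))

for case in test_cases:
    print(case)
```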


Subjects
Software, Humans
7.
Comput Sci Eng ; 22(2): 78-87, 2018 Nov 13.
Article in English | MEDLINE | ID: mdl-32461738

ABSTRACT

Scientific model developers are able to verify and validate their software via metamorphic testing, even when the expected output of a given test case is not readily available. The tenet is to check whether certain relations hold among the expected outputs of multiple related inputs. Contemporary approaches require the relations to be defined before testing. Our experience shows that it is often straightforward to first define the multiple iterations of tests for performing continuous simulations, and then keep multiple, even competing, metamorphic relations open for investigating the testing-result patterns. We call this new approach exploratory metamorphic testing, and we report our experience of applying it to detect bugs, mismatches, and constraints in automatically calibrating parameters for the United States Environmental Protection Agency's Storm Water Management Model (SWMM).
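For readers new to the technique, a minimal metamorphic test looks like this: we cannot easily state the expected value of sin(x), but the relation sin(x) = sin(π − x) must hold across related inputs. The SWMM calibration relations in the paper are domain-specific; this only illustrates the general tenet.

```python
# Minimal metamorphic test: no oracle for sin(x) itself, but the relation
# sin(x) == sin(pi - x) must hold for every input pair.
import math
import random

random.seed(0)
for _ in range(1000):
    x = random.uniform(-10, 10)
    assert math.isclose(math.sin(x), math.sin(math.pi - x), abs_tol=1e-9), x
print("metamorphic relation held on 1000 random inputs")
```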

8.
Inf Softw Technol ; 56(10): 1219-1232, 2014 Oct 01.
Article in English | MEDLINE | ID: mdl-25125798

ABSTRACT

CONTEXT: Scientific software plays an important role in critical decision making, for example in making weather predictions based on climate models, and in the computation of evidence for research publications. Recently, scientists have had to retract publications due to errors caused by software faults. Systematic testing can identify such faults in code. OBJECTIVE: This study aims to identify specific challenges, proposed solutions, and unsolved problems faced when testing scientific software. METHOD: We conducted a systematic literature survey to identify and analyze relevant literature. We identified 62 studies that provided relevant information about testing scientific software. RESULTS: We found that challenges faced when testing scientific software fall into two main categories: (1) testing challenges that occur due to characteristics of scientific software, such as oracle problems, and (2) testing challenges that occur due to cultural differences between scientists and the software engineering community, such as viewing the code and the model that it implements as inseparable entities. In addition, we identified methods to potentially overcome these challenges and their limitations. Finally, we describe unsolved challenges and how software engineering researchers and practitioners can help to overcome them. CONCLUSIONS: Scientific software presents special challenges for testing. Specifically, cultural differences between scientist developers and software engineers, along with the characteristics of scientific software, make testing more difficult. Existing techniques such as code clone detection can help to improve the testing process. Software engineers should consider the special challenges posed by scientific software, such as oracle problems, when developing testing techniques.

9.
Front Robot AI ; 11: 1363281, 2024.
Article in English | MEDLINE | ID: mdl-39157792

ABSTRACT

A few mobile robot developers already test their software on simulated robots in virtual environments or sceneries. However, the majority still shy away from simulation-based test campaigns because it remains challenging to specify and execute suitable testing scenarios, that is, models of the environment and the robots' tasks. Through developer interviews, we identified that managing the enormous variability of testing scenarios is a major barrier to the application of simulation-based testing in robotics. Furthermore, traditional CAD or 3D-modelling tools such as SolidWorks, 3ds Max, or Blender are not suitable for specifying sceneries that vary significantly and serve different testing objectives. For some testing campaigns, the scenery must replicate the dynamic (e.g., opening doors) and static features of real-world environments, whereas for others, simplified scenery is sufficient. Similarly, the task and mission specifications used for simulation-based testing range from simple point-to-point navigation tasks to more elaborate tasks that require advanced deliberation and decision-making. We propose the concept of composable and executable scenarios and associated tooling to support developers in specifying, reusing, and executing scenarios for the simulation-based testing of robotic systems. Our approach differs from traditional approaches in that it offers a means of creating scenarios that allows the addition of new semantics (e.g., dynamic elements such as doors, or varying task specifications) to existing models without altering them. Thus, we can systematically construct richer scenarios that remain manageable. We evaluated our approach in a small simulation-based testing campaign, with scenarios defined around the navigation stack of a mobile robot. The scenarios gradually increased in complexity, composing new features into the scenery of previous scenarios. Our evaluation demonstrated how our approach can facilitate the reuse of models, and it revealed errors in the configuration of the publicly available navigation stack of our system under test (SUT) that had gone unnoticed despite its frequent use.

10.
JMIR Nurs ; 7: e56585, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39028552

ABSTRACT

eHealth interventions are becoming a part of standard care, with software solutions increasingly created for patients and health care providers. Testing of eHealth software is important to ensure that the software realizes its goals. Software testing, which comprises alpha and beta testing, is critical to establish the effectiveness and usability of the software. In this viewpoint, we explore existing practices for testing software in health care settings. We scanned the literature using search terms related to eHealth software testing (e.g., "health alpha testing," "eHealth testing," and "health app usability") to identify practices for testing eHealth software. We could not identify a single standard framework for software testing in health care settings; some articles reported frameworks, while others reported none. In addition, some authors misidentified alpha testing as beta testing and vice versa. Several different objectives (i.e., testing for safety, reliability, or usability) and methods of testing (e.g., questionnaires, interviews) were reported. Implementing an iterative strategy in testing can allow flexible and rapid changes when developing eHealth software. Further investigation into the best approach for software testing in health care settings would aid the development of effective and useful eHealth software, particularly for novice eHealth software developers.


Subjects
Software, Telemedicine, Humans, Telemedicine/trends, Software/trends, Reproducibility of Results
11.
PeerJ Comput Sci ; 9: e1647, 2023.
Article in English | MEDLINE | ID: mdl-38077586

ABSTRACT

The occurrence of faults in software systems is an inevitable predicament. Testing is the most common means to detect such faults; however, exhaustive testing is not feasible for any nontrivial system. Software fault prediction (SFP), which identifies software components that are more prone to errors, seeks to supplement the testing process, so that testing efforts can be focused on such modules. Various approaches exist for SFP, with machine learning (ML) emerging as the prevailing methodology. ML-based SFP relies on a wide range of metrics, ranging from file-level and class-level to method-level and even line-level metrics. More granular metrics are expected to provide better micro-level coverage of the code. The Halstead metric suite offers coverage at the line level and has been extensively employed across diverse domains such as fault prediction, quality assessment, and similarity approximation for the past three decades. In this article, we propose to decompose the Halstead base metrics and evaluate their fault prediction capability. The Halstead base metrics count operators and operands. In the context of the Java language, we partition operators into five distinct categories, i.e., assignment operators, arithmetic operators, logical operators, relational operators, and all other types of operators. Similarly, operands are classified into two classes: constants and variables. For the empirical evaluation, two experiments were designed. In the first experiment, the Halstead base metrics were used along with McCabe, Lines of Code (LoC), and Halstead-derived metrics as predictors. In the second experiment, the decomposed Halstead base metrics were used along with McCabe, LoC, and Halstead-derived metrics. Five public datasets were selected for the experiments. The ML classifiers used included logistic regression, naïve Bayes, decision tree, multilayer perceptron, random forest, and support vector machines. The classifiers' effectiveness was assessed through metrics such as accuracy, F-measure, and AUC. Accuracy improved from 0.82 to 0.97, F-measure from 0.81 to 0.99, and AUC from 0.79 to 0.99. These findings highlight the superior performance of the decomposed Halstead metrics, as opposed to the original Halstead base metrics, in predicting faults across all datasets.
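A rough sketch of the decomposition step, classifying Java operators into the five categories named above by token scanning; a real implementation would use a proper Java lexer rather than this illustrative regex pass.

```python
# Sketch: count Java operators per category (assignment, arithmetic, logical,
# relational, other) as a decomposed Halstead base metric.
import re

CATEGORIES = {
    "assignment": {"=", "+=", "-=", "*=", "/=", "%="},
    "arithmetic": {"+", "-", "*", "/", "%"},
    "logical":    {"&&", "||", "!"},
    "relational": {"==", "!=", "<", ">", "<=", ">="},
}

def classify_operators(java_code):
    counts = {name: 0 for name in CATEGORIES}
    counts["other"] = 0
    # longest operators first so "<=" is not split into "<" and "="
    tokens = re.findall(r"<=|>=|==|!=|&&|\|\||[+\-*/%]=|[=+\-*/%<>!]", java_code)
    for tok in tokens:
        for name, ops in CATEGORIES.items():
            if tok in ops:
                counts[name] += 1
                break
        else:
            counts["other"] += 1
    return counts

print(classify_operators("int y = a + b; if (y >= 10 && x != 0) y -= 1;"))
```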

12.
PeerJ Comput Sci ; 9: e1454, 2023.
Article in English | MEDLINE | ID: mdl-37705636

ABSTRACT

Background: When using deep learning models, one of the most critical vulnerabilities is their exposure to adversarial inputs, which can cause wrong decisions (e.g., incorrect classification of an image) with minor perturbations. To address this vulnerability, it becomes necessary to retrain the affected model against adversarial inputs as part of the software testing process. To make this process energy efficient, data scientists need support in choosing the guidance metrics that best reduce the adversarial inputs to create and use during testing, as well as in choosing optimal dataset configurations. Aim: We examined six guidance metrics for retraining deep learning models, specifically with convolutional neural network architectures, and three retraining configurations. Our goal is to improve convolutional neural networks against the attack of adversarial inputs with regard to accuracy, resource utilization, and execution time, from the point of view of a data scientist in the context of image classification. Method: We conducted an empirical study using five datasets for image classification. We explore (a) the accuracy, resource utilization, and execution time of retraining convolutional neural networks with the guidance of six different metrics (neuron coverage, likelihood-based surprise adequacy, distance-based surprise adequacy, DeepGini, softmax entropy, and random), and (b) the accuracy and resource utilization of retraining convolutional neural networks with three different configurations (one-step adversarial retraining, adversarial retraining, and adversarial fine-tuning). Results: We reveal that adversarial retraining from original model weights, with inputs ordered by uncertainty metrics, gives the best model with respect to accuracy, resource utilization, and execution time. Conclusions: Although more studies are necessary, we recommend that data scientists use the above configuration and metrics to deal with the vulnerability of deep learning models to adversarial inputs, as they can improve their models against adversarial inputs without using many inputs and without creating numerous adversarial inputs. We also show that dataset size has an important impact on the results.
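Two of the named guidance metrics, DeepGini and softmax entropy, are simple to compute from a model's softmax outputs; this sketch ranks inputs by uncertainty to pick a retraining subset (the logits below are random stand-ins for real model outputs).

```python
# Sketch of uncertainty-based guidance metrics: DeepGini = 1 - sum(p_i^2)
# and softmax entropy = -sum(p_i * log(p_i)), ranking inputs for retraining.
import numpy as np

rng = np.random.default_rng(1)
logits = rng.normal(size=(8, 10))                      # 8 inputs, 10 classes
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

deepgini = 1.0 - (probs ** 2).sum(axis=1)
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

k = 3                                                  # retraining budget
print("most uncertain by DeepGini:", np.argsort(deepgini)[::-1][:k])
print("most uncertain by entropy: ", np.argsort(entropy)[::-1][:k])
```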

13.
Front Med (Lausanne) ; 10: 1305415, 2023.
Article in English | MEDLINE | ID: mdl-38259836

ABSTRACT

The growing interest in data-driven medicine, in conjunction with the formation of initiatives such as the European Health Data Space (EHDS), has demonstrated the need for methodologies that are capable of facilitating privacy-preserving data analysis. Distributed Analytics (DA), as an enabler for privacy-preserving analysis across multiple data sources, has shown its potential to support data-intensive research. However, the application of DA creates new challenges stemming from its distributed nature, such as identifying single points of failure (SPOFs) in DA tasks before their actual execution. Failing to detect such SPOFs can, for example, result in improper termination of the DA code, necessitating additional effort from multiple stakeholders to resolve the malfunctions. Moreover, these malfunctions disrupt the seamless conduct of DA and entail several crucial consequences, including technical obstacles to resolving the issues, potential delays in research outcomes, and increased costs. In this study, we address this challenge by introducing a concept based on a method called Smoke Testing, an initial and foundational test run to ensure the operability of the analysis code. We review existing DA platforms and systematically extract six specific Smoke Testing criteria for DA applications. With these criteria in mind, we create an interactive environment called Development Environment for AuTomated and Holistic Smoke Testing of Analysis-Runs (DEATHSTAR), which allows researchers to perform Smoke Tests on their DA experiments. We conduct a user study with 29 participants to assess our environment and additionally apply it to three real use cases. The results of our evaluation validate its effectiveness, revealing that 96.6% of the analyses created and (Smoke) tested by participants using our approach terminated without any errors. Thus, by incorporating Smoke Testing as a fundamental method, our approach helps identify potential malfunctions early in the development process, ensuring smoother data-driven research within the scope of DA. Through its flexibility and adaptability to diverse real use cases, our solution enables more robust and efficient development of DA experiments, which contributes to their reliability.
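A minimal smoke test in this spirit runs the analysis code against a tiny synthetic dataset and fails fast if it cannot terminate cleanly; the script name and flags below are placeholders, and DEATHSTAR automates far richer checks than this sketch.

```python
# Sketch: pre-distribution smoke test for an analysis script.
import subprocess
import sys
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("age,outcome\n34,1\n51,0\n")   # tiny stand-in dataset
    dataset = f.name

# "analysis.py" and "--dry-run" are hypothetical placeholders for the
# researcher's DA analysis code and its options.
result = subprocess.run(
    [sys.executable, "analysis.py", "--input", dataset, "--dry-run"],
    capture_output=True, text=True, timeout=60,
)
if result.returncode != 0:
    print("smoke test FAILED before distribution:\n", result.stderr)
else:
    print("smoke test passed; analysis code is at least operable")
```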

14.
PeerJ Comput Sci ; 9: e1625, 2023.
Article in English | MEDLINE | ID: mdl-38077525

ABSTRACT

Software systems typically have an input domain that can be subdivided into sub-domains, each of which generates similar or related outputs. Testing on the boundaries between these sub-domains is critical to ensure high-quality software. Therefore, boundary value analysis and testing have long been a fundamental part of the software testing toolbox and are typically taught early to software engineering students. Despite its many argued benefits, boundary value analysis for a given software specification or application is typically described in abstract terms, which allows for variation in how testers apply it and in the benefits they see. Additionally, its adoption has been limited since it requires a specification or model to be analysed. We propose an automated black-box boundary value detection method to support software testers in performing systematic boundary value analysis. This dynamic method can be utilized even without a specification or model. The proposed method is based on a metric referred to as the program derivative, which quantifies the level of boundariness of test inputs. By combining this metric with search algorithms, we can identify and rank pairs of inputs as good boundary candidates, i.e., inputs that are in close proximity to each other but whose outputs are far apart. We have implemented the AutoBVA approach and evaluated it on a curated dataset of example programs. Furthermore, we have applied the approach broadly to a sample of 613 functions from the base library of the Julia programming language. The approach identified boundary candidates that highlight diverse boundary behaviours in over 70% of the investigated systems under test. The results demonstrate that even a simple variant of the program derivative, combined with broad sampling and search over the input space, can identify interesting boundary candidates for a significant portion of the functions under investigation. In conclusion, we also discuss future extensions of the approach to encompass more complex systems under test and more complex datatypes.
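Reading the description above literally, a pair of inputs is a strong boundary candidate when the inputs are close but the outputs are far apart; this toy sketch ranks candidate pairs that way. The exact program-derivative definition in AutoBVA may differ.

```python
# Sketch: rank input pairs by a simple "boundariness" ratio:
# output distance divided by input distance.
def program_derivative(f, a, b, out_dist, in_dist):
    d_in = in_dist(a, b)
    return out_dist(f(a), f(b)) / d_in if d_in else 0.0

def sut(x):
    return -1 if x < 0 else 1   # sign-like function with a boundary at 0

candidates = [(-2, 2), (-1, 1), (-0.001, 0.001), (5, 6)]
ranked = sorted(
    candidates,
    key=lambda p: program_derivative(sut, *p,
                                     out_dist=lambda u, v: abs(u - v),
                                     in_dist=lambda u, v: abs(u - v)),
    reverse=True,
)
print("boundary candidates, strongest first:", ranked)
```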

15.
Microb Genom ; 8(3)2022 03.
Article in English | MEDLINE | ID: mdl-35259087

ABSTRACT

Computational algorithms have become an essential component of research, with great efforts by the scientific community to raise standards on development and distribution of code. Despite these efforts, sustainability and reproducibility are major issues since continued validation through software testing is still not a widely adopted practice. Here, we report seven recommendations that help researchers implement software testing in microbial bioinformatics. We have developed these recommendations based on our experience from a collaborative hackathon organised prior to the American Society for Microbiology Next Generation Sequencing (ASM NGS) 2020 conference. We also present a repository hosting examples and guidelines for testing, available from https://github.com/microbinfie-hackathon2020/CSIS.


Subjects
Computational Biology, Software, Algorithms, High-Throughput Nucleotide Sequencing, Reproducibility of Results, United States
16.
Math Biosci Eng ; 19(6): 6124-6140, 2022 04 14.
Article in English | MEDLINE | ID: mdl-35603394

ABSTRACT

In software engineering, testing has long been a research area of software maintenance. Testing is extremely expensive, and there is no guarantee that all defects will be found within a single round of testing. Therefore, finding defects that are not discovered by a single round of testing is important for reducing test costs. During the software maintenance process, testing is conducted within the scope of a set of test cases called a test suite. Mutation testing is a method that uses mutants to evaluate whether the test cases of the test suite are appropriate. In this paper, an approach is proposed that uses the mutants of a mutation test to identify defects that are not discovered through a single round of testing. The proposed method simultaneously applies two or more mutants to a single program to define and record the relationships between different lines of code. In turn, these relationships are examined using the defects that were discovered by a single round of testing, and possible defects are recommended from among the recorded candidates. To evaluate the proposed method, a comparative study was conducted using fault localization methods, which are commonly employed in defect prediction, as well as the Defects4J dataset, which is widely used in software defect prediction. The results of the evaluation show that the proposed method achieves better performance than seven fault localization methods (Tarantula, Ochiai, Opt2, Barinel, Dstar2, Muse, and Jaccard).
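As background, plain mutation testing (which the paper extends to two or more simultaneous mutants) works as follows: a mutant the suite fails to kill exposes behaviour no test case checks. A self-contained sketch:

```python
# Sketch of single-mutant mutation testing: run the test suite against each
# mutant; a surviving mutant reveals an inadequacy in the suite.
def original(a, b):
    return a + b

mutants = {
    "plus_to_minus": lambda a, b: a - b,
    "plus_to_times": lambda a, b: a * b,
}

def test_suite(f):
    return f(2, 2) == 4          # weak suite: 2+2 == 2*2, so one mutant survives

assert test_suite(original)      # the suite passes on the original program
for name, mutant in mutants.items():
    status = "survived" if test_suite(mutant) else "killed"
    print(f"mutant {name}: {status}")
```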


Subjects
Software, Mutation
17.
PeerJ Comput Sci ; 8: e720, 2022.
Article in English | MEDLINE | ID: mdl-35494846

ABSTRACT

Combinatorial interaction testing, a technique to verify a system with numerous input parameters, employs a mathematical object called a covering array as its test input. The technique generates a limited number of test cases while guaranteeing a given level of combinatorial coverage. Although this area has been studied extensively, handling constraints among input parameters remains a major challenge, which may significantly increase the cost of generating covering arrays. In this work, we propose a mathematical operation, called "weaken-product based combinatorial join", which constructs a new covering array from two existing covering arrays. The operation reuses existing covering arrays to save computational resources by increasing parallelism during generation, without losing the combinatorial coverage of the original arrays. Our proposed method significantly reduces covering array generation time, by 13-96% depending on the use case scenario.
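What a covering array guarantees is easy to check directly: for strength t = 2, every value pair for every pair of parameters appears in at least one row. A sketch of that check follows; the proposed combinatorial-join construction itself is beyond this illustration.

```python
# Sketch: verify pairwise (t=2) coverage of a covering array.
from itertools import combinations, product

def covers_pairwise(array, domains):
    """Check that every value pair of every column pair occurs in some row."""
    for c1, c2 in combinations(range(len(domains)), 2):
        seen = {(row[c1], row[c2]) for row in array}
        if seen != set(product(domains[c1], domains[c2])):
            return False
    return True

# 3 binary parameters: 4 rows suffice for pairwise coverage (vs 8 exhaustive)
ca = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(covers_pairwise(ca, [(0, 1)] * 3))   # True
```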

18.
PeerJ Comput Sci ; 7: e563, 2021.
Article in English | MEDLINE | ID: mdl-34150999

ABSTRACT

Software Fault Prediction (SFP) assists in the identification of faulty classes, and software metrics provide a mechanism for this purpose. Among others, metrics addressing inheritance in Object-Oriented (OO) software are important, as they measure the depth, hierarchy, width, and overriding complexity of the software. In this paper, we evaluated the exclusive use, and the viability, of inheritance metrics in SFP through experiments. We surveyed inheritance metrics whose data sets are publicly available and collected about 40 data sets containing inheritance metrics. We cleaned and filtered them, capturing nine inheritance metrics. After preprocessing, we divided the selected data sets into all possible combinations of inheritance metrics and then merged similar metrics, forming 67 data sets that contain only inheritance metrics with nominal binary class labels. We performed model building and validation with a Support Vector Machine (SVM). Results for Cross-Entropy, Accuracy, F-Measure, and AUC support the viability of inheritance metrics in software fault prediction. Furthermore, the ic, noc, and dit metrics help reduce the error entropy rate relative to the rest of the 67 feature sets.
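A hedged sketch of the experimental setup: an SVM over the three inheritance metrics highlighted above (ic, noc, dit), with synthetic placeholder data rather than one of the study's 67 data sets.

```python
# Sketch: SVM-based fault prediction from inheritance metrics only.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.integers(0, 8, size=(300, 3)).astype(float)   # columns: ic, noc, dit
y = (X.sum(axis=1) + rng.normal(scale=2, size=300) > 10).astype(int)  # faulty?

model = SVC(kernel="rbf")
print("accuracy per fold:", cross_val_score(model, X, y, cv=5).round(2))
```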

19.
Data Brief ; 31: 105962, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32671160

ABSTRACT

The paper presents a dataset on software tests generated using the Microsoft Pex (IntelliTest) test generator tool for 10 open source projects. The projects were selected randomly from popular GitHub repositories written in C#. The selected projects contain 7187 methods from which Pex was able to generate tests for 2596 methods totaling 38,618 lines of code. Data collection was performed on a cloud virtual machine. The dataset presents metrics about the attributes of the selected projects (e.g., cyclomatic complexity or number of external method calls) and the test generation (e.g., statement and branch coverage, number of warnings). This data is compared to an automated isolation technique in the paper Automated Isolation for White-box Test Generation [1]. To the best of our knowledge, this is the largest public dataset about the test generation performance of Microsoft Pex on open source projects. The dataset highlights current practical challenges and can be used as a baseline for new test generation techniques.

20.
JMIR Form Res ; 3(4): e14372, 2019 Oct 24.
Article in English | MEDLINE | ID: mdl-31651406

ABSTRACT

BACKGROUND: Attention is turning toward increasing the quality of websites and improving quality evaluation to attract new users and retain existing users. OBJECTIVE: This scoping study aimed to review and define existing worldwide methodologies and techniques for evaluating websites, and to provide a framework of appropriate website attributes that could be applied to any future website evaluation. METHODS: We systematically searched electronic databases and gray literature for studies of website evaluation. The results were exported to EndNote software, duplicates were removed, and eligible studies were identified. The results are presented in narrative form. RESULTS: A total of 69 studies met the inclusion criteria. The extracted data included type of website, aim or purpose of the study, study populations (users and experts), sample size, setting (controlled environment or remotely assessed), website attributes evaluated, methodology, and analysis process. Methods of evaluation varied and included questionnaires, observed website browsing, interviews or focus groups, and Web usage analysis. Evaluations using both users and experts, and both controlled and remote settings, are represented. Website attributes examined included usability or ease of use, content, design criteria, functionality, appearance, interactivity, satisfaction, and loyalty. Website evaluation methods should be tailored to the needs of specific websites and the individual aims of evaluations. GoodWeb, a website evaluation guide, is presented with a case scenario. CONCLUSIONS: This scoping study supports the open debate on defining the quality of websites, for which numerous approaches and models exist. By providing a framework of the existing literature on website evaluation, it offers a guide of options for evaluating websites, including which attributes to analyze and appropriate methods.
