Results 1 - 8 of 8
1.
Empir Softw Eng ; 27(1): 14, 2022.
Article in English | MEDLINE | ID: mdl-34744487

ABSTRACT

Most changes during software maintenance and evolution are not atomic, but rather the result of several related changes affecting different parts of the code. Developers may omit needed changes, leaving a task partially unfinished, introducing technical debt, or injecting bugs. We present a study investigating "quick remedy commits" performed by developers to implement changes omitted in previous commits. By quick remedy commits we refer to commits that (i) quickly follow a commit performed by the same developer, and (ii) aim at remedying issues introduced as the result of code changes omitted in the previous commit (e.g., fixing references to code components broken as a consequence of a rename refactoring) or simply improve the previously committed change (e.g., improving the name of a newly introduced variable). Through a manual analysis of 500 quick remedy commits, we define a taxonomy categorizing the types of changes that developers tend to omit. The taxonomy can (i) guide the development of tools aimed at detecting omitted changes and (ii) help researchers identify corner cases that must be properly handled. For example, one of the categories in our taxonomy groups reverted commits, i.e., changes that are undone in a subsequent commit. We show that not accounting for such commits when mining software repositories can undermine one's findings. In particular, our results show that considering completely reverted commits accounts, on average, for 0.07 and 0.27 noisy data points in two typical MSR data collection tasks (bug-fixing commit identification and refactoring operation mining, respectively).
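
The notion of a quick remedy commit lends itself to a simple repository-mining heuristic. The sketch below, which assumes PyDriller and an illustrative 30-minute window and repository path (neither taken from the paper), flags commit pairs where the same developer commits again shortly after their previous commit; classifying such candidates against the paper's taxonomy would still be a manual step.

```python
# Hedged sketch: flag candidate "quick remedy commits", i.e. commits that
# quickly follow a previous commit by the same author. Window, path and the
# chronological traversal order are assumptions for illustration.
from datetime import timedelta

from pydriller import Repository  # pip install pydriller

WINDOW = timedelta(minutes=30)  # assumed threshold for "quickly follows"

last_commit_by_author = {}
candidates = []

# PyDriller traverses commits from oldest to newest by default.
for commit in Repository("path/to/repo").traverse_commits():
    author = commit.author.email
    prev = last_commit_by_author.get(author)
    if prev is not None:
        delta = commit.author_date - prev.author_date
        if timedelta(0) <= delta <= WINDOW:
            # Same developer committed again shortly afterwards: a candidate
            # remedy commit pair, still to be classified manually.
            candidates.append((prev.hash, commit.hash))
    last_commit_by_author[author] = commit

print(f"{len(candidates)} candidate quick remedy commit pairs")
```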

2.
Empir Softw Eng ; 27(7): 167, 2022.
Article in English | MEDLINE | ID: mdl-36159898

ABSTRACT

Context: Hackathons have become popular events for teams to collaborate on projects and develop software prototypes. Most existing research focuses on activities during an event, with limited attention to the evolution of the hackathon code. Objective: We aim to understand the evolution of code used in and created during hackathon events, with a particular focus on code blobs: how frequently hackathon teams reuse pre-existing code, how much new code they develop, whether that code gets reused afterwards, and what factors affect reuse. Method: We collected information about 22,183 hackathon projects from Devpost and obtained the related code blobs, authors, project characteristics, original author, code creation time, language, and size information from World of Code. We tracked the reuse of code blobs by identifying all commits containing blobs created during hackathons and all projects that contain those commits. We also conducted a series of surveys, sent to hackathon participants whose code was reused, participants whose code was not reused, and developers who reused hackathon code, to gain a deeper understanding of hackathon code evolution. Result: 9.14% of the code blobs in hackathon repositories and 8% of the lines of code (LOC) are created during hackathons, and around a third of the hackathon code gets reused in other projects, by both blob count and LOC. The number of associated technologies and the number of participants in a hackathon increase the probability of reuse. Conclusion: The results of our study demonstrate that hackathons are not always "one-off" events, as common knowledge suggests, and can serve as a starting point for further studies in this area.
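
The blob-tracking step builds on the fact that identical file content hashes to the same git blob ID in every repository. The sketch below illustrates that idea with plain hashing over a local checkout; the paths and the set of blobs from other projects are placeholders, since the study itself obtains these mappings from World of Code rather than computing them this way.

```python
# Illustrative sketch of blob-level reuse detection: identical file content
# yields the same git blob SHA-1 in any repository, so reuse can be detected
# by intersecting blob ID sets. Paths and the "other projects" set are
# placeholders; the study uses World of Code's maps for this step.
import hashlib
from pathlib import Path

def git_blob_sha1(data: bytes) -> str:
    """Compute the blob ID the same way git does: sha1 over 'blob <len>\\0<content>'."""
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()

hackathon_blobs = {
    git_blob_sha1(p.read_bytes())
    for p in Path("hackathon_repo").rglob("*")
    if p.is_file() and ".git" not in p.parts
}

# Blob IDs seen in other repositories (placeholder; in practice taken from a
# large-scale index such as World of Code).
other_project_blobs: set[str] = set()

reused = hackathon_blobs & other_project_blobs
print(f"{len(reused)} of {len(hackathon_blobs)} hackathon blobs reused elsewhere")
```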

3.
Entropy (Basel) ; 20(12), 2018 Dec 13.
Article in English | MEDLINE | ID: mdl-33266687

ABSTRACT

Fault prediction is an important research area that aids software development and the maintenance process, and its approaches are continuously being improved to reduce fault resolution time and effort. Aiming to contribute towards building new approaches for fault prediction, this paper proposes Entropy Churn Metrics (ECM) based on History Complexity Metrics (HCM) and Churn of Source Code Metrics (CHU). The study also compares the performance of ECM with that of HCM. The performance of both metrics is compared on 14 subsystems of 5 different software projects: Android, Eclipse, Apache HTTP Server, Eclipse C/C++ Development Tooling (CDT), and Mozilla Firefox. The study also analyses the software subsystems along three parameters, (i) distribution of faults, (ii) subsystem size, and (iii) programming language, to determine which characteristics of software systems make HCM or ECM preferable.
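
For readers unfamiliar with entropy-based metrics, the sketch below shows the normalized Shannon entropy of changes that History Complexity Metrics build on, computed over hypothetical per-file change counts for one period; how ECM then combines this with churn follows the paper rather than this example.

```python
# Minimal sketch of the entropy-of-changes idea behind History Complexity
# Metrics: the more evenly changes in a period are scattered across files,
# the higher the normalized Shannon entropy. The counts are made up.
import math

def normalized_change_entropy(changes_per_file: dict[str, int]) -> float:
    total = sum(changes_per_file.values())
    n = len(changes_per_file)
    if total == 0 or n < 2:
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total)
        for c in changes_per_file.values()
        if c > 0
    )
    return entropy / math.log2(n)  # normalize to [0, 1]

period_changes = {"net.c": 12, "mem.c": 3, "ui.c": 1}  # hypothetical counts
print(f"normalized entropy = {normalized_change_entropy(period_changes):.3f}")
```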

4.
PeerJ Comput Sci ; 8: e849, 2022.
Article in English | MEDLINE | ID: mdl-35494797

ABSTRACT

Bots have become active contributors in maintaining open-source repositories. However, definitions of bot activity in open-source software vary, from a lenient stance encompassing every non-human contribution to frameworks that only cover contributions from tools with autonomy or human-like traits (i.e., Devbots). Understanding which of those definitions is being used is essential to enable (i) reliable sampling of bots and (ii) fair comparison of their practical impact on, e.g., developers' productivity. This paper reports on an empirical study combining quantitative and qualitative analysis of bot activity. By analysing those two bot definitions in an existing dataset of bot commits, we see that only 10 out of 54 listed tools (mainly dependency management) comply with the characteristics of Devbots. Moreover, five of those Devbots have similar patterns of contributions across 93 projects, such as similar proportions of merged pull requests and days until issues are closed. Our analysis also reveals that most projects (77%) experiment with more than one bot before deciding to adopt or switch between bots. In fact, a thematic analysis of developers' comments in those projects reveals factors driving the discussions about Devbot adoption or removal, such as the impact of the generated noise and the adaptation of development practices needed within the project.
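
Which bot definition a study adopts directly affects how bot commits are sampled. As a rough illustration (not the paper's method, which starts from an existing curated dataset), the sketch below filters commits by author-name heuristics of the lenient "any non-human contribution" kind; a Devbot-oriented definition would require additional checks on autonomy or human-like behaviour.

```python
# Hypothetical sketch of lenient bot sampling: treat any commit whose author
# name matches simple bot-like patterns as a bot contribution. Patterns and
# commits are illustrative only.
import re

BOT_PATTERNS = [
    re.compile(r"\[bot\]$", re.IGNORECASE),               # e.g. "dependabot[bot]"
    re.compile(r"\b(bot|ci|automation)\b", re.IGNORECASE),
]

def looks_like_bot(author_name: str) -> bool:
    return any(p.search(author_name) for p in BOT_PATTERNS)

commits = [  # (author, message) pairs; made-up examples
    ("dependabot[bot]", "Bump lodash from 4.17.20 to 4.17.21"),
    ("alice", "Fix null check in parser"),
]

bot_commits = [(a, m) for a, m in commits if looks_like_bot(a)]
print(f"{len(bot_commits)} of {len(commits)} commits attributed to bots")
```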

5.
PeerJ Comput Sci ; 7: e601, 2021.
Article in English | MEDLINE | ID: mdl-34307858

ABSTRACT

BACKGROUND: After many years of research on software repositories, the knowledge for building mature, reusable tools that perform data retrieval, storage and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge. GOAL: To produce a reusable toolset supporting the most common tasks when retrieving, curating and visualizing data from software repositories, allowing for the easy reproduction of data sets ready for more complex analytics, and sparing the researcher or analyst most of the tasks that can be automated. METHOD: Use our experience in building tools in this domain to identify a collection of scenarios where a reusable toolset would be convenient, and the main components of such a toolset. Then build those components, and refine them incrementally using the feedback from their use in commercial, community-based, and academic environments. RESULTS: GrimoireLab, an efficient toolset composed of five main components, supporting about 30 different kinds of data sources related to software development. It has been tested in many environments, for performing different kinds of studies and providing different kinds of services. It features a common API for accessing the retrieved data, facilities for relating items from different data sources, semi-structured storage for easing later analysis and reproduction, and basic facilities for visualization, preliminary analysis and drill-down into the data. It is also modular, making it easy to support new kinds of data sources and analyses. CONCLUSIONS: We present a mature toolset, widely tested in the field, that can help improve the situation in the area of reusable tools for mining software repositories. We show some scenarios where it has already been used. We expect it will help reduce the effort of doing studies or providing services in this area, leading to advances in reproducibility and comparison of results.
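
As a flavour of the common API mentioned above, the sketch below retrieves commits with Perceval, GrimoireLab's data-retrieval component, following its publicly documented Git backend usage; the repository URL is arbitrary and the exact item fields should be checked against the current documentation.

```python
# Hedged sketch based on Perceval's documented Git backend: fetch() yields one
# dictionary per commit, with the raw commit fields under "data". Field names
# and the signature should be verified against the current GrimoireLab docs.
from perceval.backends.core.git import Git  # pip install perceval

repo = Git(uri="https://github.com/chaoss/grimoirelab-perceval",
           gitpath="/tmp/perceval.git")

for item in repo.fetch():
    data = item["data"]
    print(data["commit"][:10], data.get("Author"))
```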

6.
PeerJ Comput Sci ; 5: e245, 2019.
Article in English | MEDLINE | ID: mdl-33816896

ABSTRACT

Nowadays, continuous integration (CI) is indispensable in the software development process. A central promise of adopting CI is that new features or bug fixes can be delivered more quickly. A recent repository mining study by Bernardo, da Costa & Kulesza (2018) found that only about half of the investigated open source projects actually deliver pull requests (PRs) faster after adopting CI, with small effect sizes. However, there are some concerns regarding the methodology used by Bernardo et al. that may limit the trustworthiness of this finding. In particular, they do not explicitly control for normal changes in pull request delivery time over a project's lifetime (independently of CI introduction). Hence, in our work, we conduct a conceptual replication of this study. In a first step, we replicate their results using the same subjects and methodology. In a second step, we address the same core research question using an adapted methodology. We use a different statistical method (regression discontinuity design, RDD) that is more robust to the confounding factor of projects naturally getting faster at delivering PRs over time, and we introduce a control group of comparable projects that never adopted CI. Finally, we also evaluate the generalizability of the original findings on a set of new open source projects sampled using the same methodology. We find that the results of the study by Bernardo et al. largely hold in our replication. Using RDD, we do not find robust evidence of projects getting faster at delivering PRs without CI, and we similarly do not see a speed-up in our control group that never introduced CI. Further, results obtained from a newly mined set of projects are comparable to the original findings. In conclusion, we consider the replication successful.
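
To make the replication's statistical method concrete, the sketch below fits a simple sharp regression discontinuity model with statsmodels: delivery time regressed on time centered at CI adoption, a post-adoption indicator, and their interaction. The tiny data frame and column names are invented for illustration and are not the replication's data or exact model specification.

```python
# Hedged sketch of a sharp regression discontinuity design (RDD): a jump in
# the intercept at CI adoption, net of the underlying time trend, would show
# up as a significant coefficient on after_ci. Data are made up.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "delivery_days":   [12, 10, 11, 9, 8, 7, 6, 7, 5, 4],
    "months_since_ci": [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4],  # centered at adoption
})
df["after_ci"] = (df["months_since_ci"] >= 0).astype(int)

model = smf.ols("delivery_days ~ months_since_ci * after_ci", data=df).fit()
print(model.summary())
```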

7.
Data Brief ; 25: 104005, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31198827

ABSTRACT

This paper provides a systematically retrieved dataset consisting of 33 open-source software projects containing a large number of typed artifacts and trace links between them. The artifacts stem from each project's issue tracking system and source version control system, enabling their joint analysis. Enriched with additional metadata, such as time stamps, release versions, component information, and developer comments, the dataset is highly suitable for empirical research, e.g., in requirements and software traceability analysis, software evolution, bug and feature localization, and stakeholder collaboration. It can stimulate new research directions, facilitate the replication of existing studies, and act as a benchmark for the comparison of competing approaches. The data is hosted on Harvard Dataverse under DOI 10.7910/DVN/PDDZ4Q, accessible via https://bit.ly/2wukCHc.
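
One kind of joint analysis such a dataset supports is relating issues to the commits that address them. The sketch below shows the common heuristic of matching issue keys referenced in commit messages; the keys, messages, and regular expression are invented for illustration, and the dataset itself already ships curated trace links rather than relying on this heuristic.

```python
# Illustrative sketch of issue-to-commit linking via issue keys mentioned in
# commit messages. All identifiers below are made up.
import re

issues = {
    "PROJ-101": "NullPointerException in exporter",
    "PROJ-102": "Add CSV import",
}

commits = [
    ("a1b2c3", "PROJ-101: guard against missing config"),
    ("d4e5f6", "refactor exporter internals"),
]

key_re = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

links = [(sha, key) for sha, msg in commits
         for key in key_re.findall(msg) if key in issues]
print(links)  # [('a1b2c3', 'PROJ-101')]
```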

8.
J Softw (Malden) ; 26(7): 589-612, 2014 Jul.
Article in English | MEDLINE | ID: mdl-25893093

ABSTRACT

Some free, open-source software projects have been around for quite a long time, the longest-lived ones dating from the early 1980s. For some of them, detailed information about their evolution is available in source code management systems that have tracked all their code changes for periods of more than 15 years. This paper examines in detail the evolution of one such project, glibc, with the main aim of understanding how it evolved and how it matched Lehman's laws of software evolution. As a result, we have developed a methodology for studying the evolution of such long-lived projects based on the information in their source code management repository, described in detail several aspects of the history of glibc, including some activity and size metrics, and found that some of the laws of software evolution may not hold in this case. © 2013 The Authors. Journal of Software: Evolution and Process published by John Wiley & Sons Ltd.
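
A basic activity metric of the kind used in such evolution studies can be pulled straight from the source code management history. The sketch below counts commits per month from a local clone via git log; the repository path is an assumption, and size metrics would be collected in a similar per-month or per-release fashion.

```python
# Hedged sketch: monthly commit counts from a local clone, using git log with
# a year-month author-date format. The path is a placeholder.
import subprocess
from collections import Counter

log = subprocess.run(
    ["git", "-C", "path/to/glibc", "log", "--pretty=%ad", "--date=format:%Y-%m"],
    capture_output=True, text=True, check=True,
).stdout.split()

commits_per_month = Counter(log)
for month in sorted(commits_per_month):
    print(month, commits_per_month[month])
```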
