Description:
Advances in artificial intelligence, the availability of machine learning capabilities in programming languages such as Python, and cloud services allow researchers to apply clustering and prediction methods to behaviors and patterns in software engineering data. However, these methods need large amounts of data to achieve high accuracy across different contexts. This paper introduces Sonarlizer Xplorer, a tool that captures a large number of technical debt items and code metrics from public GitHub projects. Sonarlizer Xplorer is composed of two sub-tools. The first, Github Xplorer, mines public GitHub repositories starting from an initial project. The second, Sonarlizer, takes projects and analyzes them using SonarQube. We used the tool over four months, collecting technical debt items and code metrics from almost 46,000 public Java projects. In addition, we mined over 57 million repositories and 4 million users.