A new methodology for datascience automation in javaKlass: PTDD
Data science research is a multidisciplinary activity in which people with different backgrounds and skills (mathematicians, physicists, computer scientists, etc.) often work together to design and build software that implements research results. Long-term projects face stability, maintainability, scalability, and reproducibility challenges when a large number of developers are involved, when very old code coexists with new code, and when the development team changes frequently, which is inherent to research teams and often related to financial issues. JavaKLASS is a data science software system built on 30 years of research under the leadership of Karina Gibert and her team of more than 25 researchers and developers from different backgrounds. The system is a Java desktop application that needs to evolve into a new version that can be used from different interfaces, including batch usage. In this thesis, we design and build an end-to-end scripting language for javaKLASS, so that scripts can execute the various data science processes supported by javaKLASS in different ways: either called from the current javaKLASS graphical interface or from a batch process. Implementing this scripting language also opens the door to defining a set of scripts that intensively test the stability of the code as new developers extend the functionality of the system. These test scripts provide a mechanism for a comprehensive and automated testing process, introducing a new methodology that we call Process Testing Driven Development (PTDD). This methodology is intended to ensure that new developments do not break existing functionality and to add robustness to future developments and software upgrades. In the long term, the tests will also support software refactoring activities.
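The idea of a process test can be illustrated with a minimal Java sketch. It does not reproduce the javaKLASS scripting language, whose syntax is not given here. The class name PtddSketch, the one-command-per-line script format, the "mean" and "max" processes, and the baseline format are all hypothetical, chosen only to show a script being run in batch and its output being compared against a recorded baseline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class PtddSketch {
    // Hypothetical registry: script command name -> process applied to a numeric column.
    static final Map<String, Function<double[], Double>> PROCESSES = new LinkedHashMap<>();
    static {
        PROCESSES.put("mean", xs -> Arrays.stream(xs).average().orElse(Double.NaN));
        PROCESSES.put("max", xs -> Arrays.stream(xs).max().orElse(Double.NaN));
    }

    /** Runs a script (one command per line, '#' starts a comment) and returns one output line per command. */
    static List<String> runScript(List<String> script, double[] data) {
        List<String> output = new ArrayList<>();
        for (String line : script) {
            String command = line.trim();
            if (command.isEmpty() || command.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            Function<double[], Double> process = PROCESSES.get(command);
            if (process == null) {
                throw new IllegalArgumentException("Unknown process: " + command);
            }
            // Locale.ROOT keeps the number format stable across machines.
            output.add(command + "=" + String.format(Locale.ROOT, "%.4f", process.apply(data)));
        }
        return output;
    }

    /** Process test: passes only if the script reproduces the recorded baseline output. */
    static boolean matchesBaseline(List<String> script, double[] data, List<String> baseline) {
        return runScript(script, data).equals(baseline);
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 6.0};
        List<String> script = List.of("# descriptive statistics", "mean", "max");
        List<String> baseline = List.of("mean=3.0000", "max=6.0000");
        System.out.println(matchesBaseline(script, data, baseline) ? "PASS" : "FAIL");
    }
}
```

In this sketch, a change that alters the result of an existing process makes the comparison fail, which is the kind of regression signal the test scripts are meant to provide. The same script could in principle be launched from a graphical interface or from a batch job, since the runner only depends on the script text and the data.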