Improving Maintainability in Python Code using Large Language Models
Dataset Files
- Training data:
  - `commit_train1.csv` (197.85 MB) - CommitPackFT training dataset
  - `python_train1.csv` (94.75 MB) - CodeAlpaca training dataset
- Validation data:
  - `commit_valid1.csv` (28.31 MB) - CommitPackFT validation set
  - `python_valid1.csv` (12.06 MB) - CodeAlpaca validation set
- Test data:
  - `commit_test1.csv` (27.98 MB) - CommitPackFT test set
  - `python_test1.csv` (12.17 MB) - CodeAlpaca test set

Model Files
- `GPT3.5.zip` (16.54 MB) - Fine-tuned GPT-3.5 model files
- `Wizard_maintain_commit.zip` (777.96 MB) - WizardCoder model fine-tuned on CommitPackFT artifacts
- `Wizard_maintain_python.zip` (778.34 MB) - WizardCoder model fine-tuned on CodeAlpaca artifacts
- `Evaluation.zip` (12.81 MB) - Model evaluation results and metrics

Analysis Notebooks
- `Analysis.ipynb` - General analysis notebook
- `Analysis_python_data.ipynb` - Analysis of the CodeAlpaca dataset
- `Analysis_commit_data.ipynb` - Analysis of the CommitPackFT dataset
- `Commit_code_optimisation.ipynb` - Training notebook for CommitPackFT
- `python_optimisation_train_sft.ipynb` - Training notebook for CodeAlpaca

The project focuses on improving Python code maintainability using LLMs. The files include raw datasets, fine-tuned models, and comprehensive analysis notebooks.
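The training, validation, and test splits above are plain CSV files, so they can be inspected with Python's standard `csv` module. The sketch below is a minimal example of reading code pairs from such a file; the column names (`input_code`, `refactored_code`) are hypothetical stand-ins, since the actual headers are not documented here, so check the first row of each file before relying on them:

```python
import csv
import io

# Hypothetical sample mirroring a possible layout of commit_train1.csv;
# the real column names may differ -- inspect the header row first.
SAMPLE = """input_code,refactored_code
"def f(x):return x*2","def double(x):
    return x * 2"
"""

def load_pairs(stream):
    """Read (original, refactored) code pairs from a CSV stream.

    csv.DictReader handles newlines embedded inside quoted fields,
    which is essential for CSVs that store multi-line code snippets.
    """
    reader = csv.DictReader(stream)
    return [(row["input_code"], row["refactored_code"]) for row in reader]

pairs = load_pairs(io.StringIO(SAMPLE))
print(len(pairs))  # one multi-line code pair parsed
```

For the full-size files, the same pattern works with `open("commit_train1.csv", newline="")` in place of the `StringIO` demo stream.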
This study aims to enhance the maintainability of code generated by Large Language Models (LLMs), with a focus on the Python programming language. As the use of LLMs for coding assistance grows, so do concerns about the maintainability of the code they produce. Previous research has mainly concentrated on the functional accuracy and testing success of generated code, overlooking aspects of maintainability. Our approach involves the use of a specifically designed dataset for training and evaluating the model, ensuring a thorough assessment of code maintainability. At the heart of our work is the fine-tuning of an LLM for code refactoring, aimed at enhancing code readability, reducing complexity, and improving overall maintainability. After fine-tuning an LLM to prioritize code maintainability, our evaluations indicate that this model significantly improves code maintainability standards, suggesting a promising direction for the future of AI-assisted software development.
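One way to make "reducing complexity" concrete is to compare a simple structural proxy before and after refactoring. The sketch below counts branching constructs with Python's `ast` module; it is an illustrative stand-in, not the maintainability metric used in this study, and the `sign` example is invented for demonstration:

```python
import ast

def branch_count(source: str) -> int:
    """Rough complexity proxy: count branching constructs in a snippet.

    Illustrative only -- not the evaluation metric from the study.
    """
    tree = ast.parse(source)
    branches = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)
    return sum(isinstance(node, branches) for node in ast.walk(tree))

# Hypothetical before/after pair: nested conditionals vs. an early return.
before = """
def sign(x):
    if x > 0:
        if x % 2 == 0:
            return "positive even"
        else:
            return "positive odd"
    else:
        return "non-positive"
"""

after = """
def sign(x):
    if x <= 0:
        return "non-positive"
    parity = "even" if x % 2 == 0 else "odd"
    return "positive " + parity
"""

print(branch_count(before), branch_count(after))  # nesting reduced: 2 vs 1
```

A fine-tuned refactoring model would be judged on whether such counts (alongside readability measures) drop while behavior is preserved.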