Exploring Inefficiencies in Implementations Utilizing GPUs for Novel View Synthesis of Dynamic Scenes: Limitations of Modern Computer Vision Models and Possible Enhancements
This thesis investigates the computational inefficiencies of existing machine learning models for novel view synthesis, the task of generating images of an observed scene from new viewpoints. Modern models are analyzed, and three are selected for a detailed examination of their implementations. The goal is to identify the factors that limit the efficiency of these models during both inference and training, and to optimize them. Inefficiencies can arise from poor implementations or suboptimal resource usage, especially when memory is not properly reused across training iterations or when the hardware, particularly Graphics Processing Units (GPUs), is not fully utilized. The thesis addresses the question: what are the limiting factors in current implementations of dynamic scene novel view synthesis, and how can they be mitigated? While many studies present unoptimized models to demonstrate capabilities, this work focuses on improving computational efficiency without altering the underlying model architecture, since architectural changes would require extensive retraining and benchmarking, which is beyond the scope of this project.

The problem was addressed with tools such as the PyTorch profiler, which measures the time spent in individual functions and helps identify performance bottlenecks. In addition, custom kernels were analyzed with the NVIDIA Nsight suite to uncover inefficiencies in their execution. These insights enabled targeted optimizations that significantly improved runtime performance.

The findings show substantial improvements when tensor operations, typically written in PyTorch, are translated into custom CUDA kernels, yielding up to an 80% reduction in runtime. However, implementing a backward function for integration with PyTorch's automatic differentiation engine presents a challenge. Furthermore, optimizing a specific CUDA kernel reduced its runtime by 75%, translating into a nearly 20% reduction in the model's total training time. These results highlight that even ...
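To illustrate the profiling workflow described above, the following is a minimal sketch of how the PyTorch profiler can rank operators by accumulated GPU time; the model and input are placeholders, not the view-synthesis models examined in the thesis.

import torch
from torch.profiler import profile, record_function, ProfilerActivity

# Placeholder model and input; stand-ins for the examined view-synthesis models.
model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU()).cuda()
x = torch.randn(4096, 256, device="cuda")

# Record CPU and CUDA activity for one forward/backward pass.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True) as prof:
    with record_function("training_step"):
        loss = model(x).sum()
        loss.backward()

# Rank operators by accumulated GPU time to locate bottlenecks.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))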
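For the kernel-level analysis with the NVIDIA Nsight suite, a common preparatory step is to annotate code regions with NVTX ranges so they appear as named spans in the Nsight Systems timeline; the region name below is illustrative. The script would be launched under the profiler, e.g. "nsys profile -o report python script.py", after which individual kernels can be examined in more detail with Nsight Compute (ncu).

import torch

x = torch.randn(4096, 256, device="cuda")
w = torch.randn(256, 256, device="cuda")

torch.cuda.nvtx.range_push("candidate_hotspot")  # named span in the Nsight timeline
y = torch.relu(x @ w)                            # region under investigation
torch.cuda.nvtx.range_pop()
torch.cuda.synchronize()                         # ensure pending GPU work is flushed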
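The reported gains from translating PyTorch tensor operations into custom CUDA kernels can be sketched as follows. This hypothetical example fuses a scale and a ReLU into one kernel using torch.utils.cpp_extension.load_inline; the fused operation is illustrative and not taken from the thesis. The benefit of fusion is that the intermediate result never makes a round trip through global memory.

import torch
from torch.utils.cpp_extension import load_inline

# CUDA source: one fused kernel replacing two elementwise PyTorch ops
# (multiply, then ReLU), saving a full read/write of the tensor in global memory.
cuda_src = r"""
#include <torch/extension.h>

__global__ void scale_relu_kernel(const float* x, float* out, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i] * s;
        out[i] = v > 0.0f ? v : 0.0f;
    }
}

torch::Tensor scale_relu(torch::Tensor x, float s) {
    auto out = torch::empty_like(x);
    int n = x.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_relu_kernel<<<blocks, threads>>>(
        x.data_ptr<float>(), out.data_ptr<float>(), s, n);
    return out;
}
"""

# Declaration so that load_inline can generate the Python binding.
cpp_src = "torch::Tensor scale_relu(torch::Tensor x, float s);"

ext = load_inline(name="fused_ops", cpp_sources=cpp_src,
                  cuda_sources=cuda_src, functions=["scale_relu"])

x = torch.randn(1 << 20, device="cuda")
y = ext.scale_relu(x, 2.0)  # one kernel launch instead of two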
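The challenge of implementing a backward function follows from how custom operators are integrated with PyTorch's automatic differentiation engine: a torch.autograd.Function subclass must supply the gradient by hand. The sketch below shows the general pattern for the fused scale-and-ReLU above, with plain tensor operations standing in for the custom CUDA kernels.

import torch

class ScaleRelu(torch.autograd.Function):
    """Pattern for wiring a custom op into autograd; the forward/backward
    bodies use plain PyTorch ops as stand-ins for custom CUDA kernels."""

    @staticmethod
    def forward(ctx, x, s):
        out = torch.relu(x * s)           # would call the custom forward kernel
        ctx.save_for_backward(x)
        ctx.s = s
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        mask = (x * ctx.s) > 0            # derivative of ReLU at the forward input
        grad_x = grad_out * mask * ctx.s  # would call the custom backward kernel
        return grad_x, None               # no gradient w.r.t. the scalar s

x = torch.randn(8, device="cuda", requires_grad=True)
y = ScaleRelu.apply(x, 2.0)
y.sum().backward()                        # autograd now routes through backward()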