Optimzation of Knapsack on GPU
My first paper “Optimization of Multi-Class 0/1 Knapsack Problem on GPUs by Improving Memory Access Efficiency” has been accepted by Journal of Supercomputing (IF: 2.60).
The following is the abstract:
This work aims to improve the GPU performance for solving the 0/1 knapsack problem, which is a well-known combinatorial optimization problem found in many practical applications, including cryptography, financial decision, electronic design automation, computing resource management, etc. The knapsack problem is NP-hard, but it can be solved efficiently by dynamic programming(DP) algorithms in pseudo-polynomial runtime. The DP knapsack algorithm on GPUs has been presented. However, as the modern GPU architecture provides much higher computing throughput than its memory bandwidth, previous work is bounded by the data access time on GPU memory because its CGMA (Compute to Global Memory Access) ratio is 1, which means every computing operation involves one memory access on average. To address the problem, an innovative approach called Multi-Class 0/1 Knapsack Problem (MCKP), whose items can be classified into groups with equal values or weights is proposed in this paper. By reconstructing the DP equations for solving MCKP, it is able to explore data parallelism and reusability across threads. This made it possible to optimize the computation across iterations (i.e., items), and significantly improve the CGMA ratio by 5-fold after exploring the use of GPU shared memory and registers for reused data. We extensively analyze the performance of our approach on two modern GPU models, NVIDIA Tesla V100 and RTX 3070. Compared to the runtime of previous work, our approach achieves up to 8x and 18x speedup on V100 and RTX 3070 respectively, the latter one being a GPU with lower memory bandwidth. In addition, by comparing the two speedups, we found that we are able to achieve more efficient computing usage when the memory bandwidth is limited such as RTX 3070.