Abstract: The K-means clustering algorithm is widely used in image processing and data mining. However, it has high computational complexity. This paper analyzes the parallelism of the K-means algorithm in detail and proposes a partitioned, parallel K-means algorithm based on CUDA (Compute Unified Device Architecture). In addition, several optimization strategies, e.g., coalesced memory access, parallel reduction, load balancing, and instruction optimization, are discussed to obtain higher performance. Experimental results show that the parallel K-means algorithm achieves a 560x speedup over the sequential C code while producing the same clustering results. It therefore removes the computational bottleneck of the algorithm and is an attractive alternative to sequential K-means for image segmentation and clustering analysis.
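To make the two phases behind this analysis concrete, the following is a minimal sequential C sketch of one K-means iteration. It is an illustration under stated assumptions, not the paper's implementation: the function names `assign_labels` and `update_centroids`, the row-major data layout, and the caller-supplied scratch buffers `sums` and `counts` are all hypothetical. The assignment step is independent for each point, which is the part that maps naturally to one GPU thread per point; the update step is a per-cluster sum, which is where a parallel reduction would apply.

```c
#include <float.h>
#include <stddef.h>

/* Assignment step: label each of the n points (dim floats each, row-major)
 * with the index of its nearest centroid (squared Euclidean distance).
 * Iterations of the outer loop are independent of one another. */
void assign_labels(const float *points, const float *centroids,
                   int *labels, size_t n, size_t k, size_t dim)
{
    for (size_t i = 0; i < n; ++i) {
        float best = FLT_MAX;
        int best_c = 0;
        for (size_t c = 0; c < k; ++c) {
            float d = 0.0f;
            for (size_t j = 0; j < dim; ++j) {
                float diff = points[i * dim + j] - centroids[c * dim + j];
                d += diff * diff;
            }
            if (d < best) {
                best = d;
                best_c = (int)c;
            }
        }
        labels[i] = best_c;
    }
}

/* Update step: replace each centroid with the mean of its assigned points.
 * sums (k * dim floats) and counts (k entries) are scratch buffers provided
 * by the caller. A cluster with no points keeps its previous centroid. */
void update_centroids(const float *points, const int *labels,
                      float *centroids, float *sums, size_t *counts,
                      size_t n, size_t k, size_t dim)
{
    for (size_t c = 0; c < k; ++c) {
        counts[c] = 0;
        for (size_t j = 0; j < dim; ++j)
            sums[c * dim + j] = 0.0f;
    }
    /* Accumulate per-cluster sums; this is the reduction part of the step. */
    for (size_t i = 0; i < n; ++i) {
        size_t c = (size_t)labels[i];
        counts[c] += 1;
        for (size_t j = 0; j < dim; ++j)
            sums[c * dim + j] += points[i * dim + j];
    }
    for (size_t c = 0; c < k; ++c) {
        if (counts[c] == 0)
            continue; /* empty cluster: leave the centroid unchanged */
        for (size_t j = 0; j < dim; ++j)
            centroids[c * dim + j] = sums[c * dim + j] / (float)counts[c];
    }
}
```

A full run would call `assign_labels` and then `update_centroids` repeatedly until the labels stop changing or an iteration limit is reached.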