This document discusses using machine learning to detect copied code submissions. It proposes an unsupervised approach that combines dimensionality reduction via principal component analysis (PCA) with k-means clustering to group similar submissions. The stated aim is to reduce the cost of copy detection from the O(n^2) of comparing every pair of n submissions to O(n). The key steps are extracting features from each submission, applying PCA to reduce the number of feature dimensions, running k-means to cluster the submissions, and detecting copies among the submissions grouped in each cluster. This approach could help identify cheating in online programming contests and support the evaluation of student code submissions.
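The steps named above (feature extraction, PCA, k-means, then copy detection on the grouped submissions) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the document's actual implementation: the token vocabulary `VOCAB`, the four toy submissions, the choice of two principal components and three clusters, and every function name are hypothetical. PCA is approximated here by power iteration with deflation, and k-means uses a deterministic farthest-first initialisation so the run is repeatable.

```python
import math
import re
from collections import Counter

# Hypothetical feature vocabulary: keywords and operators that survive renaming.
VOCAB = ["for", "while", "if", "return", "def", "=", "+", "*", "-", "(", ":"]

def extract_features(source):
    """Count vocabulary tokens; identifiers and numbers outside VOCAB are ignored."""
    tokens = re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_0-9]", source)
    counts = Counter(tokens)
    return [float(counts[tok]) for tok in VOCAB]

def pca(rows, n_components, iters=200):
    """Project rows onto leading principal components (power iteration + deflation)."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    components = []
    for c in range(n_components):
        v = [1.0 / (i + 1 + c) for i in range(d)]  # fixed start keeps runs repeatable
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflate: remove the component just found before searching for the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components] for row in centered]

def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with farthest-first initialisation; returns one label per point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def candidate_pairs(labels):
    """Only submissions that share a cluster are paired for a detailed comparison."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    return [(a, b) for g in groups.values() for i, a in enumerate(g) for b in g[i + 1:]]

# Toy submissions (hypothetical): the second is the first with identifiers renamed.
submissions = [
    "def total(xs):\n    s = 0\n    for x in xs:\n        s = s + x\n    return s\n",
    "def add_all(values):\n    acc = 0\n    for v in values:\n        acc = acc + v\n    return acc\n",
    "def fact(n):\n    if n == 0:\n        return 1\n    return n * fact(n - 1)\n",
    "def countdown(n):\n    while n > 0:\n        n = n - 1\n    return n\n",
]

features = [extract_features(src) for src in submissions]
reduced = pca(features, n_components=2)
labels = kmeans(reduced, k=3)
pairs = candidate_pairs(labels)
print(labels, pairs)
```

Because the renamed copy produces exactly the same token counts as the original, the two always land in the same cluster, and only pairs inside a cluster are passed on for closer inspection. The saving over all-pairs comparison depends on clusters staying small; with one large cluster the pairing step is quadratic again.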