Unmanned aerial vehicle (UAV) networks are now widely used in military and civilian scenarios, but they are vulnerable to a variety of attacks. A time-delay attack maliciously delays packet transmission without tampering with packet contents or significantly altering the transmission pattern, which makes it difficult to detect. This paper proposes HOTD, a holistic cross-layer time-delay attack detection framework for UAV networks. HOTD first performs a holistic selection of the delay-related features available at all layers. It then uses supervised learning to build a consistency model between these features and the corresponding forwarding delay, and uses this model to calculate a degree of consistency for each node. Finally, a clustering method distinguishes malicious nodes from benign ones according to their degree of consistency. Experimental results show that HOTD outperforms state-of-the-art detection methods, achieving a detection accuracy higher than 85% with less than 2.5% additional overhead.
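The three stages described above (feature selection, a learned consistency model, and clustering on the per-node consistency) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: a single hypothetical feature (queue length), a least-squares line standing in for the supervised model, the mean absolute prediction error standing in for the degree of consistency, a one-dimensional two-means split standing in for the clustering method, and made-up toy numbers.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (assumed stand-in for the supervised model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def inconsistency(model, samples):
    """Mean absolute gap between predicted and observed forwarding delay."""
    a, b = model
    return mean(abs(delay - (a * feat + b)) for feat, delay in samples)

def two_means_1d(values, iters=20):
    """Split scalar scores into a low and a high cluster; True marks the high one."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        low_part = [v for v in values if v <= mid]
        high_part = [v for v in values if v > mid]
        if not low_part or not high_part:
            break  # all scores identical: nothing to separate
        lo, hi = mean(low_part), mean(high_part)
    mid = (lo + hi) / 2
    return [v > mid for v in values]

# Hypothetical training data: queue length vs. forwarding delay (ms).
model = fit_line([0, 1, 2, 3, 4], [1.0, 3.0, 5.0, 7.0, 9.0])

# Hypothetical per-node observations: (queue length, observed delay in ms).
observations = {
    "A": [(1, 3.1), (2, 5.0)],
    "B": [(3, 7.2), (0, 1.0)],
    "C": [(1, 8.0), (2, 10.5)],  # delays far above what the feature explains
}

names = list(observations)
scores = [inconsistency(model, observations[n]) for n in names]
flags = dict(zip(names, two_means_1d(scores)))
print(flags)
```

In this toy setting, node C forwards packets much later than its queue length would predict, so its score falls in the high cluster. A real deployment would use many cross-layer features and a more expressive model; the sketch only shows how the stages connect.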