Hyperspectral target detection is often limited by the difficulty of obtaining accurate pixel-level labels. This paper proposes a semantic multiple instance neural network (Semantic MINN) with contrastive and sparse attention fusion. Semantic MINN relaxes the requirement for precise pixel-wise labels and needs only patch-level labels. The network models hyperspectral pixels as semantic signals, captures the spatial semantic information of potential target regions from weak labels (i.e., imprecise knowledge of target presence), and adopts an attention-based sparse normalization strategy to refine the prime target information. In addition, a Siamese structure and a feature similarity metric constraint are used to promote discriminative high-level representations of the prime target. The proposed method is more effective than classical and state-of-the-art weakly supervised techniques for hyperspectral target detection with weak labels, on both simulated and real-field data.
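As an illustration of the general idea of sparse attention pooling over a weakly labeled patch, the following minimal sketch treats a patch as a bag of pixel feature vectors and fuses them with sparse attention weights. It is not the Semantic MINN implementation: sparsemax is assumed here as a stand-in for the sparse normalization, and the names `sparsemax`, `attention_pool`, and the linear scoring vector `w` are hypothetical.

```python
def sparsemax(scores):
    """Project a list of scores onto the probability simplex.

    Unlike softmax, the result can contain exact zeros, so low-scoring
    instances are dropped from the bag entirely. Sparsemax is an
    assumption standing in for the unspecified sparse normalization.
    """
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumsum += z
        # The support condition holds for a prefix of k; keep the last hit.
        if 1.0 + k * z > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


def attention_pool(instances, w):
    """Fuse a bag of pixel features into one patch-level embedding.

    instances: list of feature vectors, one per pixel in the patch.
    w: hypothetical linear scoring vector (a learned network in practice).
    """
    scores = [sum(x_j * w_j for x_j, w_j in zip(x, w)) for x in instances]
    weights = sparsemax(scores)
    dim = len(instances[0])
    embedding = [
        sum(a * x[j] for a, x in zip(weights, instances)) for j in range(dim)
    ]
    return embedding, weights


# A three-pixel patch with two features per pixel (toy values).
patch = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.2]]
embedding, weights = attention_pool(patch, [2.0, 0.0])
print(weights, embedding)
```

In this toy patch only the first pixel receives nonzero attention, so the patch-level embedding equals that pixel's features. This is the behavior that lets a patch-level label be attributed to a small number of candidate target pixels.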