We propose a visual food recognition framework that integrates the inherent semantic relationships among fine-grained classes. Our method learns semantics-aware features by formulating a multi-task loss function on top of a convolutional neural network (CNN) architecture. It then refines the CNN predictions using a random walk based smoothing procedure, which further exploits the rich semantic information. We evaluate our algorithm on a large \food-in-the-wild" benchmark , as well as a challenging dataset of restaurant food dishes with very few training images. The proposed method achieves higher classification accuracy than a baseline which directly fine-tunes a deep learning network on the target dataset. Furthermore, we analyze the consistency of the learned model with the inherent semantic relationships among food categories. Results show that the proposed approach provides more semantically meaningful results than the baseline method, even in cases of mispredictions. Categories and Subject Descriptors H.3.3 [Information Systems]Information Storage and Retrieval-Content Analysis and Indexing; I.4 [ComputingMethodologies]:Image Processing and Computer Vision General Terms Hierarchical Deep Learning.