We investigate the problem of optimal request routing and content caching in a heterogeneous network supporting in-network content caching with the goal of minimizing average content access delay. Here, content can either be accessed directly from a back-end server (where content resides permanently) or be obtained from one of multiple in-network caches. To access a piece of content, a user must decide whether to route its request to a cache or to the back-end server. Additionally, caches must decide which content to cache. We investigate the problem complexity of two problem formulations, where the direct path to the back-end server is modeled as i) a congestion-sensitive or ii) a congestion-insensitive path, reflecting whether or not the delay of the uncached path to the back-end server depends on the user request load, respectively. We show that the problem is NP-complete in both cases. We prove that under the congestion-insensitive model the problem can be solved optimally in polynomial time if each piece of content is requested by only one user, or when there are at most two caches in the network. We also identify a structural property of the user-cache graph that potentially makes the problem NP-complete. For the congestion-sensitive model, we prove that the problem remains NP-complete even if there is only one cache in the network and each content is requested by only one user. We show that approximate solutions can be found for both models within a (1 - 1/e) factor of the optimal solution, and demonstrate a greedy algorithm that is found to be within 1% of optimal for small problem sizes. Through trace-driven simulations we evaluate the performance of our greedy algorithms, which show up to a 50% reduction in average delay over solutions based on LRU content caching.