Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and the Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30× compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.