Memory caching has long been used to fill up the performance gap between processor and disk for reducing the data access time of data-intensive computations. Previous studies on caching mostly focus on optimizing the hit rate of a single machine. But in this paper, we argue that the caching decision of a distributed memory system should be performed in a cooperative manner for the parallel data analytic applications, which are commonly used by emerging technologies, such as Big Data and AI (Artificial Intelligence), to perform data mining and sophisticated analytics on larger data volume in a shorter time. A parallel data analytic job consists of multiple parallel tasks. Hence, the completion time of a job is bounded by its slowest task, meaning that the job cannot benefit from caching until all inputs of its tasks are cached. To address the problem, we proposed a cooperative caching design that periodically rearranges the cache placement among nodes according to the data access pattern while taking the task dependency and network locality into account. Our approach is evaluated by a trace-driven simulator using both synthetic workload and real-world traces. The results show that we can reduce the average completion times up to 33% compared to a non-collaborative caching polices and 25% compared to other start-of-the-art collaborative caching policies.