The emergence of delay-sensitive and computationally demanding data analytic applications has burdened the core network with large data transfers and an increased computation load. Furthermore, a growing number of Internet of Things deployments rely heavily on the execution of such applications. We propose an architecture in which devices collaboratively execute data analytic tasks in order to improve execution delay and accuracy. This is made possible by exploiting the aggregate computation capabilities of the many small devices available. We design an optimization framework in which the nodes decide where their data analytic tasks will be executed, so as to jointly optimize average execution delay and accuracy while respecting power consumption constraints. We propose a distributed dual ascent solution to the formulated convex problem, so that the nodes can make their outsourcing decisions by exchanging local information. The results indicate that, depending on the network load, the nodes can achieve better performance by collaborating than by computing the tasks locally.
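To illustrate the general idea behind a dual ascent solution, the following is a minimal sketch on a toy problem, not the formulation used in this work. It assumes each node i has a hypothetical quadratic cost (a_i / 2) * x_i^2 for executing a share x_i of the load, and that the shares must sum to a fixed total. The function name, cost model, step size, and iteration count are all assumptions made for illustration; accuracy and power constraints are omitted.

```python
def dual_ascent(costs, total_load, step=0.5, iters=200):
    """Toy dual ascent: minimize sum(a_i / 2 * x_i**2) s.t. sum(x_i) == total_load.

    `costs` holds the hypothetical per-node coefficients a_i (all positive).
    """
    price = 0.0  # dual variable (Lagrange multiplier) of the coupling constraint
    shares = [0.0] * len(costs)
    for _ in range(iters):
        # Local step: each node minimizes a_i / 2 * x_i**2 - price * x_i by itself,
        # which gives x_i = price / a_i. Only the price is needed, not other nodes' costs.
        shares = [price / a for a in costs]
        # Dual step: raise the price while load is unassigned, lower it if over-assigned.
        price += step * (total_load - sum(shares))
    return shares, price


shares, price = dual_ascent([1.0, 2.0, 4.0], total_load=7.0)
```

In this sketch the only information exchanged is the shared price and the sum of the shares, which is what lets each node decide locally. With the coefficients above, the shares approach roughly 4, 2 and 1, so nodes with a lower cost coefficient take on more of the load. The fixed step size converges here only because it is small relative to the sum of 1 / a_i; a different problem would need its own step size choice.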