State of the art fabrication technology for integrating numerous hardware resources such as Processors/DSPs and memory arrays into a single chip enables the emergence of Multiprocessor System-on-Chip (MPSoC). Stream programming paradigm based on MPSoC is highly efficient for single functionality scenario due to its dedicated and predictable data supply system. However, when memory traffic is heavily shared among parallel tasks in applications with multiple interrelated functionalities, performance suffers through task interferences and shared memory congestions which lead to poor parallel speedups and memory bandwidth utilizations. This paper proposes a framework of stream processing based on-chip data supply system for task-parallel MPSoCs. In this framework, stream address generations and data computations are decoupled and parallelized to allow full utilization of on-chip resources. Task granularities are dynamically tuned to jointly optimize the overall application performance. Experiments show that proposed framework as well as the tuning scheme are effective for joint optimization in task-parallel MPSoCs. © 2011 Published by the IEEE Computer Society.