Distributed data stream processing is a data analysis paradigm where massive amounts of data produced by various sources are analyzed online within real-time constraints. Execution performance of a stream program/query executed on such middleware is largely dependent on the ability of the programmer to fine tune the program to match the topology of the stream processing system. However, manual fine tuning of a stream program is a very difficult, error prone process that demands huge amounts of programmer time and expertise which are expensive to obtain. We describe an automated process for stream program performance optimization that uses semantic preserving automatic code transformation to improve stream processing job performance. We first identify the structure of the input program and represent the program structure in a Directed Acyclic Graph. We transform the graph using the concepts of Tri-OP Transformation and Bi-Op Transformation. The resulting sample program space is pruned using both empirical as well as profiling information to obtain a ranked list of sample programs which have higher performance compared to their parent program. We successfully implemented this methodology on a prototype stream program performance optimization mechanism called Hirundo. The mechanism has been developed for optimizing SPADE programs which run on System S stream processing run-time. Using five real world applications (called VWAP, CDR, Twitter, Apnoea, and Bargain) we show the effectiveness of our approach. Hirundo was able to identify a 31.1 times higher performance version of the CDR application within seven minutes time on a cluster of 4 nodes. © 2013 Springer Science+Business Media New York.