With the emergence of the data deluge, the energy footprint of global data movement has surpassed 100 terawatt hours, costing the world economy more than 20 billion US dollars. During an active data transfer, the networking infrastructure consumes between 10% and 75% of the total energy, depending on the number of hops between the source and the destination; the end systems consume the rest. Although there has been extensive research on reducing power consumption in the networking infrastructure, work on saving energy at the end systems has been limited to tuning a few application-level parameters. In this paper, we introduce a novel cross-layer optimization framework that jointly considers application-level and kernel-level parameters to minimize energy consumption without sacrificing transfer throughput. We present three algorithms that dynamically tune the CPU frequency level, the number of active CPU cores, the number of active transfer threads, the number of parallel TCP streams, and the level of transfer command pipelining to achieve different user-set goals. Experimental results show that our proposed algorithms outperform state-of-the-art solutions, achieving up to 80% higher throughput while consuming 48% less energy.