Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions. Qingda Lu, Xiaoyang Gao, et al. 2012. JPDC.