Temporal cross-selling optimization using action proxy-driven reinforcement learning

Nan Li; Naoki Abe

doi:10.1109/ICDMW.2011.163

ICDMW 2011

Conference paper

01 Dec 2011

Abstract

Customer lifetime value modeling and cross-selling pattern mining are two important areas of data mining applications in marketing sciences. In this paper, we propose a novel approach that addresses both of these problems in a unified manner. We propose a variant of reinforcement learning, enhanced with the notion of an "action proxy", which is applicable to cross-selling pattern discovery even in the absence of actions. As action proxies, we consider the target reward (changes) across product categories. The motivation is to optimize the target values of immediate rewards so as to maximize the expected overall long-term reward. Since the changes are directly tied to the reward, an unconstrained formulation would result in unbounded behavior, which leads us to constrain the learned policy. The goal is to optimize the target values while keeping their effects on the overall immediate rewards constrained. Experiments on real-world data not only verify the effectiveness of our framework but also provide a qualitative study of allocation behavior, with particular emphasis on temporal cross-selling optimization. © 2011 IEEE.
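
The constrained allocation idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the function name `allocate_reward_changes`, the per-category long-term value estimates, the zero-sum constraint, and the `max_shift` bound are all assumptions introduced here for illustration. The sketch picks a change in immediate reward for each product category so that the estimated long-term gain is as large as possible, while the changes sum to zero (total immediate reward is unchanged) and each change stays within a fixed bound.

```python
def allocate_reward_changes(long_term_values, max_shift):
    """Toy constrained allocation (illustrative assumption, not the paper's method).

    long_term_values[i] is a hypothetical estimate of the long-term gain per
    unit increase in the immediate reward of product category i.
    Returns per-category changes that maximize the estimated gain subject to:
      - the changes sum to zero (overall immediate reward is held fixed), and
      - each change lies in [-max_shift, +max_shift].
    """
    n = len(long_term_values)
    # Rank categories from highest to lowest estimated long-term value.
    order = sorted(range(n), key=lambda i: long_term_values[i], reverse=True)
    half = n // 2
    changes = [0.0] * n
    # Shift reward toward the top half of categories ...
    for i in order[:half]:
        changes[i] = max_shift
    # ... and away from the bottom half; a middle category (odd n) stays at 0.
    for i in order[n - half:]:
        changes[i] = -max_shift
    return changes


def estimated_gain(long_term_values, changes):
    """Estimated long-term gain of a set of changes under the linear toy model."""
    return sum(v * c for v, c in zip(long_term_values, changes))


values = [0.8, 0.1, 0.5, -0.2]  # hypothetical per-category estimates
changes = allocate_reward_changes(values, max_shift=1.0)
```

Without the bound and the zero-sum constraint, the linear objective in this toy model could be increased indefinitely by shifting ever more reward toward the highest-valued category, which mirrors the unbounded behavior the abstract attributes to an unconstrained formulation.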
