Discovering discontinuity in big financial transaction data

Suppawong Tuarob; Ray Strong; Anca Chandra; Conrad S. Tucker

doi:10.1145/3159445

ACM TMIS

Paper

01 Feb 2018

Abstract

Business transactions are typically recorded in the company ledger. The primary purpose of such financial information is to accompany a monthly or quarterly report that executives use to make sound business decisions and set strategies for the next business period. These strategies often result in transitions that change underlying infrastructures and components, including the nomenclature system of the business components. As a result, the transaction stream of an affected component is replaced by another stream with a different component name, producing a discontinuity in the financial stream of what is in fact the same component. Recently, advances in large-scale data mining technologies have enabled a set of critical applications to utilize knowledge extracted from vast amounts of existing data that would otherwise have been unused or underutilized. In the financial and services computing domains, recent studies have illustrated that historical financial data can be used to predict future revenues and profits and to optimize costs, among other potential applications. These prediction models rely on the long-term availability of historical data that traces back multiple years. However, the discontinuity of the financial transaction stream associated with a business component limits the learning capability of the prediction models. In this article, we propose a set of machine learning-based algorithms to automatically discover component name replacements, using information available in general ledger databases. The algorithms are designed to scale to massive numbers of data points, especially in large companies. Furthermore, the proposed algorithms are generalizable to other domains whose data are time series of the same nature as the financial data available in business ledgers. A case study of real-world IBM service delivery data retrieved from four different geographical regions is used to validate the efficacy of the proposed methodology.
