IBM at Interspeech 2025
- Rotterdam, Netherlands
The 2022 IEEE International Conference on Big Data features the top tier of original research papers covering all aspects of Big Data with a focus on volume, velocity, variety, value and veracity.
Meet IBM researchers presenting on topics ranging from data and AI security to AIOps and federated learning.
AIOps can provide essential value for data lakehouses as lakehouses pose complex operational challenges for Site Reliability Engineers (SRE). This paper proposes that the unified approach of data lakehouses creates a unique opportunity for unified data resiliency management. We focus on AIOps applied to disaster recovery and backup/restore. In particular, we focus on managing data lakehouse hardware resources to ensure that lakehouse data Recovery Point Objectives (RPO) are met with a high degree of accuracy. The goal is to warn an SRE about an impending RPO violation and to suggest adding given amounts of hardware resources before a given time to avoid violation of the lakehouse data's RPO. We claim AIOps can achieve this goal with an ensemble of machine learning and time series analysis.
Runyu Jin (IBM); Paul Muench (IBM); Veera Deenadhayalan (IBM); Brian Hatfield (IBM)
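The RPO warning described in this abstract can be illustrated with a very small forecasting sketch. It is not the authors' ensemble: it assumes a single hypothetical signal, the backup lag in minutes sampled at a fixed interval, and swaps in a plain least-squares trend line to estimate how many sampling intervals remain before the lag crosses the RPO. The function names and the sample values are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for samples taken at t = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return slope, y_mean - slope * t_mean


def steps_until_rpo_violation(
    lag_minutes: Sequence[float], rpo_minutes: float
) -> Optional[float]:
    """Estimate sampling intervals left before the fitted lag reaches the RPO.

    Returns 0.0 if the fitted lag already meets or exceeds the RPO, and None
    if the trend is flat or falling (no violation predicted by this model).
    """
    slope, intercept = fit_line(lag_minutes)
    fitted_now = intercept + slope * (len(lag_minutes) - 1)
    if fitted_now >= rpo_minutes:
        return 0.0
    if slope <= 0:
        return None
    return (rpo_minutes - fitted_now) / slope


if __name__ == "__main__":
    # Hypothetical lag samples (minutes), one per sampling interval.
    history = [10, 12, 14, 16, 18]
    print(steps_until_rpo_violation(history, rpo_minutes=30))
```

A production system of the kind the abstract describes would also have to translate the predicted shortfall into a hardware recommendation, which this sketch does not attempt.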
In this paper, we present a new scalable and adaptive architecture for FL aggregation. First, we demonstrate how traditional tree overlay based aggregation techniques (from P2P, publish-subscribe and stream processing research) can help FL aggregation scale, but are ineffective from a resource utilization and cost standpoint. Next, we present the design and implementation of AdaFed, which uses serverless/cloud functions to adaptively scale aggregation in a resource-efficient and fault-tolerant manner. We describe how AdaFed enables FL aggregation to be dynamically deployed only when necessary, elastically scaled to handle participant joins/leaves, and made fault tolerant with minimal effort required on the (aggregation) programmer side. We also demonstrate that our prototype based on Ray scales to thousands of participants, and is able to achieve a reduction in resource requirements and cost, with minimal impact on aggregation latency.
Jayaram Kr Kallapalayam Radhakrishnan (IBM); Vinod Muthusamy (IBM); Gegi Thomas (IBM); Ashish Verma (IBM); Mark Purcell (IBM)
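Tree-based aggregation, as mentioned in this abstract, works because weighted averaging can be split into partial results that are merged in any grouping. The sketch below shows only that combine step, assuming weighted federated averaging over plain Python lists. It is not AdaFed and does not use Ray or serverless functions; `leaf`, `merge`, `tree_aggregate`, and `finalize` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# A partial aggregate: (weighted sum of model updates, total weight).
Partial = Tuple[List[float], float]


def leaf(update: Sequence[float], num_samples: int) -> Partial:
    """Wrap one participant's update, weighted by its local sample count."""
    return [w * num_samples for w in update], float(num_samples)


def merge(parts: Sequence[Partial]) -> Partial:
    """Combine partial aggregates; the result is itself a partial aggregate."""
    sums = [sum(vals) for vals in zip(*(p[0] for p in parts))]
    return sums, sum(p[1] for p in parts)


def tree_aggregate(parts: Sequence[Partial], fanout: int = 2) -> Partial:
    """Merge partials level by level, `fanout` children per aggregator node."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not parts:
        raise ValueError("no participants to aggregate")
    level = list(parts)
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


def finalize(part: Partial) -> List[float]:
    """Turn the root partial aggregate into the weighted average."""
    total, weight = part
    return [v / weight for v in total]


if __name__ == "__main__":
    # Hypothetical updates from three participants with 1, 1, and 2 samples.
    parts = [leaf([1.0, 2.0], 1), leaf([3.0, 4.0], 1), leaf([5.0, 6.0], 2)]
    print(finalize(tree_aggregate(parts, fanout=2)))
```

Because each `merge` call is independent of the others at its level, each could in principle run as a separate short-lived function, which is the property an adaptive deployment relies on.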
Enterprises are rapidly working on strategies to migrate their applications to the cloud. Multi-cloud allows mixing and matching multiple cloud vendors when migrating thousands of applications, based on their requirements and various types of constraints. To utilize the advantages of different clouds, achieve maximum flexibility and avoid concentration risk, enterprises spread their applications across cloud providers. But this activity is not trivial, as it requires honoring the customer's constraints while at the same time generating the optimal configuration of cloud resources.
There are a few challenges associated with this. Firstly, the applications to be migrated need to be well documented in order to migrate them successfully. Some applications may be very old (legacy) and need an architectural overhaul, which means their cloud feasibility needs to be checked. Also, enterprises would like to have an optimal list of cloud vendors that satisfy their needs. To overcome these challenges, we propose a smart system for determining a multi-cloud pathway for applications. The system identifies cloud-feasible applications, understands their requirements and recommends an optimal set of cloud vendors honoring their constraints. This is enabled through Reinforcement Learning with Human-in-the-Loop. We show our results with a use case from a real-world scenario.
Indervir Singh Banipal (IBM); Shubhi Asthana (IBM)
Today's fast-changing workplace necessitates constant reskilling of the workforce at both the corporate and national level. Current approaches to reskilling rely on manually defined logic, which makes them time-consuming and expensive. In this paper, we propose a scalable machine-learning driven alternative by introducing a method to make reskilling recommendations using word embeddings of skill keywords trained on a corpus of historical job listings and resumes. We achieve this by training dense vector embeddings to represent skill keywords using Word2Vec and fine-tuned BERT models, allowing us to make comparisons between skills. Given an individual's current skills, this model is leveraged to identify which skills to prioritize for their development based on their target role and to recommend reskilling plans based on the identified skill gap. The proposed framework has the potential to help both public and private organizations better direct their educational resources to individuals.
Saksham Gandhi (IBM); Raj Nagesh (IBM); Subhro Das (IBM)
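The skill comparison this abstract describes can be sketched with cosine similarity over skill vectors. The example below is only an illustration. The two-dimensional vectors are made up rather than trained with Word2Vec or BERT. The ranking rule, which lists missing skills closest to an existing skill first, is an assumption and not the paper's stated prioritization.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_skill_gap(
    embeddings: Dict[str, Sequence[float]],
    current: Sequence[str],
    target: Sequence[str],
) -> List[Tuple[str, float]]:
    """Rank target-role skills the person lacks by closeness to a current skill."""
    gap = [s for s in target if s not in current]
    scored = [
        (s, max((cosine(embeddings[s], embeddings[c]) for c in current), default=0.0))
        for s in gap
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy embeddings; real ones would come from a trained model.
    toy = {
        "sql": [1.0, 0.0],
        "python": [0.8, 0.6],
        "negotiation": [0.0, 1.0],
    }
    for skill, score in rank_skill_gap(toy, ["sql"], ["sql", "python", "negotiation"]):
        print(skill, round(score, 2))
```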
Learning never stops for successful workers, who must grow their careers while coping with the changing expectations of employers. Robust job-skill representations can empower workers by helping them to better decipher viable job changes given their current skill set and guide them toward skills they can learn to meet career goals. In this work we combine threads of research in economics and AI to improve upon existing job-skill representation methodology and performance. We build a benchmark dataset of between-job transitions from US Census data and show that a representation trained on a large set of online job postings via a transformer-based architecture outperforms existing baselines. Further analysis demonstrates that this model is better able to transfer across taxonomies and more accurately distributes probability among job transitions than existing models, correctly weighting only a small number of job transitions highly.
Tyler Baldwin (IBM); Wyatt Clarke (IBM); Maysa Malfiza Garcia de Macedo (IBM); Rogerio Abreu de Paula (IBM); Subhro Das (IBM)
The extent and granularity of data protection mandated by privacy regulations are increasing at the same time the dispersion, movement and overall importance of data accelerate. Further, the fundamental tension between data protection and data utility often plays out across geographic and technical domains such as multiple private and commercial clouds.
We argue that new capabilities are required of global IT infrastructures to fully satisfy the oversight and protection needs of sensitive data. The scale of data operations demands a high degree of automation in enforcing policies and detecting violations. Compliance capabilities must be based on a new combination of principles: dynamic metadata collection and analysis; privacy by design; compliance control points; decentralized governance domains; and automation. In this work, we propose a system architecture addressing these principles and present an implementation. We discuss how our approach supports use cases from the automotive industry, in particular the development of new connected-vehicle services enabled by processing personal data in a way compliant with the GDPR.
Gero Dittmann (IBM); Christopher Giblin (IBM); Michael Osborne (IBM); Rahul -
Transactions of goods and services between enterprise service providers are often driven by contracts and purchase orders. Every month, thousands of invoices are billed to customers, who settle them based on their usage of services. Given the vast number of purchase orders that are signed, processing and managing them requires considerable manual effort by the service provider. Moreover, the invoices' billed data may not be maintained in the same cloud system as the purchase orders, which complicates data mapping between the two data sets. Sometimes invoices end up in dispute because allocated funds were over-exhausted or because they were billed to an expired purchase order. Hence, managing the billing service is a large undertaking that also increases cost.
To address these challenges, we developed an order management system that transforms the monitoring of purchase orders to increase renewals as well as decrease disputes. The system includes an automated purchase order-invoice data mapping model along with a risk analytics model that evaluates the orders against the invoices billed. The output is a set of actionable and non-actionable insights based on customer portfolio, risk level and market trends in usage of services. We illustrate our method with some promising results on data from one of the world's largest IT service providers.
Shubhi Asthana (IBM); Bing Zhang (IBM); Pawan Chowdhary (IBM); Taiga Nakamura (IBM)
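The two dispute causes named in this abstract, billing against an expired purchase order and exhausting allocated funds, can be expressed as simple rule checks once invoices are mapped to orders. The sketch below is a simplified stand-in for the paper's mapping and risk models: it assumes invoices already carry a matching purchase order ID, and every class, field, and flag name is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Sequence


@dataclass
class PurchaseOrder:
    po_id: str
    funds: float   # total funds allocated to the order
    expires: date  # last day the order is valid


@dataclass
class Invoice:
    invoice_id: str
    po_id: str
    amount: float
    billed_on: date


def flag_risks(
    orders: Sequence[PurchaseOrder], invoices: Sequence[Invoice]
) -> Dict[str, List[str]]:
    """Map invoices to orders by ID and flag those likely to be disputed."""
    by_id = {po.po_id: po for po in orders}
    spent: Dict[str, float] = {}
    flags: Dict[str, List[str]] = {}
    # Process in billing order so the running total reflects earlier invoices.
    for inv in sorted(invoices, key=lambda i: i.billed_on):
        issues: List[str] = []
        po = by_id.get(inv.po_id)
        if po is None:
            issues.append("unmapped")
        else:
            if inv.billed_on > po.expires:
                issues.append("expired_po")
            spent[po.po_id] = spent.get(po.po_id, 0.0) + inv.amount
            if spent[po.po_id] > po.funds:
                issues.append("funds_exhausted")
        if issues:
            flags[inv.invoice_id] = issues
    return flags


if __name__ == "__main__":
    # Hypothetical sample data.
    orders = [PurchaseOrder("PO1", 100.0, date(2022, 6, 30))]
    invoices = [
        Invoice("A", "PO1", 60.0, date(2022, 5, 1)),
        Invoice("B", "PO1", 60.0, date(2022, 6, 1)),
        Invoice("C", "PO1", 10.0, date(2022, 7, 15)),
        Invoice("D", "PO9", 5.0, date(2022, 5, 5)),
    ]
    print(flag_risks(orders, invoices))
```

In practice the mapping step is the hard part, since the abstract notes that invoices and purchase orders may live in different systems. An exact ID join is used here only to keep the example short.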