Data quality in AI


Data is unquestionably the central piece of any enterprise’s journey to AI. Bad data results not only in sub-optimal models but also in lost opportunities and far less value generated from the data. Data preparation has been identified as the most time-consuming step of the AI life cycle, often cited as taking 80% of the effort. One reason it is so time-consuming is that it is an iterative debugging process: the challenges in the data are not known at the start of the data science life cycle but are discovered along the way. In addition, the data passes through multiple personas, such as Data Steward, Data Engineer, Quality Analyst, AI Scientist, Governance Officer, and Business SME, before it can be used for building production-level models.

Our group works on solving challenges in this area by building low-code/no-code tools and techniques to automate data transformations, develop novel data quality assessment and remediation metrics, automate exploratory data analysis, learn data quality rules automatically, and infuse trust into the data portion of the data and AI life cycle.
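To make the idea of automated data quality assessment concrete, here is a minimal sketch of the kind of column-level metric such tooling computes. The function, metric names, and sample dataset below are illustrative assumptions, not the group's actual APIs or algorithms:

```python
# Illustrative sketch: per-column completeness and uniqueness scores for
# tabular data, a simple form of automated data quality assessment.
# The function name, metrics, and data are hypothetical examples.

def assess_quality(rows, columns):
    """Return per-column completeness and uniqueness scores.

    completeness = fraction of rows with a non-null value
    uniqueness   = fraction of non-null values that are distinct
    """
    n = len(rows)
    report = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        completeness = len(non_null) / n if n else 0.0
        uniqueness = len(set(non_null)) / len(non_null) if non_null else 0.0
        report[col] = {
            "completeness": round(completeness, 2),
            "uniqueness": round(uniqueness, 2),
        }
    return report

rows = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": None},  # missing value lowers completeness
    {"id": 3, "country": "US"},  # repeated value lowers uniqueness
]
print(assess_quality(rows, ["id", "country"]))
```

A real pipeline would combine many such metrics (validity, consistency, label noise, and so on) and feed low-scoring columns into remediation steps, which is where the iterative debugging loop described above comes from.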

The mission of our group is to:

“Develop tools and techniques for automated data preparation using state-of-the-art AI techniques, to generate exceptional productivity and reduce time to value for data users by enabling them to build better, faster AI pipelines.”

We have built several tools and techniques and made a subset of our algorithms available for free as trial APIs. You can try these out by accessing the trial APIs and joining the Slack community to discuss and share challenges with other like-minded practitioners. If you are interested in learning more or collaborating with us, please feel free to reach out to any of us below.