Tabular data, semantics and AI seminar series
This seminar series will address a broad range of topics related to achieving novel capabilities on tabular data through semantics and AI. Attendees will learn about and discuss the following topics:
- Better data understanding, exploration, and explanation through semantics and AI
- Table augmentation and search with semantics
- Semantic data management and organization
- Human-computer interaction (HCI) that enables users to effectively leverage semantically driven systems for data and predictive tasks
- AI and semantics: from model building using semantics (e.g., Trusted AI) to using semantics in reinforcement learning (RL) or AI planning
- Benchmarks and evaluations of approaches
Upcoming: Our next seminar is on January 26 and will feature Craig Knoblock of the USC Information Sciences Institute. Craig Knoblock is the Keston Executive Director of the Information Sciences Institute, Research Professor of both Computer Science and Spatial Sciences, and Vice Dean of Engineering at the University of Southern California. His research focuses on techniques for describing, acquiring, and exploiting the semantics of data. He has worked extensively on source modeling, schema and ontology alignment, entity and record linkage, data cleaning and normalization, extracting data from the web, and combining these techniques to build knowledge graphs. He has published more than 400 journal articles, book chapters, and conference and workshop papers on these topics and has received seven best paper awards for this work. He also co-authored a recent book, Knowledge Graphs: Fundamentals, Techniques, and Applications, published by MIT Press in 2021. Dr. Knoblock received his master’s and Ph.D. in computer science from Carnegie Mellon University. He is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), the Association for Computing Machinery (ACM), and the Institute of Electrical and Electronics Engineers (IEEE). He is also a past president of the International Joint Conference on Artificial Intelligence (IJCAI) and a winner of the Robert S. Engelmore Award.
Seminar title: Exploiting the Semantics of Tables to Clean and Align Data to Build Knowledge Graphs
Abstract: Creating knowledge graphs from data provides a way of combining sources of information in ways that can then be exploited to build various applications. However, a key challenge in building knowledge graphs is the task of ingesting data. In this talk I will highlight techniques we have developed for rapidly ingesting data into a knowledge graph. First, I will describe an approach to finding semantic errors in tables by comparing the contents of a table with textual information describing the contents. Then I will present an automatic approach to modeling the contents of such tables by exploiting related information stored in an existing knowledge graph to understand the semantics of a table and align it to a common ontology. The combination of these techniques enables the rapid ingestion of new sources of data to build knowledge graphs and their applications.
Previous: Our first seminar was held on November 3 and featured Fatemeh Nargesian, an assistant professor in the Department of Computer Science at the University of Rochester. She received her PhD from the University of Toronto and was a research intern at IBM Watson in 2014 and 2016. Before joining the University of Toronto, she worked at the Clinical Health and Informatics Group at McGill University. Her primary research interests are in data intelligence, with a focus on data for ML and on time-series analysis.
Seminar title: Semantic Set Overlap for Join Search
Abstract: Set overlap has been extensively considered as a column joinability measure. However, search techniques based on vanilla overlap fail for semantic search, since similar set elements may be unrelated at the character level. In this talk, I will first introduce semantic overlap and its application to join search. While vanilla overlap requires exact matches between set elements, semantic overlap allows elements that are syntactically different but semantically related to increase the overlap. The semantic overlap is the maximum matching score of a bipartite graph, where the weight of an edge between two set elements is defined by a user-defined similarity function, e.g., cosine similarity between embeddings. Next, I will present KOIOS, an exact and efficient algorithm that solves the top-k set similarity search problem using semantic overlap. KOIOS is a filter-verification framework with powerful, cheap-to-update filters that prune sets during both the refinement and post-processing phases. Finally, I will discuss the empirical evaluation of KOIOS on web data and open data.
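To make the definition of semantic overlap concrete, here is a minimal sketch that computes it directly as the weight of a maximum-weight one-to-one matching between two sets, next to vanilla overlap for comparison. It is not KOIOS: none of its filters are reproduced, and the brute-force search over all matchings only works for tiny sets (a practical version would use an assignment solver such as the Hungarian algorithm). Clipping negative similarities to 0 is an assumption of this sketch, and the 2-D "embeddings" and the function names (`vanilla_overlap`, `semantic_overlap`, `emb_sim`) are hypothetical, for illustration only.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vanilla_overlap(query, candidate):
    """Classic set overlap: only exact matches between elements count."""
    return len(set(query) & set(candidate))


def semantic_overlap(query, candidate, sim):
    """Weight of a maximum-weight one-to-one matching between two sets.

    Edge weights come from the user-supplied similarity function `sim`.
    Negative similarities are clipped to 0 (an assumption of this sketch),
    so a poor match is never worse than leaving an element unmatched.
    Brute force over all matchings: only suitable for very small sets.
    """
    q, c = sorted(query), sorted(candidate)
    weights = [[max(0.0, sim(x, y)) for y in c] for x in q]
    if len(q) > len(c):
        # Transpose so that rows always index the smaller set.
        weights = [list(col) for col in zip(*weights)]
    n_rows = len(weights)
    n_cols = len(weights[0]) if weights else 0
    best = 0.0
    # Each permutation assigns row i to a distinct column cols[i].
    for cols in permutations(range(n_cols), n_rows):
        best = max(best, sum(weights[i][j] for i, j in enumerate(cols)))
    return best


# Hypothetical 2-D "embeddings", for illustration only.
emb = {
    "NYC": (1.0, 0.1),
    "New York City": (0.95, 0.15),
    "LA": (0.1, 1.0),
    "Los Angeles": (0.12, 0.98),
    "Boston": (0.7, 0.7),
}


def emb_sim(a, b):
    return cosine(emb[a], emb[b])


query = {"NYC", "LA"}
candidate = {"New York City", "Los Angeles", "Boston"}
print(vanilla_overlap(query, candidate))  # 0: no exact matches
# Close to 2: matches NYC/New York City and LA/Los Angeles.
print(semantic_overlap(query, candidate, emb_sim))
```

With an exact-match similarity (1 for equal elements, 0 otherwise), `semantic_overlap` reduces to vanilla overlap, which is one way to see semantic overlap as a generalization of the vanilla measure.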
Please email Kavitha Srinivas with any questions about this event: email@example.com