Category-Aware API Clustering and Distributed Recommendation for Automatic Mashup Creation
Abstract
Mashup has emerged as a promising way for developers to compose existing APIs (services) into new or value-added services. With the rapidly increasing number of services published on the Internet, service recommendation for automatic mashup creation has gained considerable momentum. Since a mashup inherently requires services with different functions, the recommendation result should contain services from various categories. However, most existing recommendation approaches rank all candidate services in a single list, which has two deficiencies. First, ranking services without considering the categories to which they belong may produce meaningless rankings and reduce recommendation accuracy. Second, mashup developers are not always clear about which service categories they need, or which categories of services cooperate well in a mashup. Without an explicit recommendation of the service categories relevant to a mashup, developers find it difficult to select proper services from a mixed ranking list, which lowers the user friendliness of the recommendation. To overcome these deficiencies, a novel category-aware service clustering and distributed recommendation method is proposed for automatic mashup creation. First, a K-means variant (vKmeans) based on the Latent Dirichlet Allocation topic model is introduced to enhance service categorization and provide a basis for recommendation. Second, on top of vKmeans, a service category relevance ranking (SCRR) model, which combines machine learning and collaborative filtering, is developed to decompose mashup requirements and explicitly predict relevant service categories. Finally, a category-aware distributed service recommendation (CDSR) model, which is based on a distributed machine learning framework, is developed to predict the ranking order of services within each category.
Experiments on a real-world dataset show that the proposed approach not only significantly improves precision but also enhances the diversity of the recommendation results.
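The contrast between a single mixed ranking list and a category-aware recommendation can be illustrated with a minimal Python sketch. It is not the SCRR or CDSR model: the category relevance scores and per-service scores are simply assumed to be given, and the function names, service names, and numbers are all hypothetical.

```python
from collections import defaultdict


def flat_recommend(service_scores, top_n):
    """Baseline: rank all services in one mixed list, ignoring categories."""
    ranked = sorted(service_scores, key=lambda s: (-service_scores[s], s))
    return ranked[:top_n]


def category_aware_recommend(category_scores, service_scores,
                             service_category, top_categories, top_services):
    """Rank categories first, then rank services inside each chosen category.

    All scores are assumed inputs (hypothetical stand-ins for the outputs of
    a category relevance model and a per-category service ranking model).
    """
    # Step 1: pick the most relevant categories.
    chosen = sorted(category_scores,
                    key=lambda c: (-category_scores[c], c))[:top_categories]

    # Step 2: group candidate services by their category.
    by_category = defaultdict(list)
    for service, score in service_scores.items():
        by_category[service_category[service]].append((score, service))

    # Step 3: rank services within each chosen category.
    result = []
    for category in chosen:
        ranked = sorted(by_category[category], key=lambda p: (-p[0], p[1]))
        result.append((category, [s for _, s in ranked[:top_services]]))
    return result


# Hypothetical toy data.
category_scores = {"mapping": 0.9, "photos": 0.7, "payments": 0.2}
service_scores = {"MapsA": 0.8, "MapsB": 0.6, "MapsC": 0.5,
                  "PhotoA": 0.4, "PayA": 0.3}
service_category = {"MapsA": "mapping", "MapsB": "mapping",
                    "MapsC": "mapping", "PhotoA": "photos",
                    "PayA": "payments"}

flat = flat_recommend(service_scores, top_n=3)
grouped = category_aware_recommend(category_scores, service_scores,
                                   service_category,
                                   top_categories=2, top_services=2)
print(flat)
print(grouped)
```

In this toy data the flat top-3 list contains only mapping services, whereas the category-aware result also surfaces a photo service, which is the kind of diversity the abstract argues a mashup needs.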