This page summarizes possible Matchmaking algorithms and collects information about their usage in Cloud4All, their evaluation, or the reasons why they were discarded.
- Knowledge on different matchmaking approaches and how well they work in different application areas.
- Implementation of an extensible set of matchmaking algorithms that can be employed as external services in the cloud.
- 1 General Matchmaking
- 2 Typical Matchmaking Problems
- 3 Rule-Based
- 4 Statistical
The Matchmaker is an important component of Cloud4All. One of its main purposes is to infer unknown preferences or to transfer preferences from one usage scenario to another. Let's say user Anton bought a brand-new smartphone and logs in for the first time. The Cloud4All software installed on the smartphone will query the server for Anton's preferences for the current usage context. Obviously, as Anton has never used this type of smartphone before, his preference set does not include information that matches the query context. In this example, the Matchmaker might have to translate the preferences Anton had for his old smartphone into preferences for Anton's new smartphone. Let us inspect the different aspects of this example a bit further:
The preference set is the list of preferences that a user expressed, entered or otherwise confirmed. A user's preference set only includes preferences that are specific to a certain context. A preference to increase contrast while in the sun on the beach should not also increase the contrast of the TV at home. This preference is highly context-specific, the context being "in the sun on the beach".
The context is a specific situation and can include any information that is currently available. This could be hardware information, such as the current device or the screen size; information gathered from global sensors, such as the current time of day; or information gathered from local sensors, such as the environmental noise or lighting conditions. Every preference in a preference set is specific to the context it was validated in, and every query to the system contains the current context for which preferences should be returned. Transforming preferences from one context to another is a common Matchmaker scenario.
The process of creating a new preference for the query context from preferences for other contexts from the preference set is called inference. The resulting preferences are often called inferred preferences.
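The notions above can be sketched as simple data structures. This is an illustrative sketch only; the field names and matching rule are assumptions for the example, not the actual Cloud4All schema.

```python
# Hypothetical sketch: a preference set is a list of preferences, each tied
# to the context it was confirmed in. All names here are illustrative.
preference_set = [
    {
        "setting": "display.contrast",
        "value": 1.8,
        "context": {"device": "smartphone-old", "ambient_light": "bright-sunlight"},
    },
    {
        "setting": "display.font_size",
        "value": 14,
        "context": {"device": "smartphone-old"},
    },
]

def preferences_for(preference_set, query_context):
    """Return preferences whose stored context matches the query context.

    A preference matches if every contextual factor it was validated in
    agrees with the query; anything else must be inferred by the Matchmaker.
    """
    return [
        p for p in preference_set
        if all(query_context.get(k) == v for k, v in p["context"].items())
    ]

# A query from Anton's new smartphone: no stored preference matches the
# new device, so the Matchmaker would have to infer one.
query = {"device": "smartphone-new", "ambient_light": "indoor"}
print(preferences_for(preference_set, query))  # → []
```

An empty result like this is exactly the situation where inference kicks in.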
Typical Matchmaking Problems
There are certain scenarios that make matchmaking particularly difficult. This section gives an overview.
Sparse Preference Set
In some situations, especially for a newly registered user, a preference set might include very few preferences or none at all. In this scenario the system knows only very little about the current user, which makes it hard to come up with meaningful and fitting inferences. Yet this scenario is of particular importance, because poor performance for new users might discourage them from using Cloud4All in the future.
Sparse Context
As the information available in the context is at least partially derived from available sensors, there might be situations where no sensors are available at all. In such situations, the context is reduced to very general information, like the current target device. In this case, Matchmakers have to either guess the current context based on past data, or derive abstract inferred preferences which are more or less independent of the missing information in the context.
This problem also occurs the other way round: a preference stored in the preference set might have a very sparse context, which makes it difficult to base inference on this preference, as the system does not know all the contextual factors that were in effect when the preference was confirmed.
Unseen Preferences
When using hardware or software that a user has never seen before, the system might encounter queries for preferences that were never part of the user's profile. In this case, the Matchmaker has to solve a problem similar to the one described in the Sparse Preference Set scenario. There might even be situations where the requested property is completely independent of any preference already in the preference set, which makes inferring such a preference particularly difficult.
Unseen Context
The system might also encounter a request with a context that is very different from any context in which the user confirmed settings in the past. This leads to a problem similar to the Sparse Context scenario, except that this time the new context might be so different that the system cannot perform any meaningful mapping from past contexts. Yet a completely new context is quite unlikely to appear, unless the profile is already very sparse.
Rule-Based
---> WIP <---
See deliverable D204.3.
Statistical
Statistical approaches use machine learning and data mining algorithms to find and infer relations between preference sets without an expert defining rules for that. This makes them very scalable and robust to changes in the data sets, like new devices or new preferences to capture.
The general approach of a statistical Matchmaker is to exploit the fact that users are actually confirming preferences that they see fitting for themselves, without the need to call in an external expert or framework. A statistical Matchmaker tries to identify similarities between preference sets to come up with expectations or recommendations for a user, based on what other users - identified as 'similar' by a distance function - expressed as their preference in the target context. The statistical Matchmaker has two important inputs that greatly influence its performance: the sum of all existing user preference sets, which forms a data set for finding similar preference sets or other relations, and the algorithms used to work on this data. This makes the statistical Matchmaker easy to maintain: given the algorithms are set up well, it can deal with a wide array of queries, including completely new contexts or preferences, without any human maintenance. However, this also implies that a statistical Matchmaker will perform poorly if there is only little information available to base inference on, either because there are only very few preference sets available in the whole system, or because the given context was only seen in very few preference sets.
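The similarity-based inference described above can be sketched as a simple nearest-neighbour step. This is an illustrative example, not the actual Cloud4All algorithm: preference sets are flattened to numeric values, a Euclidean distance over shared preferences identifies similar users, and the missing preference is averaged over the k nearest users who do have it.

```python
import math

def distance(a, b, keys):
    """Euclidean distance over the preferences both sets share."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys if k in a and k in b))

def infer(target_user, all_users, missing_key, k=2):
    """Infer a preference the target user lacks from the k most similar users."""
    candidates = [u for u in all_users if missing_key in u]
    shared = set(target_user)
    candidates.sort(key=lambda u: distance(target_user, u, shared))
    neighbours = candidates[:k]
    return sum(u[missing_key] for u in neighbours) / len(neighbours)

# Toy data set: three existing users with full preference sets.
users = [
    {"contrast": 1.8, "font_size": 14, "volume": 0.7},
    {"contrast": 1.7, "font_size": 13, "volume": 0.6},
    {"contrast": 0.9, "font_size": 9,  "volume": 0.2},
]
anton = {"contrast": 1.8, "font_size": 14}   # no "volume" preference yet
print(infer(anton, users, "volume"))         # averages the two closest users
```

Real matchmaking would of course operate on far larger data sets and richer distance functions, but the shape of the computation is the same.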
Unlike the other matchmaking approaches, the statistical approach is divided into two main modules: The Statistical Analysis Module and the Run-Time Matchmaking Module.
Statistical Analysis Module
This module takes all preference sets of all users using Cloud4All as an input. Note that the preference sets do not have to carry personal information, so user names or similar information could be removed or replaced with anonymous tokens. The goal of the Statistical Analysis Module is to construct data structures that allow the Run-Time Matchmaking Module to infer new preferences without having to analyze all preference sets of all users for every single query. The Statistical Analysis Module only runs a few times per day and can take considerable time to complete.
The sum of all user preference sets is called the preference set space and can be seen as a high dimensional space. Each preference set could then be represented as a point in this space. Furthermore, distance functions can be used to calculate how similar two preference sets are. This allows clustering algorithms to find groups of similar preference sets or areas in the preference set space where many preference sets are found. This information can be used if unseen preferences are to be inferred. In order to calculate meaningful distances, it might be good to normalize certain value spaces.
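The normalization mentioned above can be sketched with simple min-max scaling. This is an assumed example, not the Cloud4All implementation: each preference dimension is scaled to [0, 1] so that, for instance, font sizes (here 9-32 pt) do not dominate contrast factors (here 0.5-2.0) in the distance computation.

```python
def min_max_normalize(vectors):
    """Scale every dimension of a list of equal-length vectors to [0, 1]."""
    dims = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    return [
        [(v[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
         for d in range(dims)]
        for v in vectors
    ]

# Columns: contrast factor, font size in pt.
raw = [[0.5, 9.0], [2.0, 32.0], [1.25, 14.0]]
print(min_max_normalize(raw))  # each column now spans [0, 1]
```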
Run-Time Matchmaking Module
The Run-Time Matchmaking Module fulfills the role of the actual matchmaker as described above. In addition to the current user preference set and the context, it may also use information stored by the Statistical Analysis Module to locate the current preference set in the preference set space and infer unseen preferences from similar preference sets or clusters.
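A hedged sketch of this run-time step, under the assumption that the Statistical Analysis Module has already condensed all preference sets into a handful of cluster centroids; the data layout and names are illustrative only.

```python
import math

# Precomputed offline by the analysis module (assumed): one centroid per
# cluster, as (contrast, font_size, volume) vectors.
centroids = [
    (1.8, 14.0, 0.7),   # e.g. a "high contrast, large text" cluster
    (1.0, 10.0, 0.4),   # e.g. a "defaults" cluster
]

def infer_from_clusters(partial, centroids):
    """Locate a partial preference vector (None = unknown) in the
    preference set space and fill the gaps from the nearest centroid."""
    known = [i for i, v in enumerate(partial) if v is not None]
    nearest = min(
        centroids,
        key=lambda c: math.sqrt(sum((partial[i] - c[i]) ** 2 for i in known)),
    )
    return tuple(v if v is not None else nearest[i] for i, v in enumerate(partial))

# The user's known preferences place them near the first cluster,
# so the unknown volume value is taken from that centroid.
print(infer_from_clusters((1.7, 13.0, None), centroids))  # → (1.7, 13.0, 0.7)
```

Because only the centroids are consulted at query time, this step stays cheap even when the full preference set space is large.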
- Expectation–maximization (EM) algorithm.
- Hierarchical clustering.
- k-nearest neighbors.
- Density-based spatial clustering (example implementation in C#).
- DBSCAN (density-based spatial clustering of applications with noise).
- (Wikipedia category: Data clustering algorithms.)
Criteria for Selecting Clustering Algorithms
The main criteria for selecting clustering algorithms are the following:
- The algorithm should not need to know in advance the number of clusters at which it should stop.
- The algorithm should not only form spherical or ellipsoid shapes; it should be able to form a cluster of an arbitrary shape.
- The algorithm should be able to identify and handle noise (single preference sets that are very different from all others).
The family of density-based clustering algorithms fits these criteria very well.
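To illustrate why density-based clustering fits these criteria, here is a minimal DBSCAN sketch (illustrative only, not a production implementation): it needs no preset cluster count, follows clusters of arbitrary shape by expanding from dense regions, and labels isolated points explicitly as noise (-1).

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or -1 for noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1                # start a new cluster from this core point
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point that was marked as noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is itself a core point: keep expanding
                queue.extend(more)
    return labels

# Two dense groups plus one far-away outlier that ends up as noise.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

In the matchmaking setting, the points would be (normalized) preference set vectors and the noise label would flag users whose preferences resemble no one else's.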
This wiki page represents Cloud4all deliverable ID204.1.