Affinity Propagation Algorithm


Affinity Propagation was first published in 2007 by Brendan Frey and Delbert Dueck, and it has steadily grown in popularity due to its simplicity, general applicability, and performance.

Affinity Propagation is an unsupervised machine learning algorithm for clustering. The main drawbacks of K-Means and similar algorithms are having to select the number of clusters and choosing the initial set of points. Affinity Propagation overcomes these drawbacks: it does not require you to specify the number of clusters before running the algorithm. In this algorithm, all data points send messages to each other informing their targets of the senders' relative attractiveness. Each target then responds to every sender with a reply indicating its availability to associate with that sender. Based on the messages received from all targets, each sender replies with a revised relative attractiveness of each target. This message-passing procedure continues until a consensus is reached. Once a sender becomes associated with one of its targets, that target becomes the point's exemplar. All points sharing the same exemplar are grouped into the same cluster.

Affinity propagation needs two pieces of information:

  • Similarities between data points, s(i, k): The similarity measures how well point k is suited to serve as an exemplar for point i. The similarity can be omitted or set to negative infinity if two points cannot belong to the same cluster.
  • Preferences, s(k, k): The preference represents the suitability of each data point k as an exemplar. If we have some a priori information on which points might be favored in this role, we can represent it through the preferences.
The algorithm then runs through a number of iterations until it converges. Each iteration has two message-passing steps:

Calculating Responsibilities: Messages sent from cluster members (data points) to candidate exemplars (also data points), indicating how well-suited the candidate exemplar is to serve as the exemplar for that data point.

Calculating Availabilities: Messages sent from candidate exemplars (data points) to potential cluster members (data points), indicating how appropriate it would be for the member to choose that candidate as its exemplar.
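Before walking through the update formulas, it may help to see the algorithm end to end. Below is a minimal sketch using scikit-learn's AffinityPropagation; the three-blob dataset and seed values are illustrative assumptions, but note that the number of clusters is never specified in advance.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Three loose groups of 2-D points (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(20, 2)),
    rng.normal(loc=(0, 5), scale=0.3, size=(20, 2)),
])

# No n_clusters argument: the algorithm discovers the count itself.
ap = AffinityPropagation(random_state=0).fit(X)
print(len(ap.cluster_centers_indices_))  # number of clusters found
```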

How does Affinity Propagation work?

First, assume we have a dataset A of N points and an N×N similarity matrix S, where s(i, j) represents the similarity between points Ai and Aj. The negative squared Euclidean distance is used as the similarity, i.e. for points xi and xj,

s(i, j) = -||xi - xj||^2.

s(i, i) represents the input preference, that is, how likely a particular point is to become an exemplar. When the same value is applied to all inputs, this parameter controls how many clusters the algorithm produces: a value close to the minimum possible similarity produces few clusters, while a value close to or greater than the maximum possible similarity produces many clusters. It is typically set to the median similarity over all input pairs.
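Building the similarity matrix and setting the preferences can be sketched in a few lines of numpy; the small four-point dataset here is an illustrative assumption.

```python
import numpy as np

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.2, 7.9]])

# s(i, j) = -||xi - xj||^2: negative squared Euclidean distances
# between all pairs of points, via broadcasting.
diff = X[:, None, :] - X[None, :, :]
S = -np.sum(diff ** 2, axis=-1)

# Preference s(k, k): the median of the off-diagonal similarities.
pref = np.median(S[~np.eye(len(X), dtype=bool)])
np.fill_diagonal(S, pref)
```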

The algorithm proceeds by alternating two message passing steps, to update two matrices:

Responsibility Matrix (r):

Our first step is to initialize the availability matrix with all elements set to zero. Each cell of the responsibility matrix is then updated using the formula below:

r(i, k) ← s(i, k) − max over k′ ≠ k of [ a(i, k′) + s(i, k′) ]

where i refers to the row and k to the column of the matrix. The responsibility matrix r therefore has values r(i, k) that quantify how well-suited xk is to serve as an exemplar for xi, compared to other candidate exemplars.

Availability Matrix (a):

The diagonal elements of the availability matrix are updated by a separate equation from the off-diagonal elements. The values a(i, k) of the availability matrix a indicate how appropriate it is for xi to pick xk as its exemplar, taking into account other points' support for xk as an exemplar.

For the diagonal elements, use the following formula:

a(k, k) ← Σ over i′ ≠ k of max(0, r(i′, k))

Then, the off-diagonal availabilities are updated per

a(i, k) ← min( 0, r(k, k) + Σ over i′ ∉ {i, k} of max(0, r(i′, k)) ), for i ≠ k
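Both availability rules, the diagonal one and the off-diagonal one, can be combined in a single pass; as before, the loop form and function name are illustrative.

```python
import numpy as np

def update_availability(R):
    """One availability pass over the responsibility matrix R."""
    n = R.shape[0]
    Rp = np.maximum(R, 0)  # positive parts of the responsibilities
    A = np.empty_like(R)
    for k in range(n):
        # Sum of max(0, r(i', k)) over all i' != k.
        col_sum = Rp[:, k].sum() - Rp[k, k]
        A[k, k] = col_sum  # self-availability a(k, k)
        for i in range(n):
            if i == k:
                continue
            # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k}).
            A[i, k] = min(0.0, R[k, k] + col_sum - Rp[i, k])
    return A
```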

Iterations continue until the cluster boundaries remain unchanged over a number of iterations, or until a predetermined number of iterations has been performed. Exemplars are then selected from the final matrices as those points with positive 'responsibility + availability'. For this we build a criterion matrix: each cell of the criterion matrix is simply the sum of the availability matrix and the responsibility matrix at that location. It is calculated with the formula:

c(i, k) = r(i, k) + a(i, k)
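Reading the clustering off the criterion matrix can be sketched as follows, assuming R and A are the converged responsibility and availability matrices (the helper name is an illustrative choice):

```python
import numpy as np

def assign_clusters(R, A):
    """Extract exemplars and labels from converged R and A matrices."""
    C = R + A
    # Exemplars are the points with positive criterion on the diagonal,
    # i.e. points that effectively chose themselves.
    exemplars = np.flatnonzero(np.diag(C) > 0)
    # Each point joins the exemplar with the highest criterion value.
    labels = exemplars[np.argmax(C[:, exemplars], axis=1)]
    return exemplars, labels
```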


Types of Affinity Propagation:

1. Adaptive Affinity Propagation:

Affinity propagation's clustering results are strongly influenced by the initial value of the preference parameter (the diagonal values of the similarity matrix). If the preference is set to the minimum similarity value, a small number of cluster centers is produced, while the median value yields a moderate number. So how do we know what preference value produces the best clustering result? The purpose of this algorithm is to overcome that problem.

  • Adaptive Affinity Propagation is divided into three main parts. The first part is called Adaptive Damping which is the process of adjusting the damping factor to eliminate oscillations adaptively when the oscillations occur.
  • The second part is called Adaptive Escape which is the process of escaping the oscillations by decreasing the value of the preference when the adaptive damping method fails.
  • And the last part is called Adaptive Preference Scanning which is the process of searching the space of preferences to find out the optimal clustering solution to the data set.
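Adaptive Preference Scanning automates a search that can also be done by hand. The sketch below sweeps the preference value with scikit-learn and records the number of clusters produced at each setting; the dataset, seed, and preference values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Two well-separated blobs (illustrative data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal((0, 0), 0.2, (15, 2)),
               rng.normal((4, 4), 0.2, (15, 2))])

# Sweep the preference: lower values tend to give fewer clusters,
# values near the maximum similarity give many.
counts = []
for pref in (-50.0, -5.0, -0.5):
    ap = AffinityPropagation(preference=pref, damping=0.7,
                             max_iter=1000, random_state=1).fit(X)
    counts.append(len(ap.cluster_centers_indices_))
print(counts)
```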

2. Partition Affinity Propagation:

Partition Affinity Propagation is an extension of affinity propagation that effectively reduces the number of iterations while retaining the accuracy of the original algorithm. It passes messages within subsets of the data first and then merges them during the initial iterations, which effectively reduces the number of iterations needed for clustering.

3. Soft Constraint Affinity Propagation:

This development of the affinity propagation algorithm addresses some drawbacks of the original. Because each cluster must have exactly one exemplar, AP can only be applied to classes of regularly shaped clusters and otherwise achieves suboptimal results.

In AP, a cluster is defined by its exemplar, and exemplars refer to themselves as self-exemplars. Due to this hard constraint, a cluster must appear as a star of radius one: all nodes are directly connected to the central node. The hard constraint therefore relies heavily on the regularity of cluster shapes, so data with elongated or irregular clusters may end up split around more than one cluster center.

The original optimization task of AP can be modified to solve this problem: the hard constraint is softened with a finite penalty term.

4. Fuzzy Statistical Affinity Propagation:

Fuzzy Statistic Affinity Propagation is one of the developments of affinity propagation algorithms that are developed based on fuzzy statistics.

  • Fuzzy statistics is a combination of statistical methods and fuzzy set theory.
  • Fuzzy set theory is the basis for studying membership relationships from the fuzziness of the phenomena.
  • Fuzzy statistics are used to estimate the degree of membership of a pattern to a class according to the use of a membership function.
  • At the beginning of Fuzzy Statistic Affinity Propagation, the algorithm computes the fuzzy mean deviation and then develops a fuzzy statistical similarity measure (FSS) to evaluate the similarity between two pixel vectors before running the AP iterative process.


Affinity propagation was the first method to use the idea of 'message passing' to solve the fundamental problem of clustering data. Because of its simplicity and performance, it is likely to prove of broad value in science and engineering.


