- Jeff, Kai-Tai Tang itjeff[at]cityu.edu.hk
- Howard Leung howard[at]cityu.edu.hk
- Taku Komura
- Hubert Pak-Ho Shum

Evaluating the similarity of motions is useful for motion retrieval, motion blending, and performance analysis of dancers and athletes. The Euclidean distance between corresponding joints has been widely adopted for measuring the similarity of postures and hence of motions. However, such a measure does not necessarily conform to the human perception of motion similarity. In this paper, we propose a new similarity measure based on machine learning techniques. We make use of questionnaire results in which subjects answered whether arbitrary pairs of motions appear similar or not. Using the relative distances between joints as the basic features, we train the system to compute the similarity of arbitrary pairs of motions. Experimental results show that our method outperforms methods based on the Euclidean distance between corresponding joints. Our method is applicable to content-based retrieval of human motion in large-scale database systems. It is also applicable to e-Learning systems that automatically evaluate the performance of dancers and athletes by comparing the subjects' motions with those of experts.

In this work, we propose a new method to evaluate the similarity of human postures based on human perception. The approach is general enough to be applied to all sorts of motions. We use the relative distances between joints as the basic measure for evaluating posture similarity. Based on questionnaire results indicating whether two motions are similar or not, we determine which set of relative distances most affects motion similarity in human perception. We compare our method with other measures based on Euclidean distances and show that our method outperforms them.

**Similarity measure based on Joint Relative Distance**

In this work, we propose the Joint Relative Distance (JRD), from which the motion features are derived. A JRD measures the distance between any two joints (e.g. the wrist and the head). Figure 1 shows the human hierarchy that we have considered. Some JRDs involve the same pair of joints on the left and right sides of the body; such symmetric pairs are combined into one JRD feature. Each remaining JRD forms a single feature on its own.
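As a sketch, the JRD features of a single posture could be computed as follows (Python with NumPy). The joint naming scheme and the use of averaging to merge symmetric left/right pairs are our assumptions; the paper states only that symmetric pairs are combined into one feature.

```python
import numpy as np

def jrd_features(joints):
    """Compute Joint Relative Distance (JRD) features for one posture.

    `joints` maps joint names to 3-D positions. Mirrored left/right
    pairs are merged into one feature; here we average the two
    distances, which is one plausible way to combine them (assumption).
    """
    names = sorted(joints)
    feats = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = np.linalg.norm(np.asarray(joints[a], dtype=float)
                               - np.asarray(joints[b], dtype=float))
            # Map e.g. ("LeftWrist", "Head") and ("RightWrist", "Head")
            # to the same key so mirrored pairs share one feature.
            key = tuple(sorted(n.replace("Left", "").replace("Right", "")
                               for n in (a, b)))
            feats.setdefault(key, []).append(d)
    return {k: float(np.mean(v)) for k, v in feats.items()}
```

For example, with the head at (0, 2, 0) and the two wrists at (±1, 1, 0), the two wrist-to-head distances collapse into a single averaged feature.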

Figure 1. The human hierarchy (with 20 joints and 5 end-sites)

We use a weighted sum of the joint relative distances to emulate the human perception of motion similarity. The weight of each combination of joints is computed from a tagged set of postures. Figure 2 shows the interface we use to obtain the users' perception: we show them a pair of short motions and ask them to click "yes" for similar motions and "no" for dissimilar motions. This yields n pairs of postures, each tagged as either similar or dissimilar, and a vector of labels **Y** based on the manual tagging. The expected weights (in a vector **W**) are then calculated by multiplying the inverse of the feature matrix **A** with **Y**, such that **W** = *inv*(**A**)**Y**. Since **A** is not necessarily a square matrix, the pseudoinverse of **A** is used instead, which determines the weight of each feature with minimized error. The JRD features are combined linearly with **W**, giving the posture similarity cost *C* = *Sum*( *W*(*i*) *JRD*(*i*) ).
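A minimal sketch of this fitting step in Python/NumPy. Each row of `A` holds the JRD features of one tagged posture pair; the 1/0 label encoding in `y` is our assumption, since the paper only states that pairs are tagged similar or dissimilar.

```python
import numpy as np

def fit_weights(A, y):
    """Least-squares weights W = pinv(A) Y.

    A is generally not square, so the Moore-Penrose pseudoinverse is
    used, which minimizes the squared error ||A W - Y||.
    """
    return np.linalg.pinv(np.asarray(A, dtype=float)) @ np.asarray(y, dtype=float)

def posture_cost(w, jrd):
    """Linear combination C = sum_i W(i) * JRD(i)."""
    return float(np.dot(w, jrd))
```

With an overdetermined `A`, `fit_weights` returns the weights that best reproduce the tagged labels in the least-squares sense.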

Figure 2. The interface for the users to tag similar/dissimilar motion pairs

The similarity of two motions is calculated by counting the number of corresponding postures that are perceptually similar. Corresponding postures are found by normalizing the durations of the two motions and resampling. We adopt this method because uniform time scaling has been reported to be more efficient and accurate than dynamic time warping in many cases. If the ratio of similar postures in the motions is above a cutoff threshold, the motions are evaluated as perceptually similar.
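The motion-level comparison could be sketched as follows (Python/NumPy). Here we assume the per-posture cost is the weighted sum of absolute JRD differences between corresponding postures; the thresholds and sample count are illustrative values, not figures from the paper.

```python
import numpy as np

def resample(motion, n):
    """Uniformly time-scale `motion` (frames x features) to n frames
    by linear interpolation, emulating duration normalization."""
    motion = np.asarray(motion, dtype=float)
    old_t = np.linspace(0.0, 1.0, len(motion))
    new_t = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(new_t, old_t, motion[:, j])
                     for j in range(motion.shape[1])], axis=1)

def motions_similar(m1, m2, w, cost_thresh, ratio_thresh=0.8, n=100):
    """Judge two motions similar when the ratio of corresponding
    postures whose weighted JRD difference falls below `cost_thresh`
    exceeds `ratio_thresh` (both thresholds are assumptions)."""
    a, b = resample(m1, n), resample(m2, n)
    costs = np.abs(a - b) @ np.asarray(w, dtype=float)  # per-posture cost
    return float(np.mean(costs < cost_thresh)) >= ratio_thresh
```

Two motions tracing the same trajectory at different frame counts resample onto each other and are judged similar, while a motion held at a distant pose is rejected.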

The joint relative distance can capture features to which humans are sensitive, such as contacts between the end effectors and other parts of the body. For example, when we see a person putting his/her hands together, we focus more on the fact that the hands are touching than on the joint angles of the elbows or shoulders. Therefore, in this research, we search for the most influential combinations of joints, i.e. those to which humans are most sensitive, and adjust their weights according to their importance.

Some experimental results are shown in Figures 3 and 4. Figure 3 shows motions which are perceptually similar but dissimilar under the Euclidean distance scheme. In Figure 3(a), the arms bend to different extents but show a small perceived difference, while in Figure 3(b), the lower sequence shows a slight delay with variations in limb positions. Figure 4 shows motions which are similar under the Euclidean distance scheme but perceptually dissimilar. In Figure 4(a), the trajectories of the right arm are different, while in Figure 4(b), the legs and arms bend differently in the two sequences even though the joints are at similar positions.

Figure 3 (a, b). Motions which are perceptually similar but dissimilar under the Euclidean distance scheme.

Figure 4 (a, b). Motions which are similar under the Euclidean distance scheme but perceptually dissimilar.

**Jeff K.-T. Tang**, Howard Leung, Taku Komura and Hubert P.-H. Shum, "Emulating human perception of motion similarity", Computer Animation and Virtual Worlds (CAVW) 19, 3-4 (Sep. 2008), 211-221.

Any suggestions or comments are welcome. Please send them to Jeff Tang

This website is maintained by Jeff Tang

Copyright © 2008 3D Motion Capture Laboratory. All rights reserved.

Last update: 25 Nov 2008.