The Similarity Ontology

Working Draft


Use of is discouraged in favour of

This version:
Latest version:
Last Update:
Date: 17:28:32 06/12/10 CDT
Kurt Jacobson, Centre for Digital Music Queen Mary University of London
Yves Raimond, BBC
See acknowledgements


This specification defines a small ontology for similarity called MuSim. In MuSim, the association between two (or more) Things is a class to be reified rather than a property. This allows us to embrace the complexity of musical associations and accommodate the subjectivity and context-dependence of musical similarity. Although this ontology was designed with music similarity in mind, it can readily be applied to other domains.

Status of this Document

This is a work in progress! This document is subject to change. Comments are very welcome; please send them to kurtjx gmail. Thank you.

Table of Contents

  1. Introduction
    1. Use Cases
      1. Basic Similarity
      2. Signal-based Similarity
      3. Rivalry
      4. Influence
      5. Personal Associations
    2. Similarity as a Concept
    3. Distinguishability and Extensibility
  2. MuSim ontology at a glance
  3. MuSim ontology overview
    1. Example
  4. Cross-reference for MuSim classes and properties


  1. Normative References
  2. Changes in this version (Non-Normative)
  3. Acknowledgements (Non-Normative)

1 Introduction

The MuSim ontology aims to describe the associations between musical items. Generally we would imagine these items to be music artists (a mo:MusicArtist) or music tracks (a mo:Track) although there are definitely other possibilities. With MuSim, we envision the creation of a distributed music information system that uses data about associations between musical items to allow for recommendation and discovery in a wide variety of contexts.

While the Music Ontology deals primarily with objective attributes of music-related items such as authorship, composition, performance, etc., MuSim aims to capture more subjective attributes. This means that the provenance of an association concept is important as well as the context. TODO: expand this

In section 1.1 we describe various use cases that motivate the creation of MuSim. In section 1.2 we elaborate on the reification of similarity and association as concepts rather than object properties.

1.1 Use Cases

Let us explore some usage scenarios that have motivated the creation of this ontology.

1.1.1 Basic Similarity

Some instances of musical similarity are rather obvious and straightforward. For example, Simply Red's track "Sunrise" uses essentially the same melodies and instrumentation as Hall and Oates' "I Can't Go for That", although the lyrical content and song structure are totally different. We can quite easily call these two songs similar and even consider this statement to be somewhat objective - the similarity is not so much a matter of opinion as a fact of the musical make-up of the respective songs.

Likewise, "Just a Gigolo" is an adaptation by Irving Caesar in 1929 of the Austrian song "Schöner Gigolo", written in 1928 by Leonello Casucci (music) and Julius Brammer (lyrics).

In such instances the property-based approach to similarity (mo:similar_to) might be sufficient. But perhaps we want to say more or know more. Is this a case of sampling or just quoting? How does this relate to the interests of the various copyright holders (sigh)? Can we easily or automatically find other tracks that re-use "I Can't Go For That" in an analogous way?

1.1.2 Signal-based Similarity

Signal analysis or "computer listening" is sometimes used to determine inter-song similarities. For example, we might extract Mel-frequency cepstral coefficients (MFCCs) from a set of audio signals, use these to create Gaussian mixture models for each song, and determine inter-song dissimilarities using an earth mover's distance function. Alternatively, we might start with the same MFCCs but then calculate mean and variance vectors and use KL-divergence between these vectors and the vectors of other audio signals to create inter-song dissimilarity measures.

We would then like to express associations between audio signals referencing the actual numerical results of the calculation as well as the various algorithms that were used to obtain the results. Perhaps we would also like to specify additional metadata about the association such as date of creation and references to the exact audio files used. What is the most appropriate way to represent this data?
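As a rough illustration of the second approach described above, the following Python sketch summarises each track by per-coefficient mean and variance vectors and compares two tracks with a symmetrised KL-divergence. It assumes each track is modelled as a single diagonal-covariance Gaussian, which is only one possible reading of the description; the function names and the toy "MFCC" frames are invented for this example and are not part of MuSim.

```python
import math
from statistics import fmean, pvariance


def summarise(frames):
    """Collapse a list of feature frames into (means, variances) vectors."""
    columns = list(zip(*frames))  # one tuple per coefficient
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def kl_divergence(p, q):
    """KL(p || q) for two diagonal-covariance Gaussians given as (means, variances)."""
    mu_p, var_p = p
    mu_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p, q):
    """Symmetrised divergence, usable as an inter-song dissimilarity."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Toy two-coefficient frames, invented for illustration only.
track_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
track_b = [[2.0, 1.0], [4.0, 5.0], [6.0, 9.0]]

model_a = summarise(track_a)
model_b = summarise(track_b)
dissimilarity = symmetric_kl(model_a, model_b)
print(round(dissimilarity, 4))
```

The resulting number, together with the name of the method and the audio files used, is exactly the kind of detail that a plain similarity property has no room for.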

1.1.3 Rivalry

One might be interested in exploring music in terms of historical rivalries such as hip-hop feuds or beefs - a classic example being the feud between two music groups of the 1980s, Boogie Down Productions and The Juice Crew. A series of tracks was released by various members of the respective groups contesting the origins of hip-hop and launching personal attacks against members of the opposing group. This series of answer records eventually became known as the Bridge Wars and was drawn out over a number of years and through a number of proxies.

We can use existing properties to describe group memberships (foaf:member) and track authorship (foaf:made) but how do we associate these two feuding groups? How do we associate the various releases (answer records) that provide a musical record of this feud?

1.1.4 Influence

A music critic might remark: "The unique sense of lyricism and unparalleled melodic genius of Frédéric Chopin had a profound influence on the later work of Claude Debussy." Such statements may be quite useful for music discovery or recommendation.

Exactly how should we express such a subjective statement? We would definitely want to know who is making this statement and what sort of credentials she might hold. Also, what kind of consistency checks can we provide for such statements? Of course Debussy could not have influenced the work of Chopin, as Chopin died more than a decade before Debussy was born.
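One very simple consistency check of the kind hinted at above can be sketched in Python: an influence claim is rejected when the influenced artist died before the supposed influencer was born. The `Artist` class and the function name are hypothetical and exist only for this illustration; a real check would draw birth and death dates from linked data rather than hard-coded values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artist:
    name: str
    born: int                    # year of birth
    died: Optional[int] = None   # year of death, None if still living


def influence_is_possible(influencer: Artist, influenced: Artist) -> bool:
    """Reject claims where the influenced artist died before the influencer was born."""
    if influenced.died is None:
        return True
    return influencer.born <= influenced.died


chopin = Artist("Frédéric Chopin", 1810, 1849)
debussy = Artist("Claude Debussy", 1862, 1918)

print(influence_is_possible(chopin, debussy))  # True
print(influence_is_possible(debussy, chopin))  # False
```

The check is deliberately permissive: it only rules out chronologically impossible claims and says nothing about whether a possible claim is actually true.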

1.1.5 Personal Associations

Associations between musical things are often highly personal and might be tied to a time, place, or person. Consider the following example:

When a first year student at college, I dated a girl who loved Bob Marley and David Bowie.

The narrator in the above passage may wish to create an association between Bob Marley and David Bowie that references a time (first year of college), a place (college), and a person (previous girlfriend). How can we allow for this level of provenance-tracking and expressiveness?

1.2 Similarity as a Concept

To accommodate the aforementioned use cases we will treat musical associations as concepts rather than properties. Previous ontologies related to the domain of music (and ontologies in other domains) treat associations and similarity as properties. Using the Music Ontology we might say:

		@prefix mo: <> .

		:Sunrise a mo:Signal .
		:CantGoForThat a mo:Signal .
		:Sunrise mo:similar_to :CantGoForThat .


However, we have no additional information about this similarity. Is it based on acoustic features, chronological proximity, or expert opinion? Are the signals similar in terms of timbre or melodic content?

The property-based approach to similarity is illustrated in figure 1.

Figure 1: Similarity as a property

In MuSim we propose a concept-based approach to association. Similarity and association are concepts to be described and contextualised with supporting concepts and properties. This is illustrated in figure 2.

Figure 2: Similarity as a concept

In a sense, we treat association as a property-like concept where properties of the concept specify elements in the undirected case, or a subject and an object in the directed case. That is, our association concepts act as a sort of predicate, such that instead of the more familiar concept-property-concept triple we have concept-concept-concept triples. Of course, in a strict sense, we actually have two (or more) traditional triples in this case. Is everyone confused yet? I am ;-)
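To make the difference concrete, the Python sketch below writes both approaches as plain (subject, predicate, object) tuples. The property-based form is a single triple, while the concept-based form gives the association its own node, so further statements can be attached to it. The node name `:MySimilarity` and the helper functions are illustrative choices for this example, not something the ontology prescribes.

```python
# Property-based: the whole association is one triple.
property_based = [
    (":Sunrise", "mo:similar_to", ":CantGoForThat"),
]

# Concept-based: the association is a node of its own, described by
# several traditional triples ("rdf:type" is what Turtle's "a" abbreviates).
concept_based = [
    (":MySimilarity", "rdf:type", "sim:Similarity"),
    (":MySimilarity", "sim:element", ":Sunrise"),
    (":MySimilarity", "sim:element", ":CantGoForThat"),
]


def describe(node, triples):
    """Return every (predicate, object) pair stated about a node."""
    return [(p, o) for s, p, o in triples if s == node]


def add_context(triples, association, predicate, value):
    """Return a new triple list with one more statement about the association."""
    return triples + [(association, predicate, value)]


enriched = add_context(
    concept_based, ":MySimilarity", "rdfs:comment", "same backing melody"
)
for predicate, obj in describe(":MySimilarity", enriched):
    print(predicate, obj)
```

In the property-based list there is no node to which a comment or an asserter could be attached, which is the limitation the concept-based approach is meant to remove.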

1.3 Distinguishability and Extensibility

We need a clear method for distinguishing between various types of associations to ensure the resulting distributed network of musical things can be effectively navigated and reasoned over by music discovery agents. We also want to allow for some extensibility, as it is likely impossible to accommodate every type of musical association one might want to express. This motivates the hierarchical class and sub-class structure of association types described in the overview.
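The following Python sketch shows how an agent might exploit such a hierarchy: it selects every association whose type is a given class or any of its sub-classes. The sub-class links shown here are assumptions made for the example (in particular `ex:Sampling` is a purely hypothetical extension class), and the table stands in for rdfs:subClassOf statements that a real agent would read from the ontology.

```python
# Assumed sub-class links, child -> parent. "ex:Sampling" is a
# hypothetical third-party extension, not a MuSim term.
SUB_CLASS_OF = {
    "sim:Similarity": "sim:Association",
    "sim:Influence": "sim:Association",
    "ex:Sampling": "sim:Influence",
}

# Illustrative association nodes and their asserted types.
ASSOCIATIONS = {
    ":MySimilarity": "sim:Similarity",
    ":MyInfluence": "sim:Influence",
    ":MySample": "ex:Sampling",
}


def ancestors(cls):
    """Walk the sub-class links upwards and return all ancestor classes."""
    chain = []
    while cls in SUB_CLASS_OF:
        cls = SUB_CLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, target):
    """True if cls is the target class or one of its sub-classes."""
    return cls == target or target in ancestors(cls)


def select(target):
    """Return the association nodes whose type falls under the target class."""
    return sorted(n for n, c in ASSOCIATIONS.items() if is_a(c, target))


print(select("sim:Influence"))
print(select("sim:Association"))
```

An agent that knows nothing about the extension class can still find `:MySample` by asking for influences, which is the kind of extensibility the hierarchy is intended to provide.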

2. MuSim ontology at a glance

An alphabetical index of MuSim terms, by class (concepts) and by property (relationships, attributes), is given below. All the terms are hyper-linked to their detailed description for quick reference.

3. MuSim ontology overview

TODO: more to come...

Various levels of transparency are achievable using the similarity ontology

UPDATE-2009-05-02: We are now in favor of a simpler hierarchy of similarity that does not include the lowest-level concepts shown in the picture below. Instead we propose using the AssociationMethod concept to reify the process used to derive association statements, following the named graphs paradigm.

class hierarchy

The current design philosophy is to keep this hierarchy rather simple and encourage sub-classing of concepts where appropriate. It is conceivable that in typical usage the asserter property would be more useful than the structure of this hierarchy for deciding what similarity statements are interesting.

3.1. Example

Here is a very basic document describing the similarity between "Sunrise" and "I Can't Go for That":

	@prefix mo: <> .
	@prefix sim: <> .
	@prefix rdfs: <> .
	@prefix dc: <> .

	:Sunrise a mo:Track .
	:ICantGoForThat a mo:Track .
	:MySimilarity a sim:Similarity;
		sim:element :Sunrise;
		sim:element :ICantGoForThat;
		rdfs:comment """these tracks use the same backing rhythm and melody"""@en;
		sim:asserter <> .


Note that we've included the sim:asserter property to establish that kurtjx is the agent asserting this similarity relationship. This is a matter of best practice when using sim:Association and its various sub-concepts. We use the sim:asserter property when the derivation of the association statement is non-transparent - we know that kurtjx made this assertion but not how or why. In general, providing as much information as possible about who is making an assertion and why is a good idea - it allows agents re-using this data to make an informed decision about how to re-use it.

Also note that the association coded above is undirected: we are stating that :Sunrise is as similar to :ICantGoForThat as :ICantGoForThat is to :Sunrise. In undirected associations we use the element property to specify which MusicalItems are involved in a particular association.

Now imagine that another agent wants to make a directed statement about association, as in :ICantGoForThat influenced :Sunrise but not the reverse. We would then say:

	@prefix mo: <> .
	@prefix sim: <> .
	@prefix rdfs: <> .
	@prefix dc: <> .

	:Sunrise a mo:Track .
	:ICantGoForThat a mo:Track .
	:MyInfluence a sim:Influence;
		sim:subject :ICantGoForThat;
		sim:object :Sunrise;
		rdfs:comment """the track :ICantGoForThat influenced the creation of :Sunrise"""@en;
		sim:asserter <> .


Note that the use of subject and object implies a directed influence relationship. In a sense we are using the subject-predicate-object paradigm that is so familiar in RDF but the predicate is a concept instead of an object property.
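The distinction between the two forms can be sketched in Python as follows. Here an association is a plain dictionary that carries either a list of elements (undirected) or a subject and an object (directed), and a helper expands it into the ordered pairs it implies. The dictionary layout and the function name are assumptions made for this illustration only.

```python
from itertools import permutations


def related_pairs(association):
    """Return the (from, to) pairs implied by one association record."""
    if "elements" in association:
        # Undirected: every element relates to every other, in both directions.
        return sorted(permutations(association["elements"], 2))
    # Directed: only subject -> object is implied.
    return [(association["subject"], association["object"])]


my_similarity = {
    "type": "sim:Similarity",
    "elements": [":Sunrise", ":ICantGoForThat"],
}
my_influence = {
    "type": "sim:Influence",
    "subject": ":ICantGoForThat",
    "object": ":Sunrise",
}

print(related_pairs(my_similarity))
print(related_pairs(my_influence))
```

Because the undirected case lists elements rather than a pair, the same helper also covers associations between more than two things.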

Of course some purists might take exception to stating that one track influenced another - an artist is influenced, not her creation. However, in MuSim we allow minor inconsistencies like this to provide (hopefully) the greatest flexibility and usability.

Now consider the named graph approach, which is most appropriate when there is some level of transparency in the similarity/association determination process.

	@prefix mo: <> .
	@prefix sim: <> .
	@prefix rdfs: <> .
	@prefix dc: <> .
	@prefix rdfg: <> .

	:MySimilarity a sim:Similarity;
		sim:element :Track01;
		sim:element :Track02;
		sim:method :MyAssociationMethod .

	:MyAssociationMethod a sim:AssociationMethod;
		dc:creator <>;
		sim:description :MyJustification .

	:MyJustification a rdfg:Graph .

In the above example, :MyJustification is a named graph that elucidates the process used for deriving this similarity statement. TODO: more here - actually write the graph example :-)

A DBTune wrapper service demonstrates how MuSim can be used to describe artist similarity on

4. Cross-reference for MuSim classes and properties



A References

Music Ontology
The Music Ontology is a thorough and mature ontology for describing music-related data. sim:MusicalThing is an owl:unionOf several concepts in the Music Ontology.

B Changes in this version (Non-Normative)

C Acknowledgements (Non-Normative)

Thanks to the Music Ontology specification mailing list, especially Yves Raimond and Samer Abdallah, for their helpful input. Special thanks to Antoine Zimmermann for pointing out numerous issues and proposing fixes!

This work is part of the OMRAS 2 project supported by EPSRC grant EP/E017614/1 at the Centre for Digital Music, Queen Mary University of London.