The Similarity Ontology

Working Draft

This version:
http://grasstunes.net/ontology/similarity/0.2/musim.owl
Latest version:
http://purl.org/ontology/similarity/
Last Update:
Date: 14:04:00 07/12/09 BST
Editors:
Kurt Jacobson, Centre for Digital Music Queen Mary University of London
Authors:
Kurt Jacobson, Centre for Digital Music Queen Mary University of London
Yves Raimond, BBC
Contributors:
See acknowledgements

Abstract

This specification defines a small ontology for similarity, sometimes called MuSim. In MuSim, the association between two (or more) Things is reified as a class rather than expressed as a property. This allows us to embrace the complexity of musical associations and to accommodate the subjectivity and context-dependence of musical similarity. Although this ontology was designed with music similarity in mind, it can readily be applied to other domains.

Status of this Document

This is a work in progress! This document is changing on a daily if not hourly basis. Comments are very welcome; please send them to kurtjx gmail. Thank you.

Table of Contents

  1. Introduction
    1. Use Cases
      1. Basic Similarity
      2. Signal-based Similarity
      3. Rivalry
      4. Influence
      5. Personal Associations
    2. Similarity as a Concept
    3. Distinguishability and Extensibility
  2. Sim ontology at a glance
  3. Sim ontology overview
    1. Example
  4. Cross-reference for Sim classes and properties

Appendixes

  1. Normative References
  2. Changes in this version (Non-Normative)
  3. Acknowledgements (Non-Normative)

1 Introduction

The Sim ontology aims to describe the associations between musical items. Generally we would imagine these items to be music artists (a mo:MusicArtist) or music tracks (a mo:Track) although there are definitely other possibilities. With Sim, we envision the creation of a distributed music information system that uses data about associations between musical items to allow for recommendation and discovery in a wide variety of contexts.

While the Music Ontology deals primarily with objective attributes of music-related items such as authorship, composition, performance, etc., Sim aims to capture more subjective attributes. This means that the provenance of an association concept is important as well as the context. TODO: expand this

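In section 1.1 we describe various use cases that motivate the creation of Sim. In section 1.2 we elaborate on the reification of similarity and association as concepts rather than object properties. In section 1.3 we discuss the need to distinguish between, and to extend, the types of association.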

1.1 Use Cases

Let us explore some usage scenarios that have motivated the creation of this ontology.

1.1.1 Basic Similarity

Some instances of musical similarity are rather obvious and straightforward. For example, Simply Red's track "Sunrise" uses essentially the same melodies and instrumentation as Hall and Oates' "I Can't Go for That", although the lyrical content and song structure are totally different. We can quite easily call these two songs similar and even consider this statement to be somewhat objective - the similarity is not so much a matter of opinion as a matter of fact about the musical make-up of the respective songs.

Similarly, "Just a Gigolo" is an adaptation by Irving Caesar in 1929 of the Austrian song "Schöner Gigolo", written in 1928 by Leonello Casucci (music) and Julius Brammer (lyrics).

In such instances the property-based approach to similarity (mo:similar_to) might be sufficient. But perhaps we want to say more or know more. Is this a case of sampling or just quoting? How does this relate to the interests of the various copyright holders (sigh)? Can we easily or automatically find other tracks that re-use "I Can't Go for That" in an analogous way?

1.1.2 Signal-based Similarity

Signal analysis or "computer listening" is sometimes used to determine inter-song similarities. For example, we might extract Mel-frequency cepstral coefficients (MFCCs) from a set of audio signals, use these to create Gaussian mixture models for each song, and determine inter-song dissimilarities using an earth mover's distance function. Alternatively, we might start with the same MFCCs but then calculate mean and variance vectors and use KL-divergence between these vectors and the vectors of other audio signals to create inter-song dissimilarity measures.

We would then like to express associations between audio signals referencing the actual numerical results of the calculation as well as the various algorithms that were used to obtain the results. Perhaps we would also like to specify additional metadata about the association such as date of creation and references to the exact audio files used. What is the most appropriate way to represent this data?
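The second approach described above (mean and variance vectors compared with KL-divergence) can be sketched roughly as follows. This is only an illustration under stated assumptions: each track is summarised as a single Gaussian with diagonal covariance, the divergence is symmetrised by adding both directions, and the feature values are made up. None of these choices are prescribed by Sim.

```python
import math

def kl_diag_gaussian(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Sum of both KL directions, usable as an inter-song dissimilarity."""
    return (kl_diag_gaussian(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussian(mean_q, var_q, mean_p, var_p))

# Made-up 3-dimensional MFCC summaries for two hypothetical tracks.
track_a = {"mean": [1.0, 0.5, -0.2], "var": [0.8, 1.1, 0.9]}
track_b = {"mean": [0.7, 0.9, 0.1], "var": [1.0, 0.9, 1.2]}

dissimilarity = symmetric_kl(track_a["mean"], track_a["var"],
                             track_b["mean"], track_b["var"])
print(round(dissimilarity, 4))
```

A value computed this way is the kind of numerical result one would want to attach to an association, together with a reference to the algorithm that produced it.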

1.1.3 Rivalry

One might be interested in exploring music in terms of historical rivalries such as hip-hop feuds or beefs - a classic example being the feud between two music groups of the 1980s, Boogie Down Productions and the Juice Crew. A series of tracks was released by various members of the respective groups contesting the origins of hip-hop and launching personal attacks against members of the opposing group. This series of answer records eventually became known as the Bridge Wars and was drawn out over a number of years and through a number of proxies.

We can use existing properties to describe group memberships (foaf:member) and track authorship (foaf:made) but how do we associate these two feuding groups? How do we associate the various releases (answer records) that provide a musical record of this feud?

1.1.4 Influence

A music critic might remark: "The unique sense of lyricism and unparalleled melodic genius of Frédéric Chopin had a profound influence on the later work of Claude Debussy." Such statements may be quite useful for music discovery or recommendation.

Exactly how should we express such a subjective statement? We would definitely want to know who is making this statement and what sort of credentials she might hold. Also, what kind of consistency checks can we provide for such statements? Of course Debussy could not have influenced the work of Chopin, as Chopin died in 1849, 13 years before Debussy was born.
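One very simple consistency check of the kind asked about above might look like the following sketch. The Artist record and the influence_is_plausible function are hypothetical helpers, not part of Sim, and the check is only a necessary condition: it rejects influences that are chronologically impossible but says nothing about whether a plausible influence actually occurred.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Artist:
    name: str
    born: int  # year of birth
    died: int  # year of death

def influence_is_plausible(influencer: Artist, influenced: Artist) -> bool:
    """The influencer must have been born before the influenced artist died."""
    return influencer.born < influenced.died

chopin = Artist("Frédéric Chopin", 1810, 1849)
debussy = Artist("Claude Debussy", 1862, 1918)

print(influence_is_plausible(chopin, debussy))  # True
print(influence_is_plausible(debussy, chopin))  # False
```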

1.1.5 Personal Associations

Associations between musical things are often highly personal and might be tied to a time, place, or person. Consider the following example:

When a first year student at college, I dated a girl who loved Bob Marley and David Bowie.

The narrator in the above passage may wish to create an association between Bob Marley and David Bowie that references a time (first year of college), a place (college), and a person (previous girlfriend). How can we allow for this level of provenance-tracking and expressiveness?

1.2 Similarity as a Concept

To accommodate the aforementioned use cases we will treat musical associations as concepts rather than properties. Previous ontologies related to the domain of music (and ontologies in other domains) treat associations and similarity as properties. Using the Music Ontology we might say:

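		@prefix mo: <http://purl.org/ontology/mo/> .
		# The default namespace below is a placeholder assumed for this example.
		@prefix : <http://example.org/music/> .

		:Sunrise a mo:Signal .
		:CantGoForThat a mo:Signal .
		:Sunrise mo:similar_to :CantGoForThat .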

However, we have no additional information about this similarity. Is it based on acoustic features? Chronological proximity? Expert opinion? Are the signals similar in terms of timbre? Melodic content?

The property based approach to similarity is illustrated in figure 1.

Figure 1: Similarity as a property

In Sim we propose a concept-based approach to association. Similarity and association are concepts to be described and contextualised with supporting concepts and properties. This is illustrated in figure 2.

Figure 2: Similarity as a concept

In a sense, we treat association as a property-like concept where properties of the concept specify elements in the undirected case, or a subject and an object in the directed case. That is, our association concepts act as a sort of predicate such that instead of having the more familiar concept-property-concept triple we have concept-concept-concept triples. Of course, in a strict sense, we actually have two (or more) traditional triples in this case. Is everyone confused yet? I am ;-)
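The difference between the two approaches can be made concrete by writing the statements out as plain triples. The sketch below uses Python tuples as a stand-in for an RDF store; the node name :MySimilarity and the helper elements_of are illustrative assumptions.

```python
# Property-based approach: one triple, with nowhere to attach context.
property_based = [
    (":Sunrise", "mo:similar_to", ":CantGoForThat"),
]

# Concept-based approach: the association is a node of its own, so the
# single statement becomes several ordinary triples about that node.
concept_based = [
    (":MySimilarity", "rdf:type", "sim:Similarity"),
    (":MySimilarity", "sim:element", ":Sunrise"),
    (":MySimilarity", "sim:element", ":CantGoForThat"),
]

def elements_of(triples, association):
    """Return the things linked to an association node by sim:element."""
    return sorted(o for s, p, o in triples
                  if s == association and p == "sim:element")

print(elements_of(concept_based, ":MySimilarity"))
```

Because the association is now a subject in its own right, further triples (who asserted it, by what method, with what weight) can be attached to it without changing the triples that name its elements.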

1.3 Distinguishability and Extensibility

We need a clear method for distinguishing between various types of associations to ensure the resulting distributed network of musical things can be effectively navigated and reasoned over by music discovery agents. We also want to allow for some extensibility, as it is likely impossible to accommodate every type of musical association one might want to express. This motivates the hierarchical class/sub-class structure of association types described in the overview.

2 Sim ontology at a glance

An alphabetical index of Sim terms, by class (concepts) and by property (relationships, attributes), is given below. All the terms are hyper-linked to their detailed description for quick reference.

Classes: AcousticSimilarity, Association, AssociationMethod, CompositionalSimilarity, ContextualSimilarity, Influence, Similarity

Properties: asserter, description, distance, element, grounding, method, object, subject, weight

3 Sim ontology overview

TODO: more to come...

Various levels of transparency are achievable using the similarity ontology

UPDATE-2009-05-02: We are now in favor of a simpler hierarchy of similarity that does not include the lowest level concepts shown in the picture below. Instead we propose using the AssociationMethod concept to reify the process used to derive association statements, following the named graphs paradigm.

Figure: Class hierarchy

The current design philosophy is to keep this hierarchy rather simple and encourage sub-classing of concepts where appropriate. It is conceivable that in typical usage the asserter property would be more useful than the structure of this hierarchy for deciding what similarity statements are interesting.
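As a rough illustration of selecting statements by asserter rather than by class, consider the sketch below. The association records, the example.org agent URIs, and the asserted_by helper are all hypothetical; a real consumer would more likely query an RDF store.

```python
# Hypothetical association records, each carrying its sim:asserter.
associations = [
    {"id": ":SimA", "type": "sim:Similarity",
     "asserter": "http://example.org/people#alice"},
    {"id": ":SimB", "type": "sim:Influence",
     "asserter": "http://example.org/people#bob"},
    {"id": ":SimC", "type": "sim:Similarity",
     "asserter": "http://example.org/people#alice"},
]

def asserted_by(items, trusted):
    """Keep only the associations whose asserter is in the trusted set."""
    return [a["id"] for a in items if a["asserter"] in trusted]

print(asserted_by(associations, {"http://example.org/people#alice"}))
```

Here the class of each association plays no role in the selection; the consumer simply decides which agents it is willing to listen to.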

3.1 Example

Here is a very basic document describing the similarity between "Sunrise" and "I Can't Go for That":

	@prefix mo: <http://purl.org/ontology/mo/> .
	@prefix sim: <http://purl.org/ontology/similarity/> .
	@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
	@prefix dc: <http://purl.org/dc/elements/1.1/> .
	# The default namespace below is a placeholder assumed for this example.
	@prefix : <http://example.org/music/> .

	# An undirected similarity between two tracks, attributed to its asserter.
	:Sunrise a mo:Track .
	:ICantGoForThat a mo:Track .
	:MySimilarity a sim:Similarity;
		sim:element :Sunrise;
		sim:element :ICantGoForThat;
		rdfs:comment """these tracks use the same backing rhythm and melody"""@en;
		sim:asserter <http://kurtisrandom.com/foaf.rdf#kurtjx> .

Note that we've included the sim:asserter property to establish that kurtjx is the agent asserting this similarity relationship. This is a matter of best practice when using sim:Association and its various sub-concepts. We use the sim:asserter property when the derivation of the association statement is non-transparent - we know that kurtjx made this assertion but not how or why. In general, providing as much information as possible about who is making an assertion and why is a good idea - it allows agents re-using this data to make an informed decision about how to re-use it.

Also note that the association coded above is undirected: we are stating that :Sunrise is as similar to :ICantGoForThat as :ICantGoForThat is to :Sunrise. In undirected associations we use the element property to specify which MusicalThings are involved in a particular association.

Now imagine that another agent might want to make a directed statement about association, as in :ICantGoForThat influenced :Sunrise but not the reverse. We would then say:

	@prefix mo: <http://purl.org/ontology/mo/> .
	@prefix sim: <http://purl.org/ontology/similarity/> .
	@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
	@prefix dc: <http://purl.org/dc/elements/1.1/> .
	# The default namespace below is a placeholder assumed for this example.
	@prefix : <http://example.org/music/> .

	# A directed association: the subject influenced the object.
	:Sunrise a mo:Track .
	:ICantGoForThat a mo:Track .
	:MyInfluence a sim:Influence;
		sim:subject :ICantGoForThat;
		sim:object :Sunrise;
		rdfs:comment """the track :ICantGoForThat influenced the creation of :Sunrise"""@en;
		sim:asserter <http://kurtisrandom.com/foaf.rdf#kurtjx> .

Note that the use of subject and object implies a directed influence relationship. In a sense we are using the subject-predicate-object paradigm that is so familiar in RDF but the predicate is a concept instead of an object property.
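The cross-reference declares sim:subject and sim:object as sub-properties of sim:element, so a reasoner that applies sub-property entailment would also treat both tracks as elements of the association. The sketch below imitates that single inference step over plain tuples; it is a simplified stand-in for an RDFS reasoner, and the function name with_inferred_elements is an assumption.

```python
# Sub-property declarations taken from the cross-reference section.
SUB_PROPERTY_OF = {
    "sim:subject": "sim:element",
    "sim:object": "sim:element",
}

def with_inferred_elements(triples):
    """Add the triples implied by the sub-property declarations."""
    inferred = set(triples)
    for s, p, o in triples:
        parent = SUB_PROPERTY_OF.get(p)
        if parent is not None:
            inferred.add((s, parent, o))
    return inferred

directed = [
    (":MyInfluence", "rdf:type", "sim:Influence"),
    (":MyInfluence", "sim:subject", ":ICantGoForThat"),
    (":MyInfluence", "sim:object", ":Sunrise"),
]

closure = with_inferred_elements(directed)
print(sorted(o for s, p, o in closure if p == "sim:element"))
```

One consequence is that an agent interested only in which items are associated can query sim:element uniformly, whether the underlying association is directed or not.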

Of course some purists might take exception to stating that one track influenced another - an artist is influenced, not her creation. However, in Sim we allow minor inconsistencies like this to provide (hopefully) the greatest flexibility and usability.

Now consider the named graph approach which is most appropriate when there is some level of transparency in the similarity/association determination process.

http://www.w3.org/2004/03/trix/
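	@prefix mo: <http://purl.org/ontology/mo/> .
	@prefix sim: <http://purl.org/ontology/similarity/> .
	@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
	@prefix dc: <http://purl.org/dc/elements/1.1/> .
	@prefix rdfg: <http://www.w3.org/2004/03/trix/rdfg-1/> .
	# The default namespace below is a placeholder assumed for this example.
	@prefix : <http://example.org/music/> .

	# The similarity statement points at the method used to derive it.
	:MySimilarityStatement
		a sim:Similarity;
		sim:element :Track01;
		sim:element :Track02;
		sim:method :MyAssociationMethod .

	# The method is described by a named graph.
	:MyAssociationMethod
		a sim:AssociationMethod;
		dc:creator <http://kurtisrandom.com/foaf.rdf#kurtjx>;
		sim:description :MyJustification .

	:MyJustification
		a rdfg:Graph .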

In the above example, :MyJustification is a named graph that elucidates the process used for deriving this similarity statement. TODO: more here - actually write the graph example :-)

A DBTune wrapper service demonstrates how Sim can be used to describe artist similarity on last.fm

4 Cross-reference for Sim classes and properties

...

Class: sim:AcousticSimilarity

URI: http://purl.org/ontology/similarity/AcousticSimilarity

Acoustic Similarity - DEPRECATED - please use Similarity and reify using an AssociationMethod. A similarity statement about the acoustic character of two or more things. This may be related to audio-based signal analysis results.

sub-class-of:
sim:Similarity
status:
deprecated

Class: sim:Association

URI: http://purl.org/ontology/similarity/Association

Association - An abstract class to define some association between things.

sub-class-of:
owl:Thing
in-domain-of:
sim:asserter
sim:element
sim:grounding
sim:method
sim:object
sim:subject
sim:distance
sim:weight
status:
testing

Class: sim:AssociationMethod

URI: http://purl.org/ontology/similarity/AssociationMethod

Association Method - Class for representing the method used to derive association or similarity statements.

sub-class-of:
owl:Thing
in-domain-of:
sim:description
in-range-of:
sim:method
status:
testing

Class: sim:CompositionalSimilarity

URI: http://purl.org/ontology/similarity/CompositionalSimilarity

Compositional Similarity - DEPRECATED - please use Similarity and reify using an AssociationMethod. A similarity statement about the compositional similarity between two or more things. Such statements would generally relate to the music-theoretic properties associated with said things, perhaps asserted by musicologists, music critics, or algorithmic analysis.

sub-class-of:
sim:Similarity
status:
deprecated

Class: sim:ContextualSimilarity

URI: http://purl.org/ontology/similarity/ContextualSimilarity

Contextual Similarity - DEPRECATED - please use Similarity and reify using an AssociationMethod. A similarity statement about the contextual similarity of two or more MusicalThings stating "some MusicalThings are similar because of this Context"; for example, the MusicalThings share a geographic location, online associations, subculture affiliations, a time period, or some other context.

sub-class-of:
sim:Similarity
status:
deprecated

Class: sim:Influence

URI: http://purl.org/ontology/similarity/Influence

Influence - A concept indicating the influence of a subject musical thing on an object musical thing.

sub-class-of:
sim:Association
status:
testing

Class: sim:Similarity

URI: http://purl.org/ontology/similarity/Similarity

Similarity - An abstract class to define similarity between two or more things.

sub-class-of:
sim:Association
status:
testing

Property: sim:asserter

URI: http://purl.org/ontology/similarity/asserter

asserter - Specifies which agent is asserting that the given musical association is true. The range is unspecified for maximum flexibility but would likely include a person (foaf:Person) or some other agent, entity, or algorithm. The asserter role should be used when the method for deriving the association is not disclosed. If the method is transparent, use the method role to specify an AssociationMethod.

OWL Type:
ObjectProperty
Domain:
sim:Association
status:
testing

Property: sim:description

URI: http://purl.org/ontology/similarity/description

description - Specifies a named graph that discloses the process or set of processes used to derive association statements for the given AssociationMethod.

OWL Type:
ObjectProperty
Domain:
sim:AssociationMethod
status:
testing

Property: sim:distance

URI: http://purl.org/ontology/similarity/distance

distance - A numeric weighting value for an Association where a value of 0 implies two elements are the same individual. TODO: restrict numeric values to positive, create a normalized version of distance and a reciprocal version

OWL Type:
DatatypeProperty
Domain:
sim:Association
Range:
xsd:double
xsd:float
xsd:int
status:
testing

Property: sim:element

URI: http://purl.org/ontology/similarity/element

element - Specifies the given undirected Association has the given MusicalThing as an element.

OWL Type:
ObjectProperty
Domain:
sim:Association
status:
testing

Property: sim:grounding

URI: http://purl.org/ontology/similarity/grounding

grounding - associates a similarity statement with the instantiated N3-Tr formulae which enabled its derivation

OWL Type:
ObjectProperty
Domain:
sim:Association
status:
testing

Property: sim:method

URI: http://purl.org/ontology/similarity/method

method - Specifies the AssociationMethod used to derive a particular Association statement. This should be used when the process for deriving association statements is transparent and can be further reified using a named graphs approach. If the association derivation process is non-transparent (i.e. 'black box') use the asserter role.

OWL Type:
ObjectProperty
Domain:
sim:Association
Range:
sim:AssociationMethod
status:
testing

Property: sim:object

URI: http://purl.org/ontology/similarity/object

object - Specifies the object (MusicalThing) of an Association implying a directed association where "subject is associated to object"

OWL Type:
ObjectProperty
sub-property-of:
sim:element
Domain:
sim:Association
status:
testing

Property: sim:subject

URI: http://purl.org/ontology/similarity/subject

subject - Specifies the subject of an Association implying a directed association where "subject is associated to object"

OWL Type:
ObjectProperty
sub-property-of:
sim:element
Domain:
sim:Association
status:
testing

Property: sim:weight

URI: http://purl.org/ontology/similarity/weight

weight - A numeric weighting value for an Association where a value of 0 implies two elements are not at all associated and a higher value implies a closer association. TODO: restrict numeric values to positive, create a normalized version of distance and a reciprocal version

OWL Type:
DatatypeProperty
Domain:
sim:Association
Range:
xsd:double
xsd:float
xsd:int
status:
testing
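
The TODO notes for sim:distance and sim:weight mention normalized and reciprocal variants. The sketch below shows one way such conversions could work; the mapping 1 / (1 + distance) and the divide-by-maximum normalization are assumptions chosen for illustration, not something this ontology defines.

```python
def distance_to_weight(distance: float) -> float:
    """Map a non-negative distance to a weight in (0, 1] (assumed mapping)."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)

def normalise_distances(distances):
    """Scale non-negative distances into [0, 1] by dividing by the maximum."""
    largest = max(distances, default=0.0)
    if largest == 0:
        return [0.0 for _ in distances]
    return [d / largest for d in distances]

print(distance_to_weight(0.0))               # 1.0
print(distance_to_weight(1.0))               # 0.5
print(normalise_distances([0.0, 2.0, 4.0]))  # [0.0, 0.5, 1.0]
```

Under this mapping a distance of 0 (the same individual) gives the maximum weight of 1, and the weight approaches 0 as the distance grows, which agrees with the descriptions of the two properties.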

...

A References

Music Ontology
A thorough and mature ontology for describing music-related data. sim:MusicalThing is an owl:unionOf several concepts in the Music Ontology.

B Changes in this version (Non-Normative)

C Acknowledgements (Non-Normative)

Thanks to the Music Ontology specification mailing list, especially Yves Raimond and Samer Abdallah, for their helpful input.

This work is part of the OMRAS 2 project supported by EPSRC grant EP/E017614/1 at the Centre for Digital Music, Queen Mary University of London.