The Similarity Ontology - MuSim

Working Draft

Latest version:
http://purl.org/ontology/similarity/ (RDF/XML, Turtle)
Last Update:
Date: 23:06:39 10/03/10 BST
Editors:
Kurt Jacobson, Centre for Digital Music, Queen Mary University of London
Authors:
Kurt Jacobson, Centre for Digital Music, Queen Mary University of London
Yves Raimond, BBC
Thomas Gängler, Technische Universität Dresden
Contributors:
See acknowledgements


Abstract

This specification defines a small ontology for similarity called MuSim. In MuSim, the association between two (or more) Things is a class to be reified rather than a property. This allows us to embrace the complexity of associations and accommodate the subjectivity and context-dependence of musical and multimedia similarity. Although this ontology was designed with music similarity in mind, it can readily be applied to other domains.

Status of this Document

This is a work in progress and as such is subject to change. Comments are very welcome; please send them to kurtjx gmail.

Table of Contents

  1. Introduction
  2. MuSim ontology at a glance
  3. MuSim ontology overview
    1. Simple Examples
      1. Undirected Similarity
      2. Directed Similarity
    2. Using sim:AssociationMethod for Provenance
      1. Domain, range and scope
    3. Workflow Graphs and Transparency
    4. MuSim SPARQL Queries
    5. Example Implementations
  4. Cross-reference for MuSim classes and properties

Appendixes

  1. Normative References
  2. Changes in this version (Non-Normative)
  3. Acknowledgements (Non-Normative)

1 Introduction

The MuSim ontology aims to describe the associations between musical items and is motivated by a variety of use cases. Generally we would imagine these items to be music artists (a mo:MusicArtist) or music tracks (a mo:Track), although there are certainly other possibilities. With MuSim, we envision the creation of a distributed music information system that uses data about associations between musical items to allow for recommendation and discovery in a wide variety of contexts. MuSim RDF might be stored directly in triple stores, or it might be generated on demand (e.g. by a similarity system based on a vector space model).

While the Music Ontology deals primarily with objective attributes of music-related items such as authorship, composition, performance, etc., MuSim aims to capture more subjective attributes. This means that both the provenance and the context of an association are important.

1.2 Similarity as a Concept

In MuSim we treat musical associations as concepts rather than properties. Previous ontologies related to the domain of music (and ontologies in other domains) treat associations and similarity as properties. Using the Music Ontology we might say:

		@prefix : <#> .
		@prefix mo: <http://purl.org/ontology/mo/> .

		:TrackA a mo:Track .
		:TrackB a mo:Track .
		:TrackA mo:similar_to :TrackB .
	          
A basic property-based similarity in MO

However, we have no additional information about this similarity. Is it based on acoustic features, chronological proximity, or expert opinion? Are the signals similar in terms of timbre, or in terms of melodic content?

The property-based approach to similarity is illustrated in figure 1.

Figure 1. Similarity as a property - the MO way

While it is entirely possible to make additional statements about this triple using a named graphs approach, MuSim provides a reification paradigm that allows for SPARQL queries that are more intuitive and more efficient in at least some triple store implementations (e.g. 4store).

In MuSim we propose a concept-based approach to association. Similarity and association are concepts to be described and contextualized with supporting concepts and properties. This is illustrated in figure 2.

Figure 2. Similarity as a concept: a graph visualization of how MuSim concepts are intended to be used. A similarity statement of type sim:Similarity is associated with an instance of sim:AssociationMethod, which in turn leads to more information about the similarity derivation process.

2. MuSim ontology at a glance

An alphabetical index of MuSim terms, by class (concepts) and by property (relationships, attributes), is given below. Each term is described in detail in the cross-reference in Section 4.

Classes: Association, AssociationMethod, Influence, Network, Similarity

Properties: association, description, distance, domain, edge, element, grounding, method, object, range, scope, subject, weight, workflow

3. MuSim ontology overview

Instead of treating similarity (or, to use a broader term, association) as a property, we treat association as a class. This allows for easy reification of similarity statements. We introduce the class sim:Association and its sub-class sim:Similarity as the key concepts in our ontology. We then define the class sim:AssociationMethod for describing a method for determining similarity. By associating a similarity statement of type sim:Similarity with an instance of sim:AssociationMethod, we describe in what sense the elements involved in our statement are similar. We can attach additional provenance and/or transparency data to the sim:AssociationMethod as desired.
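As a rough illustration of this reification pattern (not part of the ontology itself), the Python sketch below models an association as an object in its own right rather than as a single predicate, so that it can carry a weight, a maker and a method. The class and field names are hypothetical stand-ins for sim:Association, sim:Similarity and sim:AssociationMethod, and the example.org URIs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AssociationMethod:
    """Hypothetical stand-in for sim:AssociationMethod."""
    uri: str
    maker: Optional[str] = None      # plays the role of foaf:maker
    workflow: Optional[str] = None   # plays the role of sim:workflow


@dataclass
class Association:
    """Hypothetical stand-in for sim:Association: a reified, describable link."""
    uri: str
    elements: list = field(default_factory=list)  # sim:element (undirected)
    subject: Optional[str] = None                 # sim:subject (directed)
    obj: Optional[str] = None                     # sim:object (directed)
    weight: Optional[float] = None                # sim:weight
    method: Optional[AssociationMethod] = None    # sim:method


@dataclass
class Similarity(Association):
    """sim:Similarity is a sub-class of sim:Association."""


timbre = AssociationMethod("http://example.org/timbreBasedSimilarity",
                           maker="http://example.org/me")
statement = Similarity("http://example.org/timbreSimilarityStatement",
                       elements=["http://example.org/track01",
                                 "http://example.org/track02"],
                       weight=0.9, method=timbre)

# Because the association is an object, we can ask how it was derived.
print(isinstance(statement, Association))  # True
print(statement.method.maker)              # http://example.org/me
```

With a plain property such as mo:similar_to there is no node to hang the weight or the method on; here both live on the statement object.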

3.1. Simple Examples

In this section we provide some basic examples for describing undirected and directed similarity.

3.1.1 Undirected similarity

Here is a very basic document describing an undirected similarity between two tracks:

	@prefix : <#> .
	@prefix mo: <http://purl.org/ontology/mo/> .
	@prefix sim: <http://purl.org/ontology/similarity/> .
	@prefix foaf: <http://xmlns.com/foaf/0.1/> .

	:track01 a mo:Track .
	:track02 a mo:Track .

	:me a foaf:Person .

	:mySimilarity a
	  sim:Similarity ;
	  sim:element :track01 ;
	  sim:element :track02 ;
	  sim:weight "0.90" ;
	  foaf:maker :me .
      
A very basic symmetric (undirected) similarity statement with some reification

First we define two tracks using the corresponding Music Ontology concept mo:Track. The identifiers of these tracks can serve as entry points to additional information in other data sets (e.g. by linking to dbpedia.org URIs or MusicBrainz identifiers). We define :mySimilarity as the actual similarity statement. The sim:element property is used to refer to the tracks involved in this similarity, and the foaf:maker property refers to the agent which asserted this similarity. Note also that we can assign a numerical weight to the similarity using the sim:weight property.

We now have a method for asserting a similarity statement and reifying that statement to some extent. However, in the above example we only know who is making the similarity statement; we do not know how or why.

3.1.2 Directed similarity

Note that to specify a directed association we could use sim:subject and sim:object instead of sim:element.

	@prefix : <#> .
	@prefix mo: <http://purl.org/ontology/mo/> .
	@prefix sim: <http://purl.org/ontology/similarity/> .
	@prefix foaf: <http://xmlns.com/foaf/0.1/> .

	:track01 a mo:Track .
	:track02 a mo:Track .

	:me a foaf:Person .

	:mySimilarity a
	  sim:Similarity ;
	  sim:subject :track01 ;
	  sim:object :track02 ;
	  sim:weight "0.90" ;
	  foaf:maker :me .
      
A very basic directed similarity statement with some reification

The use of sim:subject and sim:object implies a directed similarity relationship, while the use of sim:element implies an undirected association. In a sense we are using the subject-predicate-object paradigm that is so familiar in RDF, but the predicate is a concept instead of an object property. This makes our modeling reminiscent of the rdf:Statement reification process, with the rdf:predicate implied. However, a directed sim:Association does not inherently imply any additional triple, while an rdf:Statement is meant to describe a "simple" triple that is usually asserted alongside it (causing a duplication problem, among other problems). Also note that despite syntactic similarities, sim:subject is not a sub-property of rdf:subject, sim:object is not a sub-property of rdf:object, and sim:Association is not a subclass of rdf:Statement.
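The distinction can be sketched in Python by treating a graph as a set of (subject, predicate, object) tuples. The helper functions below are hypothetical and not part of any RDF library: they read an association back out of the triples, treating sim:element values as an unordered set and sim:subject / sim:object as an ordered pair. The ex: identifiers are placeholders, and no RDFS inference is performed.

```python
SIM = "http://purl.org/ontology/similarity/"

# A toy graph: one undirected and one directed association, as plain tuples.
graph = {
    ("ex:undirected", SIM + "element", "ex:track01"),
    ("ex:undirected", SIM + "element", "ex:track02"),
    ("ex:directed", SIM + "subject", "ex:track01"),
    ("ex:directed", SIM + "object", "ex:track02"),
}


def values(graph, node, prop):
    """All objects of the triples (node, sim:prop, ?) in the graph."""
    return {o for s, p, o in graph if s == node and p == SIM + prop}


def describe(graph, assoc):
    """Return ('directed', (subject, object)) or ('undirected', frozenset)."""
    subjects = values(graph, assoc, "subject")
    objects = values(graph, assoc, "object")
    if subjects and objects:
        # Assumes a single subject and a single object, as in the listing above.
        return "directed", (next(iter(subjects)), next(iter(objects)))
    # Undirected: the order of the elements carries no meaning.
    return "undirected", frozenset(values(graph, assoc, "element"))


print(describe(graph, "ex:directed"))  # ('directed', ('ex:track01', 'ex:track02'))
print(describe(graph, "ex:undirected"))
```

Returning a frozenset for the undirected case makes "track01 is similar to track02" and "track02 is similar to track01" compare equal, while the directed case keeps the order.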

3.2. Using sim:AssociationMethod for Provenance

We introduce the sim:AssociationMethod concept to identify the process used to derive a similarity statement. This enables some interesting functionality when consuming association data: a consumer application can elect to include only similarity statements that are derived by a particular process. This is discussed further in the SPARQL query examples in Section 3.4. Let us consider the following N3 listing, which describes a symmetric undirected similarity:

	@prefix : <#> .
	@prefix mo: <http://purl.org/ontology/mo/> .
	@prefix sim: <http://purl.org/ontology/similarity/> .
	@prefix foaf: <http://xmlns.com/foaf/0.1/> .

	:timbreSimilarityStatement
	  a sim:Similarity ;
	  sim:element :track01 ;
	  sim:element :track02 ;
	  sim:weight "0.9" ;
	  sim:method :timbreBasedSimilarity .

	:timbreBasedSimilarity
	  a sim:AssociationMethod ;
	  foaf:maker :me .
     
Using the sim:AssociationMethod concept for reification

Here :timbreBasedSimilarity is the entity that describes our process for deriving similarity statements. Note that this entity is only described by two triples - its class type and a property for the creator.

By including an identifier for the similarity derivation process, we enable the use of more advanced provenance frameworks. The sim:AssociationMethod can be considered a Process: some action or series of actions performed on or caused by artifacts and resulting in new artifacts. In the MuSim context, artifacts are sim:Associations. With this simple mapping we can potentially plug MuSim into nearly any provenance framework that uses the concepts of processes and artifacts, including the Open Provenance Model, the Provenir Ontology, and the Provenance Vocabulary.

Note that mappings to provenance ontologies are currently not made explicit. Explicit mappings might be included in the future, perhaps based on the recommendations of the W3C Provenance Incubator Group.

3.2.1 Domain, range, and scope

Sometimes it is desirable to specify explicitly the types of entities to which a particular association method applies. For the directed case, this is done by binding sim:domain and sim:range predicates to a sim:AssociationMethod.

	@prefix : <#> .
	@prefix mo: <http://purl.org/ontology/mo/> .
	@prefix sim: <http://purl.org/ontology/similarity/> .

	:MyDirectedMethod a sim:AssociationMethod ;
	  sim:domain mo:MusicArtist ;
	  sim:range mo:Genre .

	:MyDirectedSim a sim:Association ;
	  sim:method :MyDirectedMethod ;
	  sim:subject [ a mo:MusicArtist ] ;
	  sim:object [ a mo:Genre ] .
    
specifying a sim:domain and sim:range for a sim:AssociationMethod

The use of sim:domain and sim:range implies that the given association method produces directed association statements. For the undirected case, we bind the sim:scope predicate to an association method.

	@prefix : <#> .
	@prefix mo: <http://purl.org/ontology/mo/> .
	@prefix sim: <http://purl.org/ontology/similarity/> .

	:MyUndirectedMethod a sim:AssociationMethod ;
	  sim:scope mo:MusicArtist .

	:MyUndirectedSim a sim:Association ;
	  sim:method :MyUndirectedMethod ;
	  sim:element [ a mo:MusicArtist ] ;
	  sim:element [ a mo:MusicArtist ] .
    
specifying a sim:scope for a sim:AssociationMethod
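A consumer might use these declarations to sanity-check incoming statements. The Python sketch below is one possible reading of sim:domain, sim:range and sim:scope, not a normative validation procedure. It assumes the rdf:type of every participant is known and compares types by exact match only (it does not follow rdfs:subClassOf). The method identifiers, resource identifiers and the conforms function are all hypothetical.

```python
from typing import Optional

# Hypothetical method descriptions, modelled on the listings above.
METHODS = {
    "ex:directedMethod": {"domain": "mo:MusicArtist", "range": "mo:Genre"},
    "ex:undirectedMethod": {"scope": "mo:MusicArtist"},
}

# Known rdf:type of each resource (exact types only, no subclass reasoning).
TYPES = {":artist1": "mo:MusicArtist", ":artist2": "mo:MusicArtist",
         ":genre1": "mo:Genre", ":label1": "mo:Label"}


def conforms(method: str, subject: Optional[str] = None,
             obj: Optional[str] = None, elements: tuple = ()) -> bool:
    """Check an association's participants against its method's declarations."""
    spec = METHODS[method]
    if "domain" in spec or "range" in spec:
        # sim:domain / sim:range imply a directed association.
        if subject is None or obj is None or elements:
            return False
        return (TYPES.get(subject) == spec.get("domain", TYPES.get(subject))
                and TYPES.get(obj) == spec.get("range", TYPES.get(obj)))
    if "scope" in spec:
        # sim:scope implies an undirected association built from sim:element.
        if subject is not None or obj is not None or not elements:
            return False
        return all(TYPES.get(e) == spec["scope"] for e in elements)
    return True  # no constraints declared for this method


print(conforms("ex:directedMethod", subject=":artist1", obj=":genre1"))   # True
print(conforms("ex:directedMethod", subject=":artist1", obj=":label1"))   # False
print(conforms("ex:undirectedMethod", elements=(":artist1", ":artist2"))) # True
```

A missing sim:domain or sim:range is treated as unconstrained for that position; whether a real consumer should reject or merely flag non-conforming statements is left open.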

3.3. Workflow Graphs and Transparency

We now have some level of provenance (at least an entry point to a given provenance framework), but we can also provide some transparency by disclosing an association derivation workflow.

Now suppose we have a workflow that describes a particular association derivation process. We can bind this workflow to an appropriate sim:AssociationMethod using the sim:workflow property.

	@prefix : <#> .
	@prefix mo: <http://purl.org/ontology/mo/> .
	@prefix sim: <http://purl.org/ontology/similarity/> .
	@prefix foaf: <http://xmlns.com/foaf/0.1/> .
	@prefix math: <http://www.w3.org/2000/10/swap/math#> .
	# The sig: and ctr: prefixes refer to vocabularies used in Raimond's
	# N3-Tr work; their namespace declarations are omitted here.

	:timbreSimilarityStatement
	  a sim:Similarity ;
	  sim:element :track01 ;
	  sim:element :track02 ;
	  sim:weight "0.9" ;
	  sim:method :timbreBasedSimilarity .

	:timbreBasedSimilarity
	  a sim:AssociationMethod ;
	  foaf:maker :me ;
	  sim:workflow :algorithm .

	:algorithm = {
	{ { ?signal1 mo:published_as ?track01 .
	    ?signal1 sig:mfcc ?mfcc1 .
	    ?mfcc1 sig:gaussian ?model1 }
	      ctr:cc
	  { ?signal2 mo:published_as ?track02 .
	    ?signal2 sig:mfcc ?mfcc2 .
	    ?mfcc2 sig:gaussian ?model2 } .
	  (?model1 ?model2) sig:emd ?div .
	  ?div math:lessThan 0.2 } =>
	  { _:timbreSimilarityStatement
	      a sim:Similarity ;
	      sim:element ?track01 ;
	      sim:element ?track02 }
	} .
      
An example of a similarity statement with some provenance and transparency.

In the above example, :algorithm is a named graph that elucidates the process used for deriving this similarity statement. We use the N3-Tr syntax as described by Raimond in his thesis. Alternative workflow frameworks could be used as well. Additional details of specifying similarity derivation workflows are left to future work.

The :algorithm graph discloses the algorithm used in the similarity derivation process. In this case, Mel-frequency cepstral coefficients (MFCCs) are extracted and Gaussian mixture models are created concurrently for the two signals, and an earth mover's distance is calculated between the models. This is a method commonly used in audio-based signal analysis to derive a measure of timbre similarity. If that distance is below a threshold (0.2 in this example), we output a similarity statement. If more details are needed about a particular computational step, e.g. if we want to gather more information about the MFCC extraction step, we can look up the corresponding web identifier, in this case sig:mfcc.
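The final thresholding step of this workflow can be sketched in Python. Extracting MFCCs and fitting Gaussian models is out of scope here, so the sketch substitutes a simpler, related computation: the earth mover's distance between two normalised 1-D histograms over the same unit-width bins, which equals the sum of the absolute differences between their cumulative sums. Only the 0.2 threshold comes from the listing above; the histograms, track identifiers and function names are hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth mover's distance between two normalised 1-D histograms
    defined over the same unit-width bins."""
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    # In 1-D, EMD is the L1 distance between the cumulative distributions.
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def maybe_similarity(track_a, track_b, hist_a, hist_b, threshold=0.2):
    """Emit an undirected similarity (as a dict) when the distance is below
    the threshold, mirroring `?div math:lessThan 0.2` in the listing."""
    div = emd_1d(hist_a, hist_b)
    if div < threshold:
        return {"type": "sim:Similarity", "element": {track_a, track_b}}
    return None


h1 = [0.25, 0.50, 0.25]
h2 = [0.20, 0.50, 0.30]
h3 = [0.00, 0.00, 1.00]
print(round(emd_1d(h1, h2), 6))                 # 0.1
print(maybe_similarity(":t1", ":t2", h1, h2))   # a similarity dict
print(maybe_similarity(":t1", ":t3", h1, h3))   # None
```

As in the N3-Tr rule, nothing is emitted when the distance is too large; the absence of a statement is not a claim of dissimilarity.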

Various levels of transparency are achievable using the similarity ontology. This is illustrated in the following figure.

Levels of transparency in MuSim

3.4. MuSim SPARQL Queries

Queries in this similarity ecosystem would be made using the SPARQL query language, a W3C Recommendation and the standard method for querying RDF graphs. As mentioned before, the design of the Similarity Ontology allows for the construction of simple queries to retrieve similarity information. The following query retrieves artists similar to a target artist, as stated by a specific trusted method:

	PREFIX sim: <http://purl.org/ontology/similarity/>

	SELECT DISTINCT ?artists WHERE {
	  ?statement sim:method <http://trusted.method/uri> .
	  ?statement sim:element <http://target.artist/uri> .
	  ?statement sim:element ?artists .
	  FILTER (?artists != <http://target.artist/uri>)
	}
A SPARQL query for similarity statements derived by a trusted association method

Notice that we only have to include a triple pattern for our target resource, a triple pattern for our trusted method, and a triple pattern to select the similar artists. Of course, this is a very simple example; real-world applications would likely include additional optional patterns and conjunctions for a more expressive query.
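To make the matching semantics concrete, the same basic graph pattern can be evaluated by hand over a toy graph of (subject, predicate, object) tuples. This is only a sketch: a real application would send the SPARQL query to a triple store. The trusted method and target artist URIs are the placeholders from the query; the other statements and artist URIs are hypothetical.

```python
SIM = "http://purl.org/ontology/similarity/"
TRUSTED = "http://trusted.method/uri"
TARGET = "http://target.artist/uri"

graph = {
    ("_:s1", SIM + "method", TRUSTED),
    ("_:s1", SIM + "element", TARGET),
    ("_:s1", SIM + "element", "http://example.org/artistB"),
    ("_:s2", SIM + "method", "http://untrusted.method/uri"),
    ("_:s2", SIM + "element", TARGET),
    ("_:s2", SIM + "element", "http://example.org/artistC"),
}


def similar_artists(graph, method, target):
    """Join the three triple patterns of the query on ?statement."""
    # ?statement sim:method <method>
    statements = {s for s, p, o in graph if p == SIM + "method" and o == method}
    # ?statement sim:element <target>
    statements &= {s for s, p, o in graph if p == SIM + "element" and o == target}
    # ?statement sim:element ?artists, excluding the target itself,
    # which the third pattern would otherwise also match.
    return {o for s, p, o in graph
            if s in statements and p == SIM + "element" and o != target}


print(similar_artists(graph, TRUSTED, TARGET))  # {'http://example.org/artistB'}
```

The statement made by the untrusted method is never joined in, which is exactly the filtering behaviour that reifying the association through sim:method makes cheap to express.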

In an initialization step, an application could query available data sources to determine exactly which association methods and asserting agents are available. The available association methods can be retrieved with the following query:

	PREFIX sim: <http://purl.org/ontology/similarity/>

	SELECT DISTINCT ?method WHERE{
	  ?method a sim:AssociationMethod .
	}
A SPARQL query to retrieve a list of available association methods.

The application could then filter through the results and, perhaps with some input from the end user, decide which association methods to trust.
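The initialization step and the subsequent filtering might look like the following Python sketch, where a set of trusted methods stands in for the end user's choice. The graph is a toy set of (subject, predicate, object) tuples; all ex: identifiers and the trusted_statements function are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIM = "http://purl.org/ontology/similarity/"

graph = {
    ("ex:timbre", RDF_TYPE, SIM + "AssociationMethod"),
    ("ex:coplay", RDF_TYPE, SIM + "AssociationMethod"),
    ("ex:s1", SIM + "method", "ex:timbre"),
    ("ex:s2", SIM + "method", "ex:coplay"),
}

# Equivalent of: SELECT DISTINCT ?method WHERE { ?method a sim:AssociationMethod }
methods = sorted({s for s, p, o in graph
                  if p == RDF_TYPE and o == SIM + "AssociationMethod"})
print(methods)  # ['ex:coplay', 'ex:timbre']

# Suppose the end user chooses to trust only the timbre-based method.
trusted = {"ex:timbre"}


def trusted_statements(graph, trusted):
    """Keep only association statements derived by a trusted method."""
    return sorted(s for s, p, o in graph if p == SIM + "method" and o in trusted)


print(trusted_statements(graph, trusted))  # ['ex:s1']
```

Keeping trust decisions at the level of methods rather than individual statements means the user makes one choice per method instead of one per statement.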

3.5. Example Implementations

This list of systems that leverage MuSim is not necessarily up to date or complete.

4. Cross-reference for MuSim classes and properties

Class: sim:Association

URI: http://purl.org/ontology/similarity/Association

Association - An abstract class to define some association between things. Entities share an association if they are somehow inter-connected. Generally, a directed association should have at least one sim:subject property and one sim:object property, and an undirected association should have at least two sim:element properties; however, this is not a requirement and is intentionally left out of the model.

sub-class-of:
owl:Thing
in-domain-of:
sim:element
sim:grounding
sim:method
sim:object
sim:subject
sim:weight
sim:distance
in-range-of:
sim:edge
sim:association
status:
testing

Class: sim:AssociationMethod

URI: http://purl.org/ontology/similarity/AssociationMethod

Association Method - A concept for representing the method used to derive association or similarity statements.

sub-class-of:
owl:Thing
in-domain-of:
sim:domain
sim:range
sim:scope
sim:description
sim:workflow
in-range-of:
sim:method
status:
testing

Class: sim:Influence

URI: http://purl.org/ontology/similarity/Influence

Influence - An abstract class indicating a directed association of influence where the subject entity has influenced the object entity.

sub-class-of:
sim:Association
status:
testing

Class: sim:Network

URI: http://purl.org/ontology/similarity/Network

Network - A network is a grouping of sim:Associations. The associations that comprise a network are specified using a series of sim:edge predicates.

sub-class-of:
owl:Thing
in-domain-of:
sim:edge
status:
testing

Class: sim:Similarity

URI: http://purl.org/ontology/similarity/Similarity

Similarity - An abstract class to define similarity between two or more things. Entities share a similarity if they share some common characteristics of interest. A similarity is a special type of association.

sub-class-of:
sim:Association
status:
testing

Property: sim:association

URI: http://purl.org/ontology/similarity/association

association - Binds a sim:Association to an arbitrary thing.

OWL Type:
ObjectProperty
Domain:
owl:Thing
Range:
sim:Association
status:
testing

Property: sim:description

URI: http://purl.org/ontology/similarity/description

description - Specifies some description that discloses the process or set of processes used to derive association statements for the given AssociationMethod. This property is deprecated in favor of the more appropriately named sim:workflow property.

OWL Type:
ObjectProperty
Domain:
sim:AssociationMethod
status:
deprecated

Property: sim:distance

URI: http://purl.org/ontology/similarity/distance

distance - A distance value bound to a sim:Association, where a value of 0 implies that the two elements are the same individual.

OWL Type:
DatatypeProperty
Domain:
sim:Association
status:
testing

Property: sim:domain

URI: http://purl.org/ontology/similarity/domain

domain - Specifies appropriate object types for the sim:subject predicate for sim:Associations bound to the given sim:AssociationMethod. The presence of this predicate implies the given sim:AssociationMethod begets directed associations.

OWL Type:
ObjectProperty
Domain:
sim:AssociationMethod
Range:
owl:Thing
status:
testing

Property: sim:edge

URI: http://purl.org/ontology/similarity/edge

edge - Specifies an edge in a sim:Network.

OWL Type:
ObjectProperty
Domain:
sim:Network
Range:
sim:Association
status:
testing

Property: sim:element

URI: http://purl.org/ontology/similarity/element

element - Specifies an entity involved in the given sim:Association and implies the given association is undirected.

OWL Type:
ObjectProperty
Domain:
sim:Association
status:
testing

Property: sim:grounding

URI: http://purl.org/ontology/similarity/grounding

grounding - Binds a sim:Association statement to the directly instantiated N3-Tr formulae, or some other workflow graph, which enabled the association derivation.

OWL Type:
ObjectProperty
Domain:
sim:Association
status:
testing

Property: sim:method

URI: http://purl.org/ontology/similarity/method

method - Specifies the sim:AssociationMethod used to derive a particular Association statement. This should be used when the process for deriving association statements can be described further.

OWL Type:
ObjectProperty
Domain:
sim:Association
Range:
sim:AssociationMethod
status:
testing

Property: sim:object

URI: http://purl.org/ontology/similarity/object

object - Specifies the object of a sim:Association implying a directed association where "subject is associated to object" but the reverse association does not necessarily exist, and if it does exist, it is not an equivalent association.

OWL Type:
ObjectProperty
sub-property-of:
sim:element
Domain:
sim:Association
status:
testing

Property: sim:range

URI: http://purl.org/ontology/similarity/range

range - Specifies appropriate object types for the sim:object predicate for sim:Associations bound to the given sim:AssociationMethod. The presence of this predicate implies the given sim:AssociationMethod begets directed associations.

OWL Type:
ObjectProperty
Domain:
sim:AssociationMethod
Range:
owl:Thing
status:
testing

Property: sim:scope

URI: http://purl.org/ontology/similarity/scope

scope - Specifies appropriate object types for the sim:element predicate for sim:Associations bound to the given sim:AssociationMethod. The presence of this predicate implies the given sim:AssociationMethod begets undirected associations.

OWL Type:
ObjectProperty
Domain:
sim:AssociationMethod
Range:
owl:Thing
status:
testing

Property: sim:subject

URI: http://purl.org/ontology/similarity/subject

subject - Specifies the subject of a sim:Association, implying a directed association where "subject is associated to object" but the reverse association does not necessarily exist, and if it does exist, it is not an equivalent association.

OWL Type:
ObjectProperty
sub-property-of:
sim:element
Domain:
sim:Association
status:
testing

Property: sim:weight

URI: http://purl.org/ontology/similarity/weight

weight - A weighting value bound to a sim:Association where a value of 0 implies two elements are not at all associated and a higher value implies a closer association.

OWL Type:
DatatypeProperty
Domain:
sim:Association
status:
testing

Property: sim:workflow

URI: http://purl.org/ontology/similarity/workflow

workflow - Specifies a workflow that discloses the process or set of processes used to derive association statements for the given sim:AssociationMethod.

OWL Type:
ObjectProperty
Domain:
sim:AssociationMethod
status:
testing

A References

Music Ontology
Music Ontology: a thorough and mature ontology for describing music-related data. sim:MusicalThing is an owl:unionOf several concepts in the Music Ontology.

B Changes in this version (Non-Normative)

2010-08-02

2010-07-24

2010-07-23

2010-07-20

2010-06-13

2009-12-07

2009-03-26

2009-03-25

2009-03-07

C Acknowledgements (Non-Normative)

Thanks to the Music Ontology specification mailing list, especially Yves Raimond and Samer Abdallah, for their helpful input. Special thanks to Antoine Zimmermann for pointing out numerous issues and proposing fixes!

This work is part of the OMRAS 2 project supported by EPSRC grant EP/E017614/1 at the Centre for Digital Music, Queen Mary University of London.