You are granted a license to use, reproduce
and create derivative works of this document under the Creative
Commons Attribution 3.0 Unported License. This copyright applies to the Similarity Ontology specification and RDF.
This specification defines a small ontology for similarity called MuSim. In MuSim, the association between two (or more) Things is a class to be reified rather than a property. This allows us to embrace the complexity of associations and accommodate the subjectivity and context-dependence of musical and multimedia similarity. Although this ontology was designed with music similarity in mind, it can readily be applied to other domains.
This is a work in progress and as such is subject to change. Comments are very welcome; please send them to kurtjx gmail.
The MuSim ontology aims to describe the associations between
musical items and is motivated by a variety of use cases. Generally we
would imagine these items to be music artists (a
mo:MusicArtist) or music tracks (a
mo:Track) although there are definitely
other possibilities. With MuSim, we envision the
creation of a distributed music information system
that uses data about associations between musical
items to allow for recommendation and discovery in a
wide variety of contexts. Sometimes MuSim RDF would be
stored directly in triple stores, sometimes it would
be generated on demand (e.g. using some similarity system
based on a vector space).
While the Music Ontology deals primarily with objective attributes of music-related items such as authorship, composition, performance, etc., MuSim aims to capture more subjective attributes. This means that the provenance of an association concept is important as well as the context.
In MuSim we treat musical associations as concepts rather than properties. Previous ontologies related to the domain of music (and ontologies in other domains) treat associations and similarity as properties. Using the Music Ontology we might say:
@prefix mo: <http://purl.org/ontology/mo/> .
:TrackA a mo:Track .
:TrackB a mo:Track .
:TrackA mo:similar_to :TrackB .
However, we have no additional information about this similarity. Is it based on acoustic features, chronological proximity, or expert opinion? Are the signals similar in terms of timbre or melodic content?
The property-based approach to similarity is illustrated in figure 1.

While it is entirely possible to make additional statements about this triple using a named graphs approach, MuSim provides a reification paradigm that allows for SPARQL queries that are more intuitive and more efficient in at least some triple store implementations (e.g. 4Store).
In MuSim we propose a concept-based approach to association. Similarity and association are concepts to be described and contextualized with supporting concepts and properties. This is illustrated in figure 2.

sim:Similarity is
associated with an instance of
sim:AssociationMethod, which in turn leads
to more information about the similarity derivation
process.
An alphabetical index of MuSim terms, by class (concepts) and by property (relationships, attributes), is given below. All the terms are hyperlinked to their detailed descriptions for quick reference.
Classes: Association, AssociationMethod, Influence, Network, Similarity
Properties: association, description, distance, domain, edge, element, grounding, method, object, range, scope, subject, weight, workflow
Instead of treating similarity or, to use a broader term,
association as a property, we treat association
as a class concept. This allows for easy
reification of similarity statements. We introduce the
class sim:Association and a sub-class
sim:Similarity as the key concepts in our
ontology. We then define the class
sim:AssociationMethod for describing a method
for determining similarity. By associating a
similarity statement of type sim:Similarity
with an instance of sim:AssociationMethod we
are describing in what sense the elements involved in
our statement are similar. We can attach additional
provenance and/or transparency data to the
sim:AssociationMethod as desired.
Here is a very basic document describing the similarity between two tracks:
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix sim: <http://purl.org/ontology/similarity/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
:track01 a mo:Track .
:track02 a mo:Track .
:me a foaf:Person .
:mySimilarity a sim:Similarity ;
    sim:element :track01 ;
    sim:element :track02 ;
    sim:weight "0.90" ;
    foaf:maker :me .
First we define two tracks using the corresponding Music
Ontology concept mo:Track. The identifiers of these
tracks can give entry points to additional information in other
data sets (e.g. by linking to dbpedia.org URIs or MusicBrainz
identifiers). We define :mySimilarity as the actual
similarity statement. The sim:element property is
used to refer to the tracks involved in this similarity and the
foaf:maker property refers to the agent which
asserted this similarity. Also note we can assign a numerical
weight value to the similarity using the sim:weight
property.
Now we have a method for asserting a similarity statement and reifying that statement to some extent. However, in the above example we only know who is making the similarity statement; we do not know how or why.
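To make the shape of this reified statement concrete, the listing above can be mirrored in a few lines of Python. This is only a sketch: triples are held as plain tuples of prefixed-name strings rather than in a real RDF store, and the helper names `objects` and `describe` are hypothetical, not part of MuSim.

```python
# Triples as (subject, predicate, object) tuples. Prefixed names are kept as
# plain strings; this is an illustrative stand-in for a real RDF library.
TRIPLES = {
    (":track01", "rdf:type", "mo:Track"),
    (":track02", "rdf:type", "mo:Track"),
    (":me", "rdf:type", "foaf:Person"),
    (":mySimilarity", "rdf:type", "sim:Similarity"),
    (":mySimilarity", "sim:element", ":track01"),
    (":mySimilarity", "sim:element", ":track02"),
    (":mySimilarity", "sim:weight", "0.90"),
    (":mySimilarity", "foaf:maker", ":me"),
}


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def describe(triples, statement):
    """Collect what is asserted about one reified similarity statement."""
    return {
        "elements": objects(triples, statement, "sim:element"),
        "weight": objects(triples, statement, "sim:weight"),
        "maker": objects(triples, statement, "foaf:maker"),
        "method": objects(triples, statement, "sim:method"),
    }


summary = describe(TRIPLES, ":mySimilarity")
```

Here `summary["maker"]` tells us who asserted the similarity, while `summary["method"]` is empty: nothing in the data says how or why the statement was derived.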
Note that to specify a directed association we could
use sim:subject and sim:object instead
of sim:element. The use of sim:subject
and sim:object implies a directed similarity
relationship while the use of sim:element implies an
undirected association. In a sense we are using the
subject-predicate-object paradigm that is so familiar in RDF but
the predicate is a concept instead of an object property. This
makes our modeling reminiscent of that used in
the rdf:Statement reification process with
the rdf:predicate implied. However, a
directed sim:Association does not inherently imply
any additional triple while the rdf:Statement does
imply a "simple" triple exists (causing
a duplication problem
among other problems). Also note that despite syntactic
similarities, sim:subject is not a subproperty
of rdf:subject, sim:object
is not a subproperty of rdf:object,
and sim:Association is not a subclass
of rdf:Statement.
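A consumer might want to tell directed from undirected associations by inspecting which of these properties are present. The following Python sketch does so over triples held as plain tuples. The function name `association_kind` and the "mixed" and "unspecified" outcomes are assumptions made for illustration; the ontology itself does not define them.

```python
def association_kind(triples, association):
    """Classify an association by the MuSim properties attached to it."""
    predicates = {p for s, p, o in triples if s == association}
    has_directed = "sim:subject" in predicates or "sim:object" in predicates
    has_undirected = "sim:element" in predicates
    if has_directed and has_undirected:
        return "mixed"        # assumption: not addressed by the ontology
    if has_directed:
        return "directed"     # sim:subject / sim:object imply direction
    if has_undirected:
        return "undirected"   # sim:element implies no direction
    return "unspecified"      # no subject, object, or element given


# Hypothetical example data, using prefixed names as plain strings.
TRIPLES = [
    (":undirected", "sim:element", ":track01"),
    (":undirected", "sim:element", ":track02"),
    (":directed", "sim:subject", ":artistA"),
    (":directed", "sim:object", ":artistB"),
    (":bare", "rdf:type", "sim:Association"),
]

kinds = {name: association_kind(TRIPLES, name)
         for name in (":undirected", ":directed", ":bare")}
```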
We introduce the sim:AssociationMethod concept to
identify the process used to derive a similarity statement. This
enables some interesting functionality when consuming the
associations data - a consumer application can elect to include
only similarity statements that are derived by a particular
process. This is discussed further in the SPARQL query examples section.
For now let us consider the following N3 listing:
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix sim: <http://purl.org/ontology/similarity/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
:timbreSimilarityStatement
    a sim:Similarity ;
    sim:subject :track01 ;
    sim:object :track02 ;
    sim:weight "0.9" ;
    sim:method :timbreBasedSimilarity .
:timbreBasedSimilarity
    a sim:AssociationMethod ;
    foaf:maker :me .
sim:AssociationMethod concept for reification
Here :timbreBasedSimilarity is the entity that
describes our process for deriving similarity statements. Note
that this entity is only described by two triples - its class
type and a property for the creator.
By including an identifier for the similarity derivation
process we enable the use of more advanced provenance frameworks.
The sim:AssociationMethod can be considered a
Process - some action or series of actions performed on or
caused by artifacts and resulting in new artifacts. In the MuSim
context artifacts are sim:Associations. With this
simple mapping paradigm we can potentially plug MuSim into nearly
any provenance framework, including the Open Provenance
Model, the Provenir Ontology, the Provenance Vocabulary, or any other provenance
framework that uses the concepts of processes and artifacts.
Note that mappings to provenance ontologies are currently not made explicit. Explicit mappings might be included in the future, perhaps based on the recommendations of the W3C Provenance Incubator Group.
We now have some level of provenance - at least an entry point to some given provenance framework - but we can also include some level of transparency by disclosing an association derivation workflow.
Now consider we have a workflow that describes a particular
association derivation process. We can bind this workflow to
an appropriate sim:AssociationMethod using
the sim:workflow property.
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix sim: <http://purl.org/ontology/similarity/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
# The sig:, ctr: and math: prefixes are not declared in this listing.
:timbreSimilarityStatement
    a sim:Similarity ;
    sim:element :track01 ;
    sim:element :track02 ;
    sim:weight "0.9" ;
    sim:method :timbreBasedSimilarity .
:timbreBasedSimilarity
    a sim:AssociationMethod ;
    foaf:maker :me ;
    sim:workflow :algorithm .
:algorithm = {
    { { ?signal1 mo:published_as ?track01 .
        ?signal1 sig:mfcc ?mfcc1 .
        ?mfcc1 sig:gaussian ?model1 }
      ctr:cc
      { ?signal2 mo:published_as ?track02 .
        ?signal2 sig:mfcc ?mfcc2 .
        ?mfcc2 sig:gaussian ?model2 } .
      (?model1 ?model2) sig:emd ?div .
      ?div math:lessThan 0.2 } =>
    { _:timbreSimilarityStatement
        a sim:Similarity ;
        sim:element ?track01 ;
        sim:element ?track02 }
} .
In the above example, :algorithm is a named
graph that elucidates the process used for deriving
this similarity statement. We use the N3-Tr syntax as
described by Raimond in his thesis.
Alternative workflow frameworks could be used as
well. Additional details of specifying similarity
derivation workflows are left to future work.
This :algorithm graph provides a disclosure of
the algorithm used in the similarity derivation process. In this
case, Mel-frequency cepstral coefficients (MFCCs) are extracted
and Gaussian mixture models are created concurrently for the two
signals, and an earth mover's distance is calculated between
models. This is a method commonly used in audio-based signal
analysis to derive some measure of timbre similarity. Depending
on that distance, we output a similarity statement. If more
details are needed about a particular computational step, e.g. if
we want to gather more information about the MFCC extraction step,
we can look up the corresponding web identifier, in this case
sig:mfcc.
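The final step of this workflow, emitting a similarity statement only when the distance between two models falls below 0.2, can be sketched in Python as follows. This is a rough illustration under stated assumptions: the MFCC extraction, Gaussian modelling, and earth mover's distance stages are not implemented, the `distance` argument stands in for them, and the toy models and the function name `derive_similarities` are hypothetical.

```python
from itertools import combinations

THRESHOLD = 0.2  # the bound used with math:lessThan in the N3-Tr listing


def derive_similarities(models, distance, threshold=THRESHOLD):
    """Emit an undirected similarity statement for each pair of tracks
    whose model distance is strictly below the threshold."""
    statements = []
    pairs = combinations(sorted(models.items()), 2)
    for (track_a, model_a), (track_b, model_b) in pairs:
        if distance(model_a, model_b) < threshold:
            statements.append({
                "type": "sim:Similarity",
                "elements": (track_a, track_b),
            })
    return statements


# Toy stand-ins: each "model" is a single number and the distance is an
# absolute difference. A real system would compare timbre models instead.
toy_models = {":track01": 0.50, ":track02": 0.60, ":track03": 0.95}


def toy_distance(a, b):
    return abs(a - b)


derived = derive_similarities(toy_models, toy_distance)
```

With these toy values only :track01 and :track02 are close enough to yield a statement.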
Various levels of transparency are achievable using the similarity ontology. This is illustrated in the following figure.

Queries in this similarity ecosystem would be made using the SPARQL query language. The SPARQL specification is a W3C recommendation and the preferred method for querying RDF graphs. As mentioned before, the design of the Similarity Ontology allows for the construction of simple queries to retrieve similarity information. The following query retrieves artists similar to a target artist as stated by a specific trusted method:
PREFIX sim: <http://purl.org/ontology/similarity/>
SELECT ?artists WHERE {
?statement sim:method <http://trusted.method/uri> .
?statement sim:element <http://target.artist/uri> .
?statement sim:element ?artists .
}
Notice we only have to include a triple pattern for our target resource, a triple pattern for our trusted method, and a triple pattern to select the similar artists. Of course this is a very simple example, and in real-world applications we would include additional optional patterns and conjunctions for a more expressive query.
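For readers without a SPARQL endpoint at hand, the same three patterns can be evaluated over in-memory triples. The Python sketch below assumes triples are plain tuples and uses hypothetical identifiers throughout. Unlike the query as written, which would also bind ?artists to the target artist itself, the sketch excludes the target from the result; that exclusion is an added assumption.

```python
def similar_to(triples, target, trusted_method):
    """Return elements that share a statement with the target, keeping
    only statements derived by the trusted method."""
    # ?statement sim:method <trusted method>
    trusted = {s for s, p, o in triples
               if p == "sim:method" and o == trusted_method}
    # ?statement sim:element <target>
    with_target = {s for s, p, o in triples
                   if p == "sim:element" and o == target and s in trusted}
    # ?statement sim:element ?artists (minus the target, by assumption)
    return sorted({o for s, p, o in triples
                   if p == "sim:element" and s in with_target and o != target})


# Hypothetical example data.
TRIPLES = [
    (":st1", "sim:method", ":trustedMethod"),
    (":st1", "sim:element", ":artistA"),
    (":st1", "sim:element", ":artistB"),
    (":st2", "sim:method", ":otherMethod"),
    (":st2", "sim:element", ":artistA"),
    (":st2", "sim:element", ":artistC"),
    (":st3", "sim:method", ":trustedMethod"),
    (":st3", "sim:element", ":artistB"),
    (":st3", "sim:element", ":artistD"),
]

result = similar_to(TRIPLES, ":artistA", ":trustedMethod")
```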
In an initialization step, an application could query available data sources to determine exactly what association methods and asserting agents are available. The application would use the following query:
PREFIX sim: <http://purl.org/ontology/similarity/>
SELECT DISTINCT ?method WHERE {
?method a sim:AssociationMethod .
}
The application could then filter through the results and, perhaps with some input from the end-user, decide which similarity agents to trust.
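As a sketch of this initialization step, the following Python lists every sim:AssociationMethod found in a set of in-memory triples together with any foaf:maker attached to it, so that an application could present the pairs to an end-user. The data and the function name `available_methods` are hypothetical.

```python
def available_methods(triples):
    """Map each sim:AssociationMethod to the sorted list of its makers."""
    methods = sorted({s for s, p, o in triples
                      if p == "rdf:type" and o == "sim:AssociationMethod"})
    return {
        method: sorted(o for s, p, o in triples
                       if s == method and p == "foaf:maker")
        for method in methods
    }


# Hypothetical example data.
TRIPLES = [
    (":timbreBasedSimilarity", "rdf:type", "sim:AssociationMethod"),
    (":timbreBasedSimilarity", "foaf:maker", ":me"),
    (":editorialSimilarity", "rdf:type", "sim:AssociationMethod"),
    (":track01", "rdf:type", "mo:Track"),
]

methods = available_methods(TRIPLES)
```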
URI: http://purl.org/ontology/similarity/Association
Association - An abstract class to define some association between things. Entities share an association if they are somehow inter-connected. Generally a directed association should have at least one sim:subject property and one sim:object property, and an undirected association should have at least two sim:element properties. However, this is not a requirement and is intentionally left out of the model.
URI: http://purl.org/ontology/similarity/AssociationMethod
Association Method - A concept for representing the method used to derive association or similarity statements.
URI: http://purl.org/ontology/similarity/Influence
Influence - An abstract class indicating a directed association of influence where the subject entity has influenced the object entity.
URI: http://purl.org/ontology/similarity/Network
Network - A network is a grouping of sim:Associations. The associations that comprise a network are specified using a series of sim:edge predicates.
URI: http://purl.org/ontology/similarity/Similarity
Similarity - An abstract class to define similarity between two or more things. Entities share a similarity if they share some common characteristics of interest. A similarity is a special type of association.
URI: http://purl.org/ontology/similarity/association
association - Binds a sim:Association to an arbitrary thing.
URI: http://purl.org/ontology/similarity/description
description - Specifies some description that discloses the process or set of processes used to derive association statements for the given AssociationMethod. This property is deprecated in favor of the more appropriately named sim:workflow property.
URI: http://purl.org/ontology/similarity/distance
distance - A weighting value for an Association where a value of 0 implies two elements are the same individual.
URI: http://purl.org/ontology/similarity/domain
domain - Specifies appropriate object types for the sim:subject predicate for sim:Associations bound to the given sim:AssociationMethod. The presence of this predicate implies the given sim:AssociationMethod begets directed associations.
URI: http://purl.org/ontology/similarity/edge
edge - Specifies an edge in a sim:Network.
URI: http://purl.org/ontology/similarity/element
element - Specifies an entity involved in the given sim:Association and implies the given association is undirected.
URI: http://purl.org/ontology/similarity/grounding
grounding - Binds a sim:Association statement directly to instantiated N3-Tr formulae or some other workflow graph which enabled the association derivation.
URI: http://purl.org/ontology/similarity/method
method - Specifies the sim:AssociationMethod used to derive a particular Association statement. This should be used when the process for deriving association statements can be described further.
URI: http://purl.org/ontology/similarity/object
object - Specifies the object of a sim:Association implying a directed association where "subject is associated to object" but the reverse association does not necessarily exist, and if it does exist, it is not an equivalent association.
URI: http://purl.org/ontology/similarity/range
range - Specifies appropriate object types for the sim:object predicate for sim:Associations bound to the given sim:AssociationMethod. The presence of this predicate implies the given sim:AssociationMethod begets directed associations.
URI: http://purl.org/ontology/similarity/scope
scope - Specifies appropriate object types for the sim:element predicate for sim:Associations bound to the given sim:AssociationMethod. The presence of this predicate implies the given sim:AssociationMethod begets undirected associations.
URI: http://purl.org/ontology/similarity/subject
subject - Specifies the subject of a sim:Association implying a directed association where "subject is associated to object" but the reverse association does not necessarily exist, and if it does exist, it is not an equivalent association.
URI: http://purl.org/ontology/similarity/weight
weight - A weighting value bound to a sim:Association where a value of 0 implies two elements are not at all associated and a higher value implies a closer association.
URI: http://purl.org/ontology/similarity/workflow
workflow - Specifies a workflow that discloses the process or set of processes used to derive association statements for the given sim:AssociationMethod.
sim:scope, sim:domain, and sim:range to allow type restrictions on sim:Associations by applying the aforementioned predicates to the sim:AssociationMethod
sim:method to 1 meaning an association must be bound to no more than one association method
sim:Network concept and the sim:edge property to specify networks of associations
sim:workflow property to (eventually) replace the poorly named sim:description property
sim:Association to allow flexible subclassing in external ontologies (by request)
sim:association property
sim:asserter,
sim:ContextualSimilarity and sim:AcousticSimilarity
added concepts and properties to enable a named graph approach for transparent similarity derivation. Added concepts include AssociationMethod and added properties include description and method.
deprecated AcousticSimilarity, ContextualSimilarity, and musim:CompositionalSimilarity in favor of named graph approaches
added part of a new example to this document involving the named graph approach
added the distance data property
added term_status to each element in MuSim as "testing"
added some more example code to the html description
added has_subject and has_object properties to support directed Associations
removed has_association and association_of in favor of element_of and has_element respectively for the undirected association case because this more intuitively matches the has_subject / has_object paradigm
added Association as the top class
removed similar_to and sub properties altogether to avoid confusion
removed has_similarity and similarity_of and replaced with has_association and association_of respectively
added Influence concept but not sure about placement in hierarchy
Thanks to the Music Ontology specification mailing list, especially Yves Raimond and Samer Abdallah, for their helpful input. Special thanks to Antoine Zimmermann for pointing out numerous issues and proposing fixes!
This work is part of the OMRAS 2 project supported by EPSRC grant EP/E017614/1 at the Centre for Digital Music, Queen Mary University of London.