Amit Mathew
Semantic Web Processes
Analysis of Four Ontologies
WordNet
WordNet is a semi-formal, general-purpose ontology for languages. It is general purpose because it is designed to accommodate any application that needs to analyze language and the relationships between words, such as natural language processing. It is an attempt to coax computers to "understand" human language. This understanding is built by forming relationships between words, not by having the computer understand the intrinsic meaning of a word. The ontology is semi-formal because the actual definitions of the words are not machine-processable. Instead, the ontology is designed to capture a variety of ways in which words are related.
Since WordNet is designed to categorize words by straightforward relationships, its expressiveness is limited, but adequate. In addition to "is-a" relationships (hypernyms and hyponyms, which are inverses of each other) and "has-a" relationships (meronyms and holonyms), WordNet offers a few other linguistics-inspired relationships such as derivationally related forms (for different senses of a word) and domain (for words whose definitions vary by domain). WordNet favors a hierarchical structure, since this is conducive to the classification of vocabularies, and thus its expressiveness is not complex.
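The hierarchical "is-a" structure described above can be illustrated with a minimal sketch. The words and links below are a hand-built toy fragment chosen for illustration, not data taken from WordNet, and each word is given a single more general term to keep the example short (an assumption; a real lexical hierarchy may allow more than one).

```python
# Toy, hand-built fragment of a hypernym ("is-a") hierarchy.
# The words and links are illustrative assumptions, not WordNet data.
HYPERNYMS = {
    "beagle": "dog",
    "dog": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def hypernym_chain(word):
    """Return the word followed by each successively more general term."""
    chain = [word]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def lowest_common_hypernym(a, b):
    """Return the most specific term that generalizes both words, or None."""
    ancestors_of_a = set(hypernym_chain(a))
    for term in hypernym_chain(b):
        if term in ancestors_of_a:
            return term
    return None


print(hypernym_chain("beagle"))
print(lowest_common_hypernym("beagle", "cat"))
```

Note that the program never consults a definition of "dog" or "cat"; the only thing it knows about two words is how they are linked, which is the sense in which the ontology is semi-formal.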
WordNet's success, as with all ontologies, depends on one's measure of success. In terms of real-world applications, there seem to be few, but there are several research applications and papers based on WordNet. Some of these involve multilingual projects such as the Mimida Project (http://www.gittens.nl/SemanticNetworks.html). Substantial research has also been done in the area of disambiguation, where the context and WordNet's sense classification become important.
GO (Gene Ontology)
GO is a formal, domain-specific ontology, specialized for the field of biology. GO actually refers to three ontologies: the molecular function ontology, the biological process ontology, and the cellular component ontology. GO is unique in that the domain it describes evolves fairly rapidly, so annotation tools are necessary to accommodate this aspect. The GO ontologies are formalized by collaboration, but they are not an industry-wide standard.
GO's descriptive power is basic and is limited to a directed acyclic graph structure, in which each child term can have many parents. The GO ontologies preserve the correctness of their structures by mandating that if a child term describes a gene product, then all of its parent terms must also apply to that gene product.
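The rule that every parent term must also apply to an annotated gene product can be sketched as a walk up the graph. The term names, the gene product name, and the function names below are hypothetical and chosen only for illustration; this is not code from the GO tools.

```python
# Toy directed acyclic graph: each term maps to its list of parent terms.
# Term names are illustrative assumptions, not actual GO entries.
PARENTS = {
    "mitochondrial membrane": ["mitochondrion", "membrane"],
    "mitochondrion": ["organelle"],
    "membrane": ["cellular component"],
    "organelle": ["cellular component"],
    "cellular component": [],
}


def ancestors(term):
    """Collect every term reachable by following parent links upward."""
    found = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, []))
    return found


def implied_annotations(direct):
    """Expand direct annotations so that all parent terms also apply."""
    expanded = {}
    for gene_product, terms in direct.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term)
        expanded[gene_product] = full
    return expanded


# "geneX" is a hypothetical gene product annotated with one child term.
result = implied_annotations({"geneX": {"mitochondrial membrane"}})
print(sorted(result["geneX"]))
```

Because a child can have several parents, the walk tracks the terms it has already visited; "cellular component" is reached along two paths here but is recorded once.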
GO seems to be targeted at the research community, so in that regard, it is quite successful. GO's power comes from its large data sets, its data manipulation tools, and its annotation tools. GO is used in popular projects such as FlyBase, which is the gene database for the fruit fly, and the Mouse Genome Database.
CYC
Cyc is an ambitious project, intended to map the whole of human common sense into an ontology. Cyc aims to create a basis for other ontologies and knowledge representations, allowing computers to understand the knowledge that we take for granted. Cyc is a formal, general-purpose ontology because it formalizes all of its data so that it is machine-understandable, and because it is applicable to all fields. Cyc differs from other projects in that it attempts to create "intelligent" AI by using a large dataset and a complex framework.
Cyc is highly expressive, and it allows flexible inferencing through modus ponens, modus tollens (contrapositive) inferencing, universal and existential quantification, mathematical inferencing, and special-purpose inferencing through collection membership, subsethood, and disjointness [from http://www.cyc.com/cycdoc/ref/inferencing.html]. The complex inference system is managed through microtheories, which restrict the search domain to optimize queries.
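Two of the inference rules named above, modus ponens and modus tollens, together with the idea of confining a query to one microtheory, can be sketched in a few lines. This is a propositional toy under stated assumptions: the microtheory names, the statements, and the string-based negation are all hypothetical, and nothing here reflects Cyc's actual language or engine.

```python
# Each microtheory holds its own facts and rules (premise -> conclusion).
# Names and statements are hypothetical; this is not Cyc's representation.
MICROTHEORIES = {
    "KitchenMt": {
        "facts": {"inverted"},
        "rules": [("inverted", "spills")],
    },
    "ShelfMt": {
        "facts": {"not spills"},
        "rules": [("inverted", "spills")],
    },
}


def negate(statement):
    """Toggle a leading 'not ' to form the negation of a statement."""
    if statement.startswith("not "):
        return statement[4:]
    return "not " + statement


def infer(microtheory_name):
    """Derive all statements that follow within a single microtheory."""
    mt = MICROTHEORIES[microtheory_name]
    known = set(mt["facts"])
    changed = True
    while changed:
        changed = False
        for premise, conclusion in mt["rules"]:
            # Modus ponens: from P and (P -> Q), conclude Q.
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
            # Modus tollens: from not Q and (P -> Q), conclude not P.
            if negate(conclusion) in known and negate(premise) not in known:
                known.add(negate(premise))
                changed = True
    return known


print(sorted(infer("KitchenMt")))
print(sorted(infer("ShelfMt")))
```

Only the facts and rules of the named microtheory are examined, so the cost of a query depends on the size of that microtheory rather than on the size of the whole knowledge base.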
Given Cyc's lofty goals and massive research funding, it is difficult to say that Cyc is a successful project. Although Cyc is found in such diverse applications as search engines (Lycos) and natural language processing, it fails to provide genuine AI. This is because it fails to achieve a genuine base of machine-processable common sense, which, I believe, stems from the fact that common sense is more complex than simple entities and their relationships. An example is given that Cyc can infer that a container holding a liquid should not be held upside down. But does the ontology correctly map the reasons? What if the contents were a solid? What if the solid is glued to the container? These are simple, everyday matters that are difficult to express semantically, and I believe it is naïve to attempt to accomplish this, even through a multi-million dollar project. Cyc does make some grand claims about its AI system, such as the system asking whether it is human and other such hubris, but these can be dismissed because the machine doesn't really understand what it is asking.
CIDOC/CRM
CIDOC/CRM is a semi-formal, domain-specific ontology for cultural, historical, and geopolitical entities and their relationships. It is semi-formal because it doesn't attempt to define the concepts it relates, such as people, wars, etc. It is specific to the domain of cultural libraries and museums, although it could possibly be extended or merged with other ontologies to provide a historical context.
CIDOC/CRM defines such relationships as "refine," "participate in," "within," and "identity/name." It can make inferences based on similarity and hierarchies, so it is not exceptionally expressive, but it is adequate for the ontology's limited scope. The fuzziness of the semantics makes the domain a difficult one to model. Even the term "cultural heritage" is contested, and abstract subjects such as politics and culture have proven to be some of the most difficult to model.
It is hard
to judge the success of CIDOC/CRM because of its
limited scope, but it seems to be making respectable progress. It is on track to receive ISO
standardization, which is part of one of its projects,