Amit Mathew

Semantic Web Processes

2/4/04

Analysis of Four Ontologies

 

WordNet

 

            WordNet is a semi-formal, general-purpose ontology for language.  It is general purpose because it is designed to serve any application that analyzes words and the relationships between them, such as natural language processing.  It is an attempt to coax computers to "understand" human language.  This understanding is built by forming relationships between words, not by having the computer grasp the intrinsic meaning of a word.  The ontology is semi-formal because the actual definitions of the words are not machine-processable.  Instead, the ontology is designed so that the many ways in which words are related can be analyzed.

 

            Since WordNet is designed to categorize words by straightforward relationships, its expressiveness is limited but adequate.  In addition to "is-a" relationships (hypernyms and hyponyms) and "has-a" relationships (holonyms and meronyms), WordNet offers a few other linguistics-inspired relationships such as derivationally related forms (for words formed from a common root) and domain (for words whose definitions vary by domain).  WordNet favors a hierarchical structure, since this is conducive to the classification of vocabularies, and thus its expressiveness is not complex.
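
            The hierarchical "is-a" structure described above can be illustrated with a small sketch.  The hypernym table below is a toy invented for illustration, not actual WordNet data, and the function names are hypothetical rather than part of any WordNet interface.

```python
# Toy hypernym table: each word maps to its more general "is-a" parent.
# These entries are illustrative assumptions, not actual WordNet data.
HYPERNYM = {
    "beagle": "dog",
    "dog": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def hypernym_chain(word):
    """Return the list of increasingly general terms above `word`."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def lowest_common_hypernym(a, b):
    """Return the most specific term that both words fall under, or None."""
    ancestors_of_a = [a] + hypernym_chain(a)
    for candidate in [b] + hypernym_chain(b):
        if candidate in ancestors_of_a:
            return candidate
    return None


print(hypernym_chain("beagle"))                  # ['dog', 'canine', 'mammal', 'animal']
print(lowest_common_hypernym("beagle", "cat"))   # mammal
```

            Relating two words through their nearest shared hypernym is one simple way a program can compare words without processing their definitions.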

 

            As with all ontologies, whether WordNet is a success depends on how success is measured.  There seem to be few real-world applications, but there are several research applications and papers based on WordNet.  Some of these involve multilingual projects such as the Mimida Project (http://www.gittens.nl/SemanticNetworks.html).  Substantial research has also been done in the area of disambiguation, where the context and WordNet's sense classification become important.

 

Important Papers

 

GO (Gene Ontology)

 

            GO is a formal, domain-specific ontology, specialized for the field of biology.  GO actually refers to three ontologies: the molecular function ontology, the biological process ontology, and the cellular component ontology.  GO is unusual in that the domain it describes evolves fairly rapidly, so annotation tools are necessary to keep up with the changes.  The GO ontologies are formalized by collaboration, but they are not an industry-wide standard.

 

            GO's descriptive power is basic: terms are organized in a directed acyclic graph, in which each child can have many parents.  The GO ontologies preserve the correctness of their structures by mandating that if a child term describes a gene product, then all of its parent terms must also apply to that gene product.
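
            The parent rule above can be sketched as an upward walk through the graph.  The term identifiers and parent links below are invented for illustration and are not real GO accessions; the function names are also hypothetical.

```python
# Hypothetical term identifiers and parent links, invented for illustration;
# they are not real GO accessions. A term may list more than one parent.
PARENTS = {
    "term:root": [],
    "term:binding": ["term:root"],
    "term:catalysis": ["term:root"],
    "term:dna_binding": ["term:binding"],
    "term:dna_helicase": ["term:dna_binding", "term:catalysis"],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def implied_annotations(direct_terms, parents=PARENTS):
    """Expand a gene product's direct annotations with all ancestor terms."""
    implied = set(direct_terms)
    for term in direct_terms:
        implied |= ancestors(term, parents)
    return implied


print(sorted(implied_annotations({"term:dna_helicase"})))
```

            Annotating a gene product with the most specific term is enough in this sketch; every more general term follows from the graph.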

 

            GO seems to be targeted at the research community, and in that regard it is quite successful.  GO's power comes from its large data sets, its data manipulation tools, and its annotation tools.  GO is used in popular projects such as FlyBase, which is the gene database for the fruit fly, and the Mouse Genome Database.

 

Important Papers

  • The Gene Ontology Consortium. 2000. Gene Ontology: tool for the unification of biology. Nat Genet 25: 25-29.
  • The Gene Ontology Consortium. 2001. Creating the gene ontology resource: design and implementation. Genome Res 11: 1425-1433.
  • The Gene Ontology Consortium. 2004. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32: D258-D261.

 

CYC

 

            Cyc is an ambitious project, intended to map the whole of human common sense into an ontology.  Cyc aims to create a basis for other ontologies and knowledge representations, allowing computers to understand the knowledge that we take for granted.  Cyc is a formal, general-purpose ontology because it formalizes all of its data so that it is machine understandable and is applicable to all fields.  Cyc differs from other projects in that it attempts to create "intelligent" AI by combining a large dataset with a complex framework.

 

            Cyc is highly expressive, and it supports flexible inferencing through modus ponens, modus tollens (contrapositive) inferencing, universal and existential quantification, mathematical inferencing, and special-purpose inferencing through collection membership, subsethood, and disjointness [from http://www.cyc.com/cycdoc/ref/inferencing.html].  The complex inference system is managed through microtheories, which restrict the search domain to optimize queries.
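
            Two of the inference rules named above, and the way a microtheory narrows the search, can be illustrated with a toy sketch.  This is not CycL and not Cyc's inference engine; the microtheory names, facts, and rules are hypothetical, and statements are plain strings with a leading "not " marking negation.

```python
# Each hypothetical microtheory holds its own facts and if-then rules.
# Facts are strings; a leading "not " marks a negated fact.
MICROTHEORIES = {
    "KitchenMt": {
        "facts": {"cup holds liquid", "not floor is wet"},
        "rules": [
            ("cup holds liquid", "cup must stay upright"),
            ("cup was spilled", "floor is wet"),
        ],
    },
    "GeologyMt": {
        "facts": {"rock is solid"},
        "rules": [("rock is solid", "rock keeps its shape")],
    },
}


def negate(statement):
    """Return the negation of a statement under the "not " convention."""
    return statement[4:] if statement.startswith("not ") else "not " + statement


def infer(microtheory_name):
    """Apply modus ponens and modus tollens until no new facts appear.

    Only the facts and rules of the named microtheory are searched.
    """
    mt = MICROTHEORIES[microtheory_name]
    facts = set(mt["facts"])
    changed = True
    while changed:
        changed = False
        for premise, conclusion in mt["rules"]:
            # Modus ponens: from P and (P -> Q), conclude Q.
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
            # Modus tollens: from not Q and (P -> Q), conclude not P.
            if negate(conclusion) in facts and negate(premise) not in facts:
                facts.add(negate(premise))
                changed = True
    return facts


print(sorted(infer("KitchenMt")))
```

            Because each query names a single microtheory, rules about rocks are never considered when reasoning about the kitchen, which is the search-narrowing effect the paragraph describes.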

 

            Given Cyc's lofty goals and massive research funding, it is difficult to say that Cyc is a successful project.  Although Cyc is found in such diverse applications as search engines (Lycos) and natural language processing, it fails to provide genuine AI.  This is because it fails to achieve a genuine base of machine-processable common sense, which, I believe, stems from the fact that common sense is more complex than simple entities and their relationships.  An example is given in which Cyc can infer that a container holding a liquid should not be held upside down.  But does the ontology correctly map the reasons?  What if the contents were a solid?  What if the solid were glued to the container?  These are simple, everyday matters that are difficult to express semantically, and I believe it is naïve to attempt to capture them all, even through a multi-million dollar project.  Cyc does make some grand claims for its AI system, such as the system asking whether it is human, but these can be dismissed because the machine doesn't really understand what it is asking.

 

Important Papers

  • Guha, R. V., D. B. Lenat, K. Pittman, D. Pratt, and M. Shepherd. "Cyc: A Midterm Report." Communications of the ACM 33, no. 8 (August 1990).
  • Guha, R. V. and D. B. Lenat. "Cyc: A Midterm Report." AI Magazine (Fall 1990).
  • Lenat, D. B. "Cyc: A Large-Scale Investment in Knowledge Infrastructure."

 

 

CIDOC/CRM

 

            CIDOC/CRM is a semi-formal, domain-specific ontology for cultural, historical, and geopolitical entities and their relationships.  It is semi-formal because it doesn't attempt to define the concepts it relates, such as people and wars.  It is specific to the domain of cultural institutions such as libraries and museums, although it could possibly be extended or merged with other ontologies to provide a historical context.

 

            CIDOC/CRM defines such relationships as "refine," "participate in," "within," and "identity/name."  It can make inferences based on similarity and hierarchies, so it is not exceptionally expressive, but it is adequate for the ontology's limited scope.  The fuzziness of the semantics makes the domain a difficult one to model.  Even the term "cultural heritage" is contested, and abstract subjects such as politics and culture have proven to be some of the most difficult to model.

 

            It is hard to judge the success of CIDOC/CRM because of its limited scope, but it seems to be making respectable progress.  It is on track for ISO standardization, which is a goal of one of its projects, CHIOS (http://cidoc.ics.forth.gr/chios_iso.html).

 

Important Papers

 

  • M. Doerr, “The CIDOC CRM - an Ontological Approach to Semantic Interoperability of Metadata,” AI Magazine - Special Issue on Ontologies, March, 2002.
  • M. Doerr and N. Crofts, “Electronic Esperanto: The Role of the Object-Oriented CIDOC Reference Model,” presented at ICHIM'99, Washington, DC, 1999.