
Swoogle: A Semantic Web Search and Metadata Engine

Li Ding, Tim Finin, Anupam Joshi, Rong Pan, Pavan Reddivari, Vishal Doshi, R. Scott Cost, Joel Sachs, Yun Peng
Department of Computer Science and Electrical Engineering
University of Maryland Baltimore County, Baltimore MD 21250, USA

Presented by

Adhitya Bhawiyuga (201183583)

Content

- Introduction
- Semantic Web Document
- Swoogle Architecture
- Finding Semantic Web Documents
- Semantic Web Document Metadata
- Ranking
- Indexing and Retrieval
- Current Status
- Conclusion and Future Work

Swoogle : Introduction

Are you familiar with this?

Introduction : What is Swoogle

Swoogle is a crawler-based indexing and retrieval system (a search engine) intended for the Semantic Web. It:
- extracts metadata for each document
- computes the relations between documents

Introduction : Related Work

- Ontology-based annotation systems (e.g., SHOE, Ontobroker, WebKB, QuizRDF): work on annotations rather than on entire documents
- Ontology repositories (e.g., DAML Ontology Library, SemWebCentral): do not automatically discover Semantic Web documents
- Semantic Web browsers (e.g., Ontaria): focus on storing RDF rather than on metadata

Swoogle : Semantic Web Document

Semantic Web Document : SWD

a document in a semantic web language that is online and accessible to web users and software agents...

SWD : Classification

SWDs are divided into:
- Semantic Web Ontology (SWO): a significant proportion of its statements define new terms (e.g., classes and properties)
- Semantic Web Database (SWDB): does not define or extend a significant number of terms; it mainly describes individuals

Swoogle decides which class an SWD belongs to by applying a threshold to a computed score, the ontology ratio.

SWD : Classification Example

SWO

```xml
<rdfs:Class rdf:about="http://xmlns.com/foaf/0.1/LabelProperty" vs:term_status="unstable">
  <rdfs:label>Label Property</rdfs:label>
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
  <rdfs:isDefinedBy rdf:resource="http://xmlns.com/foaf/0.1/"/>
</rdfs:Class>
```

SWDB
```xml
<foaf:Person rdf:about="http://umbc.edu/~finin/finin.rdf#TimFinin">
  <owl:sameAs rdf:resource="http://ebiquity.umbc.edu/person/foaf/Tim/Finin/foaf.rdf"/>
  <foaf:name>Tim Finin</foaf:name>
  <foaf:firstName>Tim</foaf:firstName>
  <foaf:mbox_sha1sum>9da08e2b4dc670d9254ab4a4b4d61637fed3b18f</foaf:mbox_sha1sum>
  <foaf:mbox_sha1sum>49953f47b9c33484a753eaf14102af56c0148d37</foaf:mbox_sha1sum>
</foaf:Person>
```

Swoogle : Architecture

Swoogle : Architecture (1)


[Figure: Swoogle architecture. SWD discovery: the web, a web crawler producing candidate URLs, and an SWD reader. Metadata creation: SWD cache and SWD metadata. Data analysis: SWD analyzer and IR analyzer. Interface: web server, web service and agent service.]

Swoogle : Architecture (2)

- SWD discovery: discovers potential SWDs on the web
- Metadata creation: caches a snapshot of each SWD and generates objective metadata about it
- Data analysis: builds analytical reports from the cached SWDs and the generated metadata
- Interface: provides data services to the Semantic Web community

Swoogle : Finding SWD

Finding SWD
- Google-based crawler: uses the Google web service to find candidate SWDs on the web
- Focused crawler: starts from URLs supplied by users, verifies which candidates are SWDs, and discovers further SWDs through their relations, e.g., imports
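The focused-crawler idea above can be sketched as a breadth-first crawl that starts from user-supplied URLs, keeps only documents that look like SWDs, and follows owl:imports links to find more. This is a minimal sketch under stated assumptions, not Swoogle's crawler: the `fetch` callback, the in-memory `WEB` dictionary, and the test "parses as XML with an rdf:RDF root" are hypothetical simplifications (real SWDs may also be written in N3 or N-Triples, or embedded in other files).

```python
import xml.etree.ElementTree as ET
from collections import deque

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def is_swd(text):
    """Assumed test: the document parses as XML and its root element is rdf:RDF."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    return root.tag == "{%s}RDF" % RDF_NS


def imported_urls(text):
    """Return the rdf:resource targets of all owl:imports elements."""
    root = ET.fromstring(text)
    targets = []
    for el in root.iter("{%s}imports" % OWL_NS):
        url = el.get("{%s}resource" % RDF_NS)
        if url:
            targets.append(url)
    return targets


def focused_crawl(seed_urls, fetch, limit=100):
    """Breadth-first crawl from user-given seed URLs, following owl:imports links."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    swds = []
    while queue and len(swds) < limit:
        url = queue.popleft()
        text = fetch(url)                    # hypothetical fetcher: returns text or None
        if text is None or not is_swd(text):
            continue                         # unreachable, or not an SWD
        swds.append(url)
        for target in imported_urls(text):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return swds


# Hypothetical in-memory "web" standing in for real HTTP fetches.
WEB = {
    "http://example.org/a.rdf": (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:owl="http://www.w3.org/2002/07/owl#">'
        '<owl:Ontology rdf:about="">'
        '<owl:imports rdf:resource="http://example.org/b.rdf"/>'
        '</owl:Ontology></rdf:RDF>'
    ),
    "http://example.org/b.rdf":
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>',
    "http://example.org/page.html": "<html><body>not RDF</body></html>",
}

print(focused_crawl(["http://example.org/a.rdf", "http://example.org/page.html"], WEB.get))
# ['http://example.org/a.rdf', 'http://example.org/b.rdf']
```

The HTML page is dropped because it is not RDF, while b.rdf is found only through the import link in a.rdf.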

Swoogle : SWD Metadata

SWD Metadata : About

- Basic metadata: syntactic and semantic features of an SWD
- Relations: relations between SWDs
- Analytical results: describe the SWD's ranking

Basic Metadata (1)

- Language features: properties describing the syntactic and semantic features of the document, e.g., encoding (RDF/XML), language (OWL, DAML), and OWL species (OWL DL, OWL Lite)
- RDF statistics: properties summarizing the node distribution, i.e., counts of the classes (rdfs:Class), properties (rdf:Property) and individuals in the SWD, from which the ontology ratio is obtained
- Ontology annotations: properties describing an SWD as an ontology; Swoogle records the properties of the document's owl:Ontology instance, e.g., rdfs:label, rdfs:comment, owl:versionInfo
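As a rough illustration of the language features, the sketch below guesses which languages a document uses from the namespaces of its predicates and objects. It assumes the SWD has already been parsed into (subject, predicate, object) string triples; the helper `languages_used` and the namespace table are hypothetical, and detecting an OWL species would need far more analysis than a namespace check.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Namespace URIs used as evidence that a language is in use (assumed heuristic).
LANGUAGE_NAMESPACES = {
    "RDF": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "RDFS": "http://www.w3.org/2000/01/rdf-schema#",
    "DAML": "http://www.daml.org/2001/03/daml+oil#",
    "OWL": "http://www.w3.org/2002/07/owl#",
}


def languages_used(triples):
    """Return, sorted, the languages whose namespace prefixes a predicate or object."""
    used = set()
    for _subject, predicate, obj in triples:
        for lang, ns in LANGUAGE_NAMESPACES.items():
            if predicate.startswith(ns) or obj.startswith(ns):
                used.add(lang)
    return sorted(used)


# Triples taken from the FOAF LabelProperty example (SWO) shown earlier.
label_property = "http://xmlns.com/foaf/0.1/LabelProperty"
triples = [
    (label_property, RDF_TYPE, "http://www.w3.org/2000/01/rdf-schema#Class"),
    (label_property, RDF_TYPE, "http://www.w3.org/2002/07/owl#Class"),
    (label_property, "http://www.w3.org/2000/01/rdf-schema#label", "Label Property"),
]
print(languages_used(triples))  # ['OWL', 'RDF', 'RDFS']
```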

Basic Metadata : Determining Ontology Ratio

The ontology-ratio R(foo) of an SWD foo is

$$R(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|}$$

where |C(foo)| is the number of classes, |P(foo)| the number of properties, and |I(foo)| the number of individuals in foo.

- If ontology-ratio = 1, the SWD is a pure SWO.
- If ontology-ratio = 0, the SWD is a pure SWDB.
- If 0 < ontology-ratio < 1, a threshold decides whether it is treated as an SWO or an SWDB.
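The ontology-ratio rule can be sketched directly from the formula. Assumptions made here, not taken from the slides: the SWD is available as string triples; a subject typed as rdfs:Class or owl:Class counts as a class, one typed as rdf:Property, owl:ObjectProperty or owl:DatatypeProperty counts as a property, and any other typed subject counts as an individual. The default `threshold=0.5` is a placeholder, since the slides do not give Swoogle's value.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = RDF + "type"

CLASS_TYPES = {RDFS + "Class", OWL + "Class"}
PROPERTY_TYPES = {RDF + "Property", OWL + "ObjectProperty", OWL + "DatatypeProperty"}


def ontology_ratio(triples):
    """R(foo) = (|C| + |P|) / (|C| + |P| + |I|), counted from rdf:type triples."""
    classes, properties, individuals = set(), set(), set()
    for subject, predicate, obj in triples:
        if predicate != RDF_TYPE:
            continue
        if obj in CLASS_TYPES:
            classes.add(subject)
        elif obj in PROPERTY_TYPES:
            properties.add(subject)
        else:
            individuals.add(subject)
    individuals -= classes | properties      # a term counts once, as a class or property
    defined = len(classes) + len(properties)
    total = defined + len(individuals)
    return defined / total if total else 0.0  # an empty document is treated as ratio 0


def classify_swd(triples, threshold=0.5):
    """Return 'SWO' or 'SWDB'; the threshold value is a placeholder assumption."""
    return "SWO" if ontology_ratio(triples) >= threshold else "SWDB"


FOAF = "http://xmlns.com/foaf/0.1/"
swo = [(FOAF + "LabelProperty", RDF_TYPE, RDFS + "Class"),
       (FOAF + "LabelProperty", RDF_TYPE, OWL + "Class")]
swdb = [("http://umbc.edu/~finin/finin.rdf#TimFinin", RDF_TYPE, FOAF + "Person")]
print(ontology_ratio(swo), classify_swd(swo))    # 1.0 SWO
print(ontology_ratio(swdb), classify_swd(swdb))  # 0.0 SWDB
```

Only the mixed case (0 < R < 1) depends on the threshold; the pure SWO and pure SWDB cases come out the same for any threshold between 0 and 1.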

Relations Metadata (1)

Swoogle captures the following SWD relations:
- TM/IN: term references between two SWDs
- IM: ontology imports, e.g., owl:imports, daml:imports
- EX: ontology extension, e.g., rdfs:subClassOf, rdfs:subPropertyOf
- PV: one ontology is a prior version of another, e.g., owl:priorVersion

Relations Metadata (2)

Swoogle captures the following SWD relations:
- CPV: one ontology is a prior version of another and compatible with it, e.g., owl:DeprecatedProperty, owl:DeprecatedClass
- IPV: one ontology is a prior version of another and incompatible with it, e.g., owl:incompatibleWith
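Several of these relations are signalled by specific predicates, so a simple extractor can map predicates to relation codes. This sketch is hypothetical: it covers only IM, EX, PV and IPV (CPV is signalled through class types such as owl:DeprecatedClass, and TM/IN needs term-level bookkeeping), and it assumes hash namespaces, so a term's defining document is its URI with the `#fragment` removed.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
DAML = "http://www.daml.org/2001/03/daml+oil#"

# Predicate -> relation code, following the examples listed on the slides.
PREDICATE_RELATION = {
    OWL + "imports": "IM",
    DAML + "imports": "IM",
    RDFS + "subClassOf": "EX",
    RDFS + "subPropertyOf": "EX",
    OWL + "priorVersion": "PV",
    OWL + "incompatibleWith": "IPV",
}


def document_of(uri):
    """Assume hash namespaces: a term's defining document is its URI without '#fragment'."""
    return uri.split("#", 1)[0]


def swd_relations(doc_url, triples):
    """Return sorted (doc_url, relation, other_doc) tuples, in the predicate's direction."""
    relations = set()
    for _subject, predicate, obj in triples:
        code = PREDICATE_RELATION.get(predicate)
        if code is None:
            continue
        target = document_of(obj)
        if target != doc_url:                # ignore links back into the same document
            relations.add((doc_url, code, target))
    return sorted(relations)


B = "http://example.org/b.owl"
triples = [
    (B, OWL + "imports", "http://example.org/a.owl"),
    (B + "#Car", RDFS + "subClassOf", "http://example.org/a.owl#Vehicle"),
    (B + "#Car", RDFS + "subClassOf", B + "#Thing"),
    (B, OWL + "priorVersion", "http://example.org/b-v1.owl"),
]
for rel in swd_relations(B, triples):
    print(rel)
```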

Swoogle : Ranking SWD

Ranking SWD : Google Page Rank Concept

Google introduced PageRank to evaluate the relative importance of web documents. A document's rank is a probability computed from two parts: the probability that a surfer accesses the document directly, and the probability that the surfer reaches it by following one of the links pointing to it.
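For comparison with Swoogle's variant, a plain PageRank iteration in the unnormalized form ((1 − d) for direct access, plus d times the rank flowing in over links) can be sketched as follows. The damping value d = 0.85 is a commonly used default, not a value from the slides; every outlink of a page receives an equal share of its rank.

```python
def pagerank(links, d=0.85, iterations=50):
    """Unnormalized PageRank: PR(a) = (1 - d) + d * sum(PR(x) / outdegree(x)) over x linking to a.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {p: 1.0 - d for p in pages}                # direct-access probability
        for src, targets in links.items():
            for t in targets:
                new[t] += d * rank[src] / len(targets)   # equal share per outlink
        rank = new
    return rank


ranks = pagerank({"A": ["C"], "B": ["C"], "C": []})
print({p: round(r, 4) for p, r in sorted(ranks.items())})
# {'A': 0.15, 'B': 0.15, 'C': 0.405}
```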

Ranking SWD : Swoogle Page Rank Concept (1)

- Google's PageRank treats all links uniformly: a surfer follows each outlink of a document with the same probability.
- SWDs, however, can be linked in different ways, e.g., imports, uses-term, extends.
- Different link types should be treated differently, i.e., given different weights.
- Swoogle therefore uses a rational random surfing model.

Ranking SWD : Rational Random Surfing Model


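For SWDs x and a, f(x,a) sums the weights of all links from x to a:

$$f(x,a) = \sum_{l \in links(x,a)} weight(l)$$

f(x) sums over all outlinks of x:

$$f(x) = \sum_{a \in T(x)} f(x,a)$$

The raw rank of a is then

$$rawPR(a) = (1-d) + d \sum_{x \in L(a)} rawPR(x)\,\frac{f(x,a)}{f(x)}$$

where T(x) is the set of SWDs that x links to, L(a) is the set of SWDs that link to a, (1 − d) is the direct-access probability, and d weights the probability of reaching a by following links.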
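The rawPR equations can be iterated like ordinary PageRank, except that each link x → a passes on the fraction f(x,a)/f(x) of x's rank instead of an equal share. In this sketch the link-type weights in `LINK_WEIGHTS`, the damping factor d = 0.85 and the fixed iteration count are assumptions; the slides say only that different link types receive different weights.

```python
from collections import defaultdict

# Hypothetical weights per link type; the slides give no actual values.
LINK_WEIGHTS = {"imports": 3.0, "extends": 2.0, "uses-term": 1.0}


def rational_rank(links, weights=LINK_WEIGHTS, d=0.85, iterations=50):
    """Iterate rawPR(a) = (1 - d) + d * sum over x in L(a) of rawPR(x) * f(x, a) / f(x).

    links is a list of (x, a, link_type) tuples meaning "SWD x links to SWD a".
    """
    f_xa = defaultdict(float)             # f(x, a): total weight of the links x -> a
    for x, a, kind in links:
        f_xa[(x, a)] += weights[kind]
    f_x = defaultdict(float)              # f(x): total weight of all outlinks of x
    for (x, _a), w in f_xa.items():
        f_x[x] += w

    docs = {x for x, _a, _k in links} | {a for _x, a, _k in links}
    rank = {doc: 1.0 for doc in docs}
    for _ in range(iterations):
        new = {doc: 1.0 - d for doc in docs}          # direct-access term (1 - d)
        for (x, a), w in f_xa.items():
            new[a] += d * rank[x] * w / f_x[x]         # rational surfer follows x -> a
        rank = new
    return rank


links = [
    ("A", "O", "imports"),     # A imports ontology O
    ("A", "B", "uses-term"),   # A uses a term defined in B
    ("B", "O", "uses-term"),   # B uses a term defined in O
]
rank = rational_rank(links)
print({doc: round(r, 4) for doc, r in sorted(rank.items())})
# {'A': 0.15, 'B': 0.1819, 'O': 0.4002}
```

Because the import link carries three times the weight of the uses-term link, A passes three quarters of its followed rank to O; with uniform weights the split would be half and half, and O would end up lower.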

Swoogle : Indexing and Retrieval

Information Retrieval

- Uses traditional information retrieval methods
- Works well both with pure SWDs and with text documents that contain embedded markup
Pure SWD Document

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:si="http://www.w3schools.com/rdf/">
  <rdf:Description rdf:about="http://www.w3schools.com">
    <si:title>W3Schools</si:title>
    <si:author>Jan Egil Refsnes</si:author>
  </rdf:Description>
</rdf:RDF>
```

Text with embedded markup

```
Here I describe the rdf:Description syntax:
<rdf:Description rdf:about="http://www.w3schools.com">
  <si:title>W3Schools</si:title>
  <si:author>Jan Egil Refsnes</si:author>
</rdf:Description>
```

Traditional Information Retrieval


N-gram based matching: slide a window of n characters over the given word and match the resulting n-grams against URIrefs, returning matches together with a probability.
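N-gram matching can be sketched as follows. The overlap fraction used as the score is a simple stand-in for the probability mentioned above, and the trigram default `n=3` and the function names are assumptions made for illustration. Because matching works on character fragments, a misspelled word such as "persn" still finds Person-related URIrefs.

```python
def char_ngrams(text, n=3):
    """Slide a window of n characters over the lower-cased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_score(word, uriref, n=3):
    """Fraction of the word's n-grams that also occur in the URIref (stand-in score)."""
    grams = char_ngrams(word, n)
    if not grams:
        return 0.0
    return len(grams & char_ngrams(uriref, n)) / len(grams)


def ngram_search(word, urirefs, n=3):
    """Return URIrefs with a non-zero score, best match first (ties keep input order)."""
    scored = [(ngram_score(word, u, n), u) for u in urirefs]
    return [u for score, u in sorted(scored, key=lambda t: t[0], reverse=True) if score > 0]


urirefs = [
    "http://xmlns.com/foaf/0.1/name",
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/personalProfileDocument",
]
print(ngram_search("persn", urirefs))
```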

Word-based matching: reduce the RDF to triples, extract the URIrefs from the SWD, and match them against the given word.
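Word-based matching can be sketched by splitting each URIref in the triples into words taken from its local name (breaking on camelCase and non-alphanumeric characters) and checking whether the query word is among them. The splitting rules and the helper names `uri_words` and `word_search` are assumptions made for illustration; the triples reuse terms from the FOAF example above.

```python
import re


def uri_words(uriref):
    """Split a URIref's local name into lower-case words, e.g. 'firstName' -> ['first', 'name']."""
    local = re.split(r"[#/]", uriref.rstrip("#/"))[-1]
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", local)   # break camelCase
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]


def word_search(word, triples):
    """Return the URIrefs in the triples whose local-name words include the given word."""
    word = word.lower()
    hits = set()
    for triple in triples:
        for term in triple:
            # Only URIrefs are matched; literals such as "Tim Finin" are skipped.
            if term.startswith(("http://", "https://")) and word in uri_words(term):
                hits.add(term)
    return sorted(hits)


FOAF = "http://xmlns.com/foaf/0.1/"
me = "http://umbc.edu/~finin/finin.rdf#TimFinin"
triples = [
    (me, FOAF + "name", "Tim Finin"),
    (me, FOAF + "firstName", "Tim"),
    (me, FOAF + "mbox_sha1sum", "9da08e2b4dc670d9254ab4a4b4d61637fed3b18f"),
]
print(word_search("name", triples))
# ['http://xmlns.com/foaf/0.1/firstName', 'http://xmlns.com/foaf/0.1/name']
```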

Indexing

- After retrieval, each SWD is ordered by its rank, computed with the PageRank-style formula. The five highest-ranked SWDs are:
| Rank | URL | Value |
| --- | --- | --- |
| 1 | http://www.w3.org/1999/02/22-rdf-syntax-ns | 2845.97 |
| 2 | http://www.w3.org/2000/01/rdf-schema | 2814.21 |
| 3 | http://www.daml.org/2001/03/daml+oil | 311.65 |
| 4 | http://www.w3.org/2002/07/owl | 192.18 |
| 5 | http://www.w3.org/2000/10/rdf-tests/rdfcore/testSchema | 59.82 |

Current Status


Conclusion and Future Work

- Semantic Web developers and researchers need powerful search and indexing systems to help them find and analyze SWDs.
- Current web search engines such as Google and AlltheWeb do not work well with SWDs, because they are designed for natural-language documents.
- Swoogle runs multiple crawlers to discover SWDs through meta-search and link-following.

Thank you
