Li Ding, Tim Finin, Anupam Joshi, Rong Pan, Pavan Reddivari, Vishal Doshi, R. Scott Cost, Joel Sachs, Yun Peng
Department of Computer Science and Electrical Engineering
University of Maryland Baltimore County, Baltimore MD 21250, USA
Presented by
Content
Introduction
Semantic Web Document
Swoogle Architecture
Finding Semantic Web Documents
Semantic Web Document Metadata
Ranking
Indexing and Retrieval
Current Status
Conclusion and Future Work
Swoogle : Introduction
Swoogle is a search engine: a crawler-based indexing and retrieval system
Intended for the Semantic Web
Extracts metadata for each document
Computes relations between documents

Ontology-based annotation systems (e.g. SHOE, Ontobroker, WebKB, QuizRDF) work on annotations rather than on entire documents
Ontology repositories (e.g. DAML Ontology Library, SemWebCentral) do not automatically discover Semantic Web documents
Semantic Web browsers (e.g. Ontaria) focus only on storing RDF, rather than on metadata
Semantic Web document (SWD): a document in a semantic web language that is online and accessible to web users and software agents...
Example SWDB (Semantic Web database):

<foaf:Person rdf:about="http://umbc.edu/~finin/finin.rdf#Tim Finin">
  <owl:sameAs rdf:resource="http://ebiquity.umbc.edu/person/foaf/Tim/Finin/foaf.rdf"/>
  <foaf:name>Tim Finin</foaf:name>
  <foaf:firstName>Tim</foaf:firstName>
  <foaf:mbox_sha1sum>9da08e2b4dc670d9254ab4a4b4d61637fed3b18f</foaf:mbox_sha1sum>
  <foaf:mbox_sha1sum>49953f47b9c33484a753eaf14102af56c0148d37</foaf:mbox_sha1sum>
</foaf:Person>
Swoogle : Architecture
[Figure: Swoogle architecture diagram. Recoverable labels: SWD discovery, SWD reader, candidate URLs, web crawler, Web]
Swoogle : Architecture (2)
SWD discovery: discovers potential SWDs on the web
Metadata creation: caches a snapshot of each SWD and generates objective metadata about it
Data analysis: builds analytical reports from the cached SWDs and the created metadata
Interface: provides data services to the Semantic Web community
Finding SWD
Google-based crawler: uses the Google web service to find candidate URLs
Focused crawler: a web crawler that starts from URL addresses given by users
Candidates are verified, and further SWDs are discovered through their relations to known SWDs, e.g. imports
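The verify-then-follow step can be sketched as below. This is only an illustration under stated assumptions, not Swoogle's crawler: the helper names `extract_imports` and `discover` are hypothetical, a document counts as a SWD here simply if it parses as RDF/XML with an `rdf:RDF` root, only `owl:imports` is followed, and an in-memory dictionary (`TOY_WEB`, with made-up example.org URLs) stands in for fetching from the web.

```python
import xml.etree.ElementTree as ET
from collections import deque

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

def extract_imports(text):
    """Return owl:imports targets, or None if text is not RDF/XML (hypothetical helper)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    if root.tag != f"{{{RDF_NS}}}RDF":
        return None
    targets = []
    for elem in root.iter(f"{{{OWL_NS}}}imports"):
        target = elem.get(f"{{{RDF_NS}}}resource")
        if target:
            targets.append(target)
    return targets

def discover(seeds, fetch):
    """Breadth-first: verify each candidate URL, then enqueue whatever it imports."""
    seen, confirmed = set(seeds), []
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        text = fetch(url)
        if text is None:
            continue  # unreachable candidate
        imports = extract_imports(text)
        if imports is None:
            continue  # reachable, but not a SWD under this sketch's test
        confirmed.append(url)
        for target in imports:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return confirmed

# Toy stand-in for the web; all URLs and contents are invented for illustration.
TOY_WEB = {
    "http://example.org/a.owl": (
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:owl="{OWL_NS}">'
        '<owl:Ontology rdf:about="">'
        '<owl:imports rdf:resource="http://example.org/b.owl"/>'
        "</owl:Ontology></rdf:RDF>"
    ),
    "http://example.org/b.owl": f'<rdf:RDF xmlns:rdf="{RDF_NS}"/>',
    "http://example.org/page.html": "<html><body>plain page</body></html>",
}

found = discover(["http://example.org/a.owl", "http://example.org/page.html"], TOY_WEB.get)
```

Passing `fetch` as a parameter keeps the traversal testable without network access; a real crawler would also need politeness delays, error handling, and support for other relation types.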
Basic metadata: syntactic and semantic features of a SWD
Relations: relations between SWDs
Analytical results: describe SWD ranking

Language features: properties describing syntactic and semantic data, e.g. encoding (RDF/XML), language (OWL, DAML), OWL species (OWL DL, OWL Lite)
RDF statistics: properties summarizing the node distribution; they count the nodes defined as rdfs:Class, rdf:Property, or individuals, and yield the ontology ratio
Ontology annotation: properties describing a SWD as an ontology. Swoogle records the properties of owl:Ontology instances, e.g. label, comment, versionInfo
\[
\text{ontology-ratio} = \frac{\#\text{classes} + \#\text{properties}}{\#\text{classes} + \#\text{properties} + \#\text{individuals}}
\]
If ontology-ratio = 1, the SWD is a pure SWO (Semantic Web ontology)
If ontology-ratio = 0, the SWD is a pure SWDB
If 0 < ontology-ratio < 1, a threshold determines how the SWD is classified
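The classification rule above can be sketched in a few lines. The sketch assumes the ratio is the share of classes and properties among all defined nodes (classes, properties, and individuals), which matches the two pure cases; the function names and the `threshold` parameter are hypothetical, and the slides do not state a threshold value.

```python
def ontology_ratio(n_classes, n_properties, n_individuals):
    """Share of defined nodes that are classes or properties (assumed definition)."""
    defined = n_classes + n_properties + n_individuals
    if defined == 0:
        raise ValueError("document defines no classes, properties, or individuals")
    return (n_classes + n_properties) / defined

def classify(ratio, threshold):
    """Label a SWD; `threshold` is a caller-supplied cut-off, not a value from the slides."""
    if ratio == 1:
        return "pure SWO"
    if ratio == 0:
        return "pure SWDB"
    return "SWO" if ratio >= threshold else "SWDB"

# Illustrative counts only.
mixed = ontology_ratio(n_classes=3, n_properties=1, n_individuals=4)
label = classify(mixed, threshold=0.8)
```

A document that defines only classes and properties yields a ratio of 1, and one that defines only individuals yields 0, so the two pure cases fall out of the same formula.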
TM/IN: captures term references between two SWDs
IM: captures the ontology import relation, e.g. owl:imports, daml:imports
EX: captures the ontology extension relation, e.g. rdfs:subClassOf, rdfs:subPropertyOf
PV: shows that an ontology is a prior version of another, e.g. owl:priorVersion
CPV: shows that an ontology is a prior version of, and compatible with, another, e.g. owl:DeprecatedProperty, owl:DeprecatedClass
IPV: shows that an ontology is a prior version of, and incompatible with, another, e.g. owl:incompatibleWith
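One way to picture how these relation types could be derived is a lookup from the RDF construct observed between two documents to the relation code. The table below copies only the examples listed above; the names `RELATION_BY_CONSTRUCT` and `relations_between`, the observation format, and the file names are assumptions made for this sketch, and TM/IN is left out because no specific constructs are listed for it.

```python
from collections import defaultdict

# Construct -> relation code, taken from the examples listed above.
RELATION_BY_CONSTRUCT = {
    "owl:imports": "IM",
    "daml:imports": "IM",
    "rdfs:subClassOf": "EX",
    "rdfs:subPropertyOf": "EX",
    "owl:priorVersion": "PV",
    "owl:DeprecatedProperty": "CPV",
    "owl:DeprecatedClass": "CPV",
    "owl:incompatibleWith": "IPV",
}

def relations_between(observations):
    """Group relation codes by (source document, target document).

    `observations` is an iterable of (source_doc, construct, target_doc)
    tuples, a hypothetical intermediate format for this sketch.
    """
    result = defaultdict(set)
    for source, construct, target in observations:
        code = RELATION_BY_CONSTRUCT.get(construct)
        if code is not None and source != target:
            result[(source, target)].add(code)
    return dict(result)

# Illustrative observations; the document names are invented.
relations = relations_between([
    ("a.owl", "owl:imports", "b.owl"),
    ("a.owl", "rdfs:subClassOf", "b.owl"),
    ("c.owl", "owl:priorVersion", "a.owl"),
    ("a.owl", "foaf:name", "a.owl"),
])
```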
Google introduced the PageRank concept to evaluate the relative importance of web documents, expressed as a probability
The probability is computed from the probability of accessing a document directly and the probability of following one of the links that point to it

PageRank assumes a uniform probability, which means every link between web documents is treated in the same manner
SWDs link to each other in different ways, e.g. imports, uses-term, extends
Different link types should be treated differently (given different weights)
Therefore, Swoogle uses a rational random surfing model
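\[
f(x, a) = \sum_{l \in \mathrm{links}(x, a)} \mathrm{weight}(l), \qquad f(x) = \sum_{a \in T(x)} f(x, a)
\]
where links(x, a) is the set of links from SWD x to SWD a, and T(x) is the set of SWDs that x links to.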
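A minimal sketch of a weighted random-surfer iteration in this spirit follows. It is a generic weighted PageRank variant, not necessarily Swoogle's exact algorithm: the numeric weights in `WEIGHTS`, the damping factor of 0.85, the fixed iteration count, and the function name `weighted_rank` are all assumptions, since the slides give no values.

```python
from collections import defaultdict

# Hypothetical weights per link type; the slides do not give numbers.
WEIGHTS = {"imports": 3.0, "extends": 2.0, "uses-term": 1.0}

def weighted_rank(links, damping=0.85, iterations=50):
    """Rank documents from (source, target, link_type) triples.

    A surfer at x follows a link to a with probability f(x, a) / f(x),
    where f(x, a) sums the weights of links from x to a and f(x) sums
    the weights of all links leaving x.
    """
    f_pair = defaultdict(float)   # f(x, a)
    f_total = defaultdict(float)  # f(x)
    docs = set()
    for x, a, kind in links:
        w = WEIGHTS[kind]
        f_pair[(x, a)] += w
        f_total[x] += w
        docs.update((x, a))

    rank = {d: 1.0 for d in docs}
    for _ in range(iterations):
        new_rank = {d: 1.0 - damping for d in docs}
        for (x, a), weight in f_pair.items():
            new_rank[a] += damping * rank[x] * weight / f_total[x]
        rank = new_rank
    return rank

# Illustrative graph: A imports C and uses a term from B; B uses a term from C.
ranks = weighted_rank([
    ("A", "C", "imports"),
    ("A", "B", "uses-term"),
    ("B", "C", "uses-term"),
])
```

In this toy graph C ends up ranked highest, because it receives the heavily weighted import link from A as well as B's only outgoing link.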
Information Retrieval
Uses traditional information retrieval methods
Works well with SWDs and with text documents that contain embedded markup
SWD:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:si="http://www.w3schools.com/rdf/">
  <rdf:Description rdf:about="http://www.w3schools.com">
    <si:title>W3Schools</si:title>
    <si:author>Jan Egil Refsnes</si:author>
  </rdf:Description>
</rdf:RDF>

Text document with embedded markup:

Here I describe the rdf:Description syntax:
<rdf:Description rdf:about="http://www.w3schools.com">
  <si:title>W3Schools</si:title>
  <si:author>Jan Egil Refsnes</si:author>
</rdf:Description>
Word-based matching
Reduce the RDF to triples
Extract the URIs from the SWD
Match them against the given word
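The three steps above can be sketched as a small inverted index. This is an illustration under assumptions rather than Swoogle's indexer: triples are taken as already extracted `(subject, predicate, object)` tuples, a node counts as a URI if it starts with `http://`, words come from splitting each URI's local name, and the names `uri_terms`, `build_index`, and `search` plus the example.org documents are hypothetical.

```python
import re

def uri_terms(uri):
    """Split a URI's local name into lowercase words (handles camelCase)."""
    local = re.split(r"[#/]", uri.rstrip("#/"))[-1]
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", local)
    return [p.lower() for p in parts]

def build_index(documents):
    """Map each word to the set of documents whose triples mention it in a URI.

    `documents` is {doc_url: [(subject, predicate, object), ...]}.
    """
    index = {}
    for doc, triples in documents.items():
        for triple in triples:
            for node in triple:
                if node.startswith("http://"):  # skip literals such as names
                    for term in uri_terms(node):
                        index.setdefault(term, set()).add(doc)
    return index

def search(index, word):
    """Return the documents matching a single query word, sorted for stable output."""
    return sorted(index.get(word.lower(), set()))

# Illustrative documents with invented URLs.
DOCS = {
    "http://example.org/finin.rdf": [
        ("http://example.org/finin.rdf#me",
         "http://xmlns.com/foaf/0.1/firstName",
         "Tim"),
    ],
    "http://example.org/onto.owl": [
        ("http://example.org/onto.owl#Person",
         "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
         "http://www.w3.org/2002/07/owl#Class"),
    ],
}

index = build_index(DOCS)
hits = search(index, "Name")
```

Because only URIs are indexed in this sketch, a query for a literal value such as "Tim" returns nothing; indexing literals as well would be a separate design choice.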
Indexing
After its information has been retrieved, each SWD is indexed according to the page ranking formula
Rank  URL                                                      Value
1     http://www.w3.org/1999/02/22-rdf-syntax-ns               2845.97
2     http://www.w3.org/2000/01/rdf-schema                     2814.21
3     http://www.daml.org/2001/03/daml+oil                     311.65
4     http://www.w3.org/2002/07/owl                            192.18
5     http://www.w3.org/2000/10/rdftests/rdfcore/testSchema    59.82
Current Status
Conclusion
Powerful search and indexing systems are needed by Semantic Web developers and researchers to help them find and analyze SWDs
Current web search engines such as Google and AlltheWeb do not work well with SWDs, as they are designed to work with natural languages
Swoogle runs multiple crawlers to discover SWDs through meta-search and link-following