Articles

R. Delbru, S. Campinas, G. Tummarello. Searching Web Data: an Entity Retrieval and High-Performance Indexing Model. In Journal of Web Semantics, 2011.
More and more (semi) structured information is becoming available on the Web in the form of documents embedding metadata (e.g., RDF, RDFa, Microformats and others). There are already hundreds of millions of such documents accessible and their number is growing rapidly. This calls for large scale systems providing effective means of searching and retrieving this semi-structured information with the ultimate goal of making it exploitable by humans and machines alike.
This article examines the shift from the traditional web document model to a web data object (entity) model and studies the challenges faced in implementing a scalable, high-performance system for searching semi-structured data objects over a large heterogeneous and decentralised infrastructure. Towards this goal, we define an entity retrieval model, develop novel methodologies for supporting this model and show how to achieve a high-performance entity retrieval system. We introduce an indexing methodology for semi-structured data which offers a good compromise between query expressiveness, query processing and index maintenance compared to other approaches. We achieve high performance by optimising the index data structure with appropriate compression techniques. Finally, we demonstrate that the resulting system can index billions of data objects and provides keyword-based as well as more advanced search interfaces for retrieving relevant data objects in sub-second time.
This work has been part of the Sindice search engine project at the Digital Enterprise Research Institute (DERI), NUI Galway. The Sindice system currently maintains more than 200 million pages downloaded from the Web and is being used actively by many researchers within and outside of DERI.
G. Tummarello, R. Cyganiak, M. Catasta, S. Danielczyk, R. Delbru, S. Decker. Sig.ma: Live views on the Web of Data. In Journal of Web Semantics, 2010.
We present Sig.ma, both a service and an end user application to access the Web of Data as an integrated information space. Sig.ma uses a holistic approach in which large scale semantic web indexing, logic reasoning, data aggregation heuristics, ad-hoc ontology consolidation, external services and responsive user interaction all play together to create rich entity descriptions. These consolidated entity descriptions then form the base for embeddable data mashups, machine oriented services as well as data browsing services. Finally, we discuss Sig.ma's peculiar characteristics and report on lessons learned and ideas it inspires.
E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, and G. Tummarello. Sindice.com: A document-oriented lookup index for open linked data. In International Journal of Metadata, Semantics and Ontologies, 3(1), 2008.
Developers of Semantic Web applications face a challenge with respect to the decentralised publication model: how and where to find statements about encountered resources. The "linked data" approach mandates that resource URIs should be de-referenced to return resource metadata. But for data discovery linkage itself is not enough, and crawling and indexing of data is necessary. Existing Semantic Web search engines are focused on database-like functionality, compromising on index size, query performance and live updates. We present Sindice, a lookup index over resources crawled on the Semantic Web. Our index allows applications to automatically locate documents containing information about a given resource. In addition, we allow resource retrieval through uniquely identifying inverse-functional properties, offer a full-text search and index SPARQL endpoints. Finally we introduce an extension to the sitemap protocol which allows us to efficiently index large Semantic Web datasets with minimal impact on the data providers.

Books

Renaud Delbru. Searching Web Data: an Entity Retrieval Model. Ph.D Thesis at Digital Enterprise Research Institute, National University of Ireland, Galway. September 2010. [slides] [video]
More and more (semi) structured information is becoming available on the Web in the form of documents embedding metadata (e.g., RDF, RDFa, Microformats and others). There are already hundreds of millions of such documents accessible and their number is growing rapidly. This calls for large scale systems providing effective means of searching and retrieving this semi-structured information with the ultimate goal of making it exploitable by humans and machines alike.
This dissertation examines the shift from the traditional web document model to a web data object (entity) model and studies the challenges and issues faced in implementing a scalable and high performance system for searching semi-structured data objects on a large heterogeneous and decentralised infrastructure. Towards this goal, we define an entity retrieval model, develop novel methodologies for supporting this model, and design a web-scale retrieval system around this model. In particular, this dissertation focuses on the following four main aspects of the system: reasoning, ranking, indexing and querying. We introduce a distributed reasoning framework which is tolerant against low data quality. We present a link analysis approach for computing the popularity score of data objects among decentralised data sources. We propose an indexing methodology for semi-structured data which offers a good compromise between query expressiveness, query processing and index maintenance compared to other approaches. Finally, we develop an index compression technique which increases both the update and query throughput of the system. The resulting system can index billions of data objects and provides keyword-based as well as more advanced search interfaces for retrieving the most relevant data objects.
This work has been part of the Sindice search engine project at the Digital Enterprise Research Institute (DERI), NUI Galway. The Sindice system currently maintains more than 100 million pages downloaded from the Web and is being used actively by many researchers within and outside of DERI. The reasoning, ranking, indexing and querying components of the Sindice search engine are a direct result of this dissertation research.

Book Chapters

A. Polleres, A. Hogan, R. Delbru, and J. Umbrich. RDFS & OWL Reasoning for Linked Data. Chapter in Lecture Notes for the Reasoning Web Summer School. Springer, July 2013 (to appear).
Linked Data promises that a large portion of Web Data will be usable as one big interlinked RDF database against which structured queries can be answered. In this lecture we will show how reasoning -- using RDF Schema (RDFS) and the Web Ontology Language (OWL) -- can help to obtain more complete answers for such queries over Linked Data. We first look at the extent to which RDFS and OWL features are being adopted on the Web. We then introduce two high-level architectures for query answering over Linked Data and outline how these can be enriched by (lightweight) RDFS and OWL reasoning, enumerating the main challenges faced and discussing reasoning methods that make practical and theoretical trade-offs to address these challenges. In the end, we also ask whether or not RDFS and OWL are enough and discuss numeric reasoning methods that are beyond the scope of these standards but that are often important when integrating Linked Data from several, heterogeneous sources.
M. Catasta, R. Delbru, N. Toupikov and G. Tummarello. Managing Terabytes of Web Semantics Data. Invited paper in R. De Virgilio, F. Giunchiglia, and L. Tanca, editors, Semantic Web Information Management: A Model-Based Perspective. Springer, 2009.
A large amount of semi-structured data is now made available on the Web in the form of RDF, RDFa and Microformats. In this chapter we discuss a general model for the Web of Data and, based on our experience in Sindice.com, we discuss how this is reflected in the architecture and components of a large scale infrastructure. Aspects such as data collection, processing, indexing and ranking are covered, and we give a detailed example of an application built on top of this infrastructure.
R. Delbru, N. Toupikov, M. Catasta, R. Fuller and G. Tummarello. SIREn: Efficient Search on Semi-Structured Documents. In Lucene in Action 2nd Edition (In Action series). Manning Publications Co., 2009.
While the specifications for RDF (Resource Description Framework) and Microformats have been out for quite some time now, it is only in the last few years that many web sites have begun to make use of them, thus effectively starting a "Web of Data" or as some refer to it a "Web 3.0". Sites such as LinkedIn, Eventfull, Digg, LastFM and others are using these specifications to share pieces of information that can be automatically reused by other web sites or by smart clients.
Traditionally, querying graph structured data (RDF) has been done using ad-hoc solutions, called Triplestores, typically based on DBMS backends. In Sindice we needed something much more scalable than DBMS and with the desirable features of the typical Web Search engines: top-k query processing, real time updates, full text search, distributed indexes over shards, etc. While Lucene has long offered these capabilities, we will see that its native capabilities are not intended for large semi-structured document collections with very different schemata. For this reason we developed SIREn (Semantic Information Retrieval Engine), a Lucene extension to overcome these shortcomings and efficiently index and query RDF, as well as any textual document with an arbitrary number of metadata fields.

Conference papers

S. Campinas, R. Delbru, G. Tummarello. Effective Retrieval Model for Entity with Multi-Valued Attributes: BM25MF and Beyond. In Proceedings of the 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW). 2012.
The task of entity retrieval becomes increasingly prevalent as more and more structured information about entities is available on the Web in various forms such as documents embedding metadata (RDF, RDFa, Microdata, Microformats). International benchmarking campaigns, e.g., the Text REtrieval Conference or the Semantic Search Challenge, propose entity-oriented search tracks. This reflects the need for effective search and discovery of entities. In this work, we present a multi-valued attributes model for entity retrieval which extends and generalises existing field-based ranking models. Our model introduces the concept of multi-valued attributes and enables attribute-specific and value-specific normalization and weighting. Based on this model we extend two state-of-the-art field-based rankings, i.e., BM25F and PL2F, and demonstrate, based on evaluations over heterogeneous datasets, that this model significantly improves retrieval performance compared to existing models. Finally, we introduce query-dependent and query-independent weights specifically designed for our model which provide significant performance improvement.
R. Delbru, G. Tummarello, A. Polleres. Context-Dependent OWL Reasoning in Sindice - Experiences and Lessons Learnt. In Proceedings of the 5th International Conference on Web Reasoning and Rule Systems (RR). 2011.
The Sindice Semantic Web index provides search capabilities over 260 million documents. Reasoning over web data makes explicit what would otherwise be implicit knowledge: it adds value to the information and enables Sindice to ultimately be more competitive in terms of precision and recall. However, due to the scale and heterogeneity of web data, a reasoning engine for the Sindice system must (1) scale out through parallelisation over a cluster of machines; and (2) cope with unexpected data usage. In this paper, we report our experiences and lessons learned in building a large scale reasoning engine for Sindice. The reasoning approach has been deployed, used and improved since 2008 within Sindice and has enabled Sindice to reason over billions of triples.
L. Dragan, R. Delbru, T. Groza, S. Handschuh, S. Decker. Linking Semantic Desktop Data to the Web of Data. In Proceedings of the 10th International Semantic Web Conference (ISWC). 2011.
The goal of the Semantic Desktop is to enable better organization of the personal information on our computers, by applying semantic technologies on the desktop. However, information on our desktop is often incomplete, as it is based on our subjective view, or limited knowledge about an entity. On the other hand, the Web of Data contains information about virtually everything, generally from multiple sources. Connecting the desktop to the Web of Data would thus enrich and complement desktop information. Bringing in information from the Web of Data automatically would take the burden of searching for information off the user. In addition, connecting the two networks of data opens up the possibility of advanced personal services on the desktop. Our solution tackles the problems raised above by using a semantic search engine for the Web of Data, such as Sindice, to find and retrieve a relevant subset of entities from the web. We present a matching framework, using a combination of configurable heuristics and rules to compare data graphs, that achieves a high degree of precision in the linking decision. We evaluate our methodology with real-world data: we create a gold standard from relevance judgements by experts and measure the performance of our system against it. We show that it is possible to automatically link desktop data with web data in an effective way.
S. Campinas, R. Delbru, G. Tummarello. SkipBlock: Self-Indexing for Block-Based Inverted List. In Proceedings of the 33rd European Conference on Information Retrieval (ECIR). 2011.
In large web search engines the performance of Information Retrieval systems is a key issue. Block-based compression methods are often used to improve the search performance, but current self-indexing techniques are not adapted to such data structures and provide suboptimal performance. In this paper, we present SkipBlock, a self-indexing model for block-based inverted lists. Based on a cost model, we show that it is possible to achieve significant improvements in both search performance and the storage space of the structure.
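To make the idea of self-indexing over a block-based inverted list concrete, here is a minimal Python sketch. It is not the SkipBlock model or its cost model: it assumes uncompressed blocks and a single-level skip index that stores the largest document id of each block, and the names `BlockedPostings` and `next_geq` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional


class BlockedPostings:
    """Sorted doc ids split into fixed-size blocks with a one-level skip index.

    Illustrative assumption: blocks are kept uncompressed; a real system
    would store each block in compressed form and decode only the block
    selected through the skip index.
    """

    def __init__(self, doc_ids: List[int], block_size: int = 4) -> None:
        if block_size < 1:
            raise ValueError("block_size must be at least 1")
        ordered = sorted(set(doc_ids))
        self.blocks = [
            ordered[i:i + block_size]
            for i in range(0, len(ordered), block_size)
        ]
        # Skip index: the largest doc id stored in each block.
        self.block_max = [block[-1] for block in self.blocks]

    def next_geq(self, target: int) -> Optional[int]:
        """Return the smallest doc id >= target, or None if there is none."""
        # Find the first block whose maximum is >= target; all earlier
        # blocks are skipped without being inspected.
        b = bisect_left(self.block_max, target)
        if b == len(self.blocks):
            return None
        block = self.blocks[b]
        return block[bisect_left(block, target)]


postings = BlockedPostings([2, 5, 9, 14, 21, 30, 31, 47, 52], block_size=3)
print(postings.next_geq(10))
```

The lookup touches only one block, which is the property that makes skipping attractive when each block has to be decompressed before it can be read.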
R. Delbru, N. Toupikov, M. Catasta, G. Tummarello. A Node Indexing Scheme for Web Entity Retrieval. In Proceedings of the 7th Extended Semantic Web Conference (ESWC). 2010. [slides]
Now motivated also by the partial support of major search engines, hundreds of millions of documents are being published on the web embedding semi-structured data in RDF, RDFa and Microformats. This scenario calls for novel information search systems which provide effective means of retrieving relevant semi-structured information. In this paper, we present an "entity retrieval system" designed to provide entity search capabilities over datasets as large as the entire Web of Data. Our system supports full-text search, semi-structural queries and top-k query results while exhibiting a concise index and efficient incremental updates. We advocate the use of a node indexing scheme and show that it offers a good compromise between query expressiveness, query processing time and update complexity in comparison to three other indexing techniques. We then demonstrate how such a system can effectively answer queries over 10 billion triples on a single commodity machine.
R. Delbru, N. Toupikov, M. Catasta, G. Tummarello, S. Decker. Hierarchical Link Analysis for Ranking Web Data. In Proceedings of the 7th Extended Semantic Web Conference (ESWC). 2010. [slides]
On the Web of Data, entities are often interconnected in a way similar to web documents. Previous works have shown how PageRank can be adapted to achieve entity ranking. In this paper, we propose to exploit locality on the Web of Data by taking a layered approach, similar to hierarchical PageRank approaches. We provide justifications for a two-layer model of the Web of Data, and introduce DING (Dataset Ranking), a novel ranking methodology based on this two-layer model. DING uses links between datasets to compute dataset ranks and combines the resulting values with semantic-dependent entity ranking strategies. We compare the effectiveness of the approach against other link-based algorithms on large datasets coming from the Sindice search engine. The evaluation, which includes a user study, indicates that the resulting rank is better than that of the other approaches. The resulting algorithm is also shown to have desirable computational properties such as parallelisation.
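The two-layer idea can be illustrated with a short Python sketch. It is not the DING algorithm itself: it assumes a plain weighted PageRank over the dataset graph and assumes that the dataset rank and a local entity rank are combined by multiplication. The names `dataset_ranks` and `entity_score`, the damping factor and the example weights are all hypothetical.

```python
from typing import Dict

# Source dataset -> {target dataset: link weight}
Graph = Dict[str, Dict[str, float]]


def dataset_ranks(links: Graph, damping: float = 0.85,
                  iterations: int = 50) -> Dict[str, float]:
    """Weighted PageRank over the dataset layer, by power iteration."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        nxt = {node: (1.0 - damping) / n for node in nodes}
        for src in nodes:
            targets = links.get(src, {})
            total = sum(targets.values())
            if total == 0:
                # Dangling dataset: spread its rank evenly over all datasets.
                for node in nodes:
                    nxt[node] += damping * rank[src] / n
            else:
                # Distribute rank in proportion to the link weights.
                for dst, weight in targets.items():
                    nxt[dst] += damping * rank[src] * weight / total
        rank = nxt
    return rank


def entity_score(dataset_rank: float, local_rank: float) -> float:
    """Combine the two layers (assumption: a simple product)."""
    return dataset_rank * local_rank


links = {"a": {"c": 2.0}, "b": {"c": 1.0}, "c": {"a": 1.0}}
ranks = dataset_ranks(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Because the dataset graph is far smaller than the entity graph, the upper layer is cheap to compute, and the local entity ranks can be computed independently per dataset.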
S. Corlosquet, R. Delbru, T. Clark, A. Polleres and S. Decker. Produce and Consume Linked Data with Drupal!. In Proceedings of the 8th International Semantic Web Conference (ISWC). 2009.
Currently a large number of Web sites are driven by Content Management Systems (CMS) which manage textual and multimedia content but also inherently carry valuable information about a site's structure and content model. Exposing this structured information to the Web of Data has so far required considerable expertise in RDF and OWL modelling and additional programming effort. In this paper we tackle one of the most popular CMS: Drupal. We enable site administrators to export their site content model and data to the Web of Data without requiring extensive knowledge of Semantic Web technologies. Our modules create RDFa annotations and, optionally, a SPARQL endpoint for any Drupal site out of the box. Likewise, we add the means to map the site data to existing ontologies on the Web with a search interface to find commonly used ontology terms. We also allow a Drupal site administrator to include existing RDF data from remote SPARQL endpoints on the Web in the site. When brought together, these features allow networked RDF Drupal sites that reuse and enrich Linked Data. We finally discuss the adoption of our modules and report on a use case in the biomedical field and the current status of its deployment.
X. Bai, R. Delbru and G. Tummarello. RDF Snippets for Semantic Web Search Engines. In Proceedings of the International Conference on Ontologies, Databases and Applications of Semantics (ODBASE). 2008.
There has recently been interest in ranking resources and generating corresponding expressive descriptions from the Semantic Web. This paper proposes an approach for automatically generating snippets from RDF documents, helping users better understand the content of RDF documents returned by Semantic Web search engines. A heuristic method for discovering topics, based on the occurrences of RDF nodes and the URIs of the original RDF documents, is presented and evaluated in this paper. In order to make the snippets more understandable, two strategies are proposed and used for ranking the topic-related statements and the query-related statements respectively. Finally, conclusions are drawn from a discussion of the performance of our topic discovery and of the whole snippet generation approach on a test dataset provided by Sindice.
R. Cyganiak, H. Stenzhorn, R. Delbru, S. Decker and G. Tummarello. Semantic Sitemaps: Efficient and Flexible Access to Datasets on the Semantic Web. In Proceedings of the 5th European Semantic Web Conference (ESWC). 2008.
Increasing amounts of RDF data are available on the Web for consumption by Semantic Web browsers and indexing by Semantic Web search engines. Current Semantic Web publishing practices, however, do not directly support efficient discovery and high-performance retrieval by clients and search engines. We propose an extension to the Sitemaps protocol which provides a simple and effective solution: Data publishers create Semantic Sitemaps to announce and describe their data so that clients can choose the most appropriate access method. We show how this protocol enables an extended notion of authoritative information across different access methods.
G. Tummarello, R. Delbru and E. Oren. Sindice.com: Weaving the open linked data. In Proceedings of the 6th International Semantic Web Conference (ISWC). 2007.
Developers of Semantic Web applications face a challenge with respect to the decentralised publication model: where to find statements about encountered resources. The "linked data" approach, which mandates that resource URIs should be de-referenced and yield metadata about the resource, helps but is only a partial solution. We present Sindice, a lookup index over resources crawled on the Semantic Web. Our index allows applications to automatically retrieve sources with information about a given resource. In addition we allow resource retrieval through inverse-functional properties, offer full-text search and index SPARQL endpoints.
E. Oren, R. Delbru, S. Gerke, A. Haller and S. Decker. ActiveRDF: Object-oriented semantic web programming. In Proceedings of the 16th International World-Wide Web Conference (WWW). May 2007.
Object-oriented programming is the current mainstream programming paradigm but existing RDF APIs are mostly triple-oriented. Traditional techniques for bridging a similar gap between relational databases and object-oriented programs cannot be applied directly, given the different nature of Semantic Web data, as can for example be seen in the semantics of class membership, inheritance relations, and object conformance to schemas. We present ActiveRDF, an object-oriented API for managing RDF data that offers full manipulation and querying of RDF data, does not rely on a schema and fully conforms to RDF(S) semantics. ActiveRDF can be used with different RDF data stores, adapters have been implemented to generic SPARQL endpoints, Sesame, Jena, Redland and YARS and new adapters can be added easily. In addition, integration with the popular Ruby on Rails framework enables fast development of Semantic Web applications.
E. Oren, R. Delbru, and S. Decker. Extending faceted navigation for RDF data. In Proceedings of the 5th International Semantic Web Conference (ISWC). November 2006.
Data on the Semantic Web is semi-structured and does not follow one fixed schema. Faceted browsing is a natural technique for navigating such data, partitioning the information space into orthogonal conceptual dimensions. Current faceted interfaces are manually constructed and have limited query expressiveness. We develop an expressive faceted interface for semi-structured data and formally show the improvement over existing interfaces. Secondly, we develop metrics for automatic ranking of facet quality, bypassing the need for manual construction of the interface. We develop a prototype for faceted navigation of arbitrary RDF data. Experimental evaluation shows improved usability over current interfaces.

Workshop papers

S. Campinas, T. E. Perry, D. Ceccarelli, R. Delbru and G. Tummarello. Introducing RDF Graph Summary With Application to Assisted SPARQL Formulation. In Proceedings of the 23rd International Workshop on Database and Expert Systems Applications (DEXA). Vienna, 2012.
One of the reasons for the slow adoption of SPARQL is the complexity in query formulation due to data diversity. The principal barrier a user faces when trying to formulate a query is that he generally has no information about the underlying structure and vocabulary of the data. In this paper, we address this problem at the maximum scale we can think of: providing assistance in formulating SPARQL queries over the entire Sindice data collection - 15 billion triples and counting coming from more than 300K datasets. We present a method to help users in formulating complex SPARQL queries across multiple heterogeneous data sources. Even if the structure and vocabulary of the data sources are unknown to the user, the user is able to quickly and easily formulate his queries. Our method is based on a summary of the data graph and assists the user during an interactive query formulation by recommending possible structural query elements.
N. Toupikov, J. Umbrich, R. Delbru, M. Hausenblas and G. Tummarello. DING! Dataset Ranking using Formal Descriptions. In Proceedings of the WWW-2009 Workshop on Linked Data on the Web (LDOW-2009). Madrid, Spain, 2009.
Considering that thousands if not millions of linked datasets will be published soon, we motivate in this paper the need for an efficient and effective way to rank interlinked datasets based on formal descriptions of their characteristics. We propose DING (from Dataset RankING) as a new approach to rank linked datasets using information provided by the voiD vocabulary. DING is a domain-independent link analysis that measures the popularity of datasets by considering the cardinality and types of the relationships. We propose also a methodology to automatically assign weights to link types. We evaluate the proposed ranking algorithm against other well known ones, such as PageRank or HITS, using synthetic voiD descriptions. Early results show that DING performs better than the standard Web ranking algorithms.
R. Delbru, A. Polleres, G. Tummarello and S. Decker. Context Dependent Reasoning for Semantic Documents in Sindice. In Proceedings of the 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS). Karlsruhe, Germany, 2008. [slides]
The Sindice Semantic Web index today provides search capabilities over more than 30 million documents. A scalable reasoning mechanism for real-world web data is important in order to increase the precision and recall of the Sindice index by inferring useful information (e.g. RDF Schema features, equality, property characteristics such as inverse functional properties or annotation properties from OWL). In this paper, we introduce our notion of context dependent reasoning for RDF documents published on the Web according to the linked data principle. We then illustrate an efficient methodology to perform context dependent RDFS and partial OWL inference based on a persistent TBox composed of a network of web ontologies. Finally we report preliminary evaluation results of our implementation underlying the Sindice web data index.
G. Tummarello and R. Delbru. Entity Coreference Resolution Services in Sindice.com: Identification on the current Web of Data. In Proceedings of the 1st international workshop on Identity and Reference on the Semantic Web (IRSW). Tenerife, Spain. 2008.
A. Harth, A. Hogan, R. Delbru, J. Umbrich, S. O'Riain and S. Decker. SWSE: Answers Before Links!. In Proceedings of the Semantic Web Challenge (ISWC). Busan, Korea. 2007.
We present a system that improves on current document-centric Web search engine technology; adopting an entity-centric perspective, we are able to integrate data from both static and live sources into a coherent, interlinked information space. Users can then search and navigate the integrated information space through relationships, both existing and newly materialised, for improved knowledge discovery and understanding.
E. Oren and R. Delbru. ActiveRDF: Object-oriented RDF in Ruby. In Proceedings of the European Semantic Web Conference Workshop on Scripting for the Semantic Web (ESWC). Budva, Montenegro. June 2006.
Although most developers are object-oriented, programming RDF is triple-oriented. Bridging this gap, by developing a truly object-oriented API that uses domain terminology, is not straightforward, because of the dynamic and semi-structured nature of RDF and the open-world semantics of RDF Schema. We present ActiveRDF, our object-oriented library for accessing RDF data. ActiveRDF is completely dynamic, offers full manipulation and querying of RDF data, does not rely on a schema and can be used against different data-stores. In addition, the integration with the popular Rails framework enables very easy development of Semantic Web applications.
E. Oren and R. Delbru. A prototype for faceted browsing of RDF data. In Proceedings of the Workshop on Scripting for Semantic Web (ESWC). Budva, Montenegro. June 2006.
E. Oren, R. Delbru, K. Möller, M. Völkel and S. Handschuh. Annotation and navigation in semantic wikis. In Proceedings of the European Semantic Web Conference Workshop on Semantic Wikis. Budva, Montenegro. June 2006.
Semantic Wikis allow users to semantically annotate their Wiki content. The particular annotations can differ in expressive power, simplicity, and meaning. We present an elaborate conceptual model for semantic annotations, introduce a unique and rich Wiki syntax for these annotations, and discuss how to best formally represent the augmented Wiki content. We improve existing navigation techniques to automatically construct faceted browsing for semistructured data. By utilising the Wiki annotations we provide greatly enhanced information retrieval. Further we report on our ongoing development of these techniques in our prototype SemperWiki.

Symposium

Renaud Delbru. SIREn: Entity Retrieval System for the Web of Data. In Proceedings of the 3rd Symposium on Future Directions in Information Access (FDIA). University of Padua, Italy. September 2009.
We present ongoing work on the Semantic Information Retrieval Engine (SIREn), an "entity retrieval system" specifically designed to meet the requirements of indexing and searching a large amount of semi-structured data, e.g. the entire Web of Data. SIREn supports efficient full text search with semi-structural queries, exhibits a concise index and constant time updates, and inherits Information Retrieval features such as top-k queries, efficient caching and scalability via distribution over shards. We demonstrate how SIREn can effectively answer queries over 10 billion triples on a single commodity machine. The prototype is currently in use in the Sindice search engine, which at present indexes more than 50 million harvested documents containing semi-structured data.
Renaud Delbru. Methodology for Searching Entities on the Web. In Proceedings of the European Semantic Web Conference Ph.D Symposium (ESWC). Tenerife, Spain. June 2008.

Reports

R. Delbru, S. Campinas, K. Samp, G. Tummarello. Adaptive Frame Of Reference for Compressing Inverted Lists, DERI Technical Report 2010-12-16. December 2010.
The performance of Information Retrieval systems is a key issue in large web search engines. The use of inverted indexes and compression techniques is partially accountable for the current performance achievement of web search engines. In this paper, we introduce a new class of compression techniques for inverted indexes, the Adaptive Frame of Reference, that provides fast query response time, good compression ratio and also fast indexing time. We compare our approach against a number of state-of-the-art compression techniques for inverted index based on three factors: compression ratio, indexing and query processing performance. We show that significant performance improvements can be achieved.
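As a rough illustration of the family of techniques this report builds on, the Python sketch below implements plain Frame of Reference encoding of one block of document ids. It is not the adaptive variant introduced in the report: it assumes a single bit width per block and uses a Python integer as the bit buffer. The names `for_encode` and `for_decode` are hypothetical.

```python
from typing import List, Tuple

# (first doc id, bits per gap, number of gaps, packed gaps)
EncodedBlock = Tuple[int, int, int, int]


def for_encode(doc_ids: List[int]) -> EncodedBlock:
    """Encode a sorted block of doc ids with plain Frame of Reference."""
    if not doc_ids:
        raise ValueError("block must not be empty")
    # Store gaps between consecutive ids instead of the ids themselves.
    gaps = [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    if any(gap < 0 for gap in gaps):
        raise ValueError("doc ids must be sorted in ascending order")
    # One bit width for the whole block: enough for the largest gap.
    width = max((gap.bit_length() for gap in gaps), default=0)
    packed = 0
    for i, gap in enumerate(gaps):
        packed |= gap << (i * width)
    return doc_ids[0], width, len(gaps), packed


def for_decode(block: EncodedBlock) -> List[int]:
    """Rebuild the doc ids from an encoded block."""
    first, width, count, packed = block
    mask = (1 << width) - 1
    doc_ids = [first]
    for i in range(count):
        gap = (packed >> (i * width)) & mask
        doc_ids.append(doc_ids[-1] + gap)
    return doc_ids


encoded = for_encode([100, 103, 104, 110, 125])
print(encoded[1], for_decode(encoded))
```

A single large gap forces every value in the block to use the wider encoding, which is the kind of weakness that approaches adapting the frame to the data are meant to reduce.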
Renaud Delbru. Manipulation and Exploration of Semantic Web Knowledge, Internship Report DERI and EPITA France. Jan—Jul 2006
The description of web resources by machine-understandable metadata is one of the foundations of the Semantic Web. The Resource Description Framework (RDF) is the language for describing and exchanging Semantic Web knowledge. As this data becomes more and more common, techniques for manipulating and exploring this information become necessary.
However, the manipulation of RDF data is triple-oriented. This kind of representation is less intuitive and harder to get to grips with than the object-oriented approach. Our objective was therefore to reconcile the two paradigms by developing an application programming interface (API) that exposes RDF data as objects. ActiveRDF is a dynamic, high-level API that abstracts access to different kinds of RDF data stores. This interface provides access to RDF data in the form of objects, using the terminology of the domain.
To navigate through RDF data and to search for information, we propose Faceteer, a faceted navigation technique for semi-structured data. This technique extends the navigation possibilities of existing techniques. It allows very complex queries to be built visually and easily. The navigation interface is generated automatically for arbitrary RDF data. A set of metrics lets us order the facets of the browser so as to improve navigability.
The results of our research on ActiveRDF and Faceteer save Semantic Web users a substantial amount of time when manipulating and exploring RDF data.