Articles
-
R. Delbru,
S. Campinas, G. Tummarello.
Searching Web Data: an Entity Retrieval and High-Performance Indexing Model.
In Journal of Web Semantics,
2011.
- More and more (semi) structured information is
becoming available on the Web in the form of documents embedding
metadata (e.g., RDF, RDFa, Microformats and others). There are already
hundreds of millions of such documents accessible and their number is
growing rapidly. This calls for large scale systems providing effective
means of searching and retrieving this semi-structured information with
the ultimate goal of making it exploitable by humans and machines alike.
This article examines the shift from the traditional web document model
to a web data object (entity) model and studies the challenges faced in
implementing a scalable and high performance system for searching
semi-structured data objects over a large heterogeneous and
decentralised infrastructure. Towards this goal, we define an entity
retrieval model, develop novel methodologies for supporting this model
and show how to achieve a high-performance entity retrieval system. We
introduce an indexing methodology for semi-structured data which offers a
good compromise between query expressiveness, query processing and index
maintenance compared to other approaches. We achieve high performance by
optimising the index data structure using appropriate compression
techniques. Finally, we demonstrate that the resulting system can index
billions of data objects and provides keyword-based as well as more
advanced search interfaces for retrieving relevant data objects in
sub-second time.
This work has been part of the Sindice search engine project at the
Digital Enterprise Research Institute (DERI), NUI Galway. The Sindice
system currently maintains more than 200 million pages downloaded from
the Web and is being used actively by many researchers within and
outside of DERI.
-
G. Tummarello, R. Cyganiak, M. Catasta, S. Danielczyk,
R. Delbru,
S. Decker.
Sig.ma : Live views on the Web of Data.
In Journal of Web Semantics,
2010.
- We present Sig.ma, both a service and an end
user application to access the Web of Data as an integrated information
space. Sig.ma uses a holistic approach in which large scale semantic
web indexing, logic reasoning, data aggregation heuristics, ad-hoc
ontology consolidation, external services and responsive user
interaction all play together to create rich entity descriptions. These
consolidated entity descriptions then form the base for embeddable data
mashups, machine oriented services as well as data browsing services.
Finally, we discuss Sig.ma's peculiar characteristics and report on
lessons learned and ideas it inspires.
- E. Oren,
R. Delbru,
M. Catasta, R. Cyganiak, H. Stenzhorn, and G. Tummarello.
Sindice.com: A document-oriented lookup index for open linked data.
In International Journal of Metadata, Semantics and
Ontologies, 3(1), 2008.
- Developers of Semantic Web applications face a challenge with respect
to the decentralised publication model: how and where to find statements
about encountered resources. The "linked data" approach mandates that
resource URIs should be de-referenced to return resource metadata. But
for data discovery linkage itself is not enough, and crawling and indexing
of data is necessary. Existing Semantic Web search engines are focused on
database-like functionality, compromising on index size, query performance
and live updates. We present Sindice, a lookup index over resources crawled
on the Semantic Web. Our index allows applications to automatically locate
documents containing information about a given resource. In addition, we
allow resource retrieval through uniquely identifying inverse-functional
properties, offer a full-text search and index SPARQL endpoints. Finally
we introduce an extension to the sitemap protocol which allows us to
efficiently index large Semantic Web datasets with minimal impact on the
data providers.
Books
- Renaud Delbru.
Searching Web Data: an Entity Retrieval Model. Ph.D Thesis at
Digital Enterprise Research Institute,
National University of Ireland, Galway.
September 2010.
[slides]
[video]
-
More and more (semi) structured information is becoming available
on the Web in the form of documents embedding metadata (e.g., RDF,
RDFa, Microformats and others). There are already hundreds of millions
of such documents accessible and their number is growing rapidly. This
calls for large scale systems providing effective means of searching and
retrieving this semi-structured information with the ultimate goal of
making it exploitable by humans and machines alike.
This dissertation examines the shift from the traditional web document
model to a web data object (entity) model and studies the
challenges and issues faced in implementing a scalable and high performance
system for searching semi-structured data objects on a large
heterogeneous and decentralised infrastructure. Towards this goal, we
define an entity retrieval model, develop novel methodologies for supporting
this model, and design a web-scale retrieval system around this
model. In particular, this dissertation focuses on the following four main
aspects of the system: reasoning, ranking, indexing and querying. We
introduce a distributed reasoning framework which is tolerant against
low data quality. We present a link analysis approach for computing the
popularity score of data objects among decentralised data sources. We
propose an indexing methodology for semi-structured data which offers
a good compromise between query expressiveness, query processing and
index maintenance compared to other approaches. Finally, we develop
an index compression technique which increases both the update and
query throughput of the system. The resulting system can index billions
of data objects and provides keyword-based as well as more advanced
search interfaces for retrieving the most relevant data objects.
This work has been part of the Sindice search engine project at the
Digital Enterprise Research Institute (DERI), NUI Galway. The Sindice
system currently maintains more than 100 million pages downloaded
from the Web and is being used actively by many researchers within and
outside of DERI. The reasoning, ranking, indexing and querying components
of the Sindice search engine are a direct result of this dissertation
research.
Book Chapters
- A. Polleres, A. Hogan,
R. Delbru,
and J. Umbrich.
RDFS & OWL Reasoning for Linked Data.
Chapter in Lecture Notes for the Reasoning Web Summer School.
Springer, July 2013 (to appear).
- Linked Data promises that a large portion of Web Data will be usable
as one big interlinked RDF database against which structured queries
can be answered. In this lecture we will show how reasoning -- using
RDF Schema (RDFS) and the Web Ontology Language (OWL) -- can help to
obtain more complete answers for such queries over Linked Data. We
first look at the extent to which RDFS and OWL features are being
adopted on the Web. We then introduce two high-level architectures for
query answering over Linked Data and outline how these can be enriched
by (lightweight) RDFS and OWL reasoning, enumerating the main
challenges faced and discussing reasoning methods that make practical
and theoretical trade-offs to address these challenges. In the end, we
also ask whether or not RDFS and OWL are enough and discuss numeric
reasoning methods that are beyond the scope of these standards but
that are often important when integrating Linked Data from several,
heterogeneous sources.
- M. Catasta, R. Delbru,
N. Toupikov and G. Tummarello.
Managing Terabytes of Web Semantics Data.
Invited paper in R. De Virgilio, F. Giunchiglia, and L. Tanca, editors,
Semantic Web Information Management: A Model-Based Perspective.
Springer, 2009.
- A large amount of semi-structured data is now made available on the Web
in the form of RDF, RDFa and Microformats. In this chapter we discuss a general model
for the Web of Data and, based on our experience in Sindice.com, we discuss how
this is reflected in the architecture and components of a large scale infrastructure.
Aspects such as data collection, processing, indexing and ranking are touched upon, and we
give an ample example of an application built on top of said infrastructure.
- R. Delbru,
N. Toupikov, M. Catasta, R. Fuller and G. Tummarello.
SIREn: Efficient Search on Semi-Structured Documents.
In Lucene in Action 2nd Edition (In Action series).
Manning Publications Co., 2009.
- While the specifications for RDF (Resource Description Framework) and Microformats
have been out for quite some time now, it is only in the last few years that many web sites
have begun to make use of them, thus effectively starting a "Web of Data" or as some refer
to it a "Web 3.0". Sites such as LinkedIn, Eventfull, Digg, LastFM and others are using these
specifications to share pieces of information that can be automatically reused by other web
sites or by smart clients.
Traditionally, querying graph structured data (RDF) has been done using ad-hoc
solutions, called Triplestores, typically based on DBMS backends. In Sindice we needed
something much more scalable than DBMS and with the desirable features of the typical Web
Search engines: top-k query processing, real time updates, full text search, distributed
indexes over shards, etc.
While Lucene has long offered these capabilities, we will see that its
native capabilities are not intended for large semi-structured document collections with very
different schemata. For this reason we developed SIREn (Semantic Information Retrieval
Engine), a Lucene extension to overcome these shortcomings and efficiently index and query
RDF, as well as any textual document with an arbitrary number of metadata fields.
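As a rough illustration of the entity-centric indexing idea described in the SIREn chapter
above, the following Python sketch groups triples by subject and builds a toy inverted index
from terms to entities. It is not SIREn's or Lucene's API: the function names, the sample
triples and the whitespace tokenisation are all assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def build_entity_index(triples):
    """Build a toy inverted index over (subject, predicate, object) string triples.

    Each term of an object value is indexed twice: once on its own (keyword
    search) and once paired with its predicate (attribute-restricted search).
    """
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        for term in tokenize(obj):
            index[term].add(subject)
            index[(predicate, term)].add(subject)
    return index


def search(index, term, attribute=None):
    """Return the sorted entities whose values contain `term`."""
    key = term.lower() if attribute is None else (attribute, term.lower())
    return sorted(index.get(key, set()))


# Hypothetical sample data, not taken from any real dataset.
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice Example"),
    ("ex:alice", "ex:worksOn", "semantic search engine"),
    ("ex:engine", "rdfs:label", "Example search engine"),
]

index = build_entity_index(TRIPLES)
print(search(index, "search"))
print(search(index, "search", attribute="rdfs:label"))
```

Indexing each term both with and without its attribute is only one possible design; it trades
index size for the ability to answer keyword and attribute-restricted queries with the same
lookup.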
Conference papers
-
S. Campinas, R. Delbru,
G. Tummarello.
Effective Retrieval Model for Entity with Multi-Valued Attributes: BM25MF and Beyond.
In Proceedings of the 18th International
Conference on Knowledge Engineering and Knowledge Management (EKAW).
2012.
-
The task of entity retrieval becomes increasingly prevalent as more and
more structured information about entities is available on the Web in
various forms such as documents embedding metadata (RDF, RDFa,
Microdata, Microformats). International benchmarking campaigns, e.g.,
the Text REtrieval Conference or the Semantic Search Challenge, propose
entity-oriented search tracks. This reflects the need for an effective
search and discovery of entities. In this work, we present a
multi-valued attributes model for entity retrieval which extends and
generalises existing field-based ranking models. Our model introduces
the concept of multi-valued attributes and enables attribute and
value-specific normalization and weighting. Based on this model we
extend two state-of-the-art field-based rankings, i.e., BM25F and PL2F,
and demonstrate based on evaluations over heterogeneous datasets that
this model improves significantly the retrieval performance compared to
existing models. Finally, we introduce query dependent and independent
weights specifically designed for our model which provide significant
performance improvement.
-
R. Delbru,
G. Tummarello, A. Polleres.
Context-Dependent OWL Reasoning in Sindice - Experiences and Lessons Learnt.
In Proceedings of the 5th International Conference on
Web Reasoning and Rule Systems (RR). 2011.
- The Sindice Semantic Web index provides search capabilities over
260 million documents. Reasoning over web data makes explicit what
would otherwise be implicit knowledge: it adds value to the information and enables
Sindice to ultimately be more competitive in terms of precision and recall.
However, due to the scale and heterogeneity of web data, a reasoning engine for
the Sindice system must (1) scale out through parallelisation over a cluster of
machines; and (2) cope with unexpected data usage. In this paper, we report our
experiences and lessons learned in building a large scale reasoning engine for
Sindice. The reasoning approach has been deployed, used and improved since
2008 within Sindice and has enabled Sindice to reason over billions of triples.
-
L. Dragan, R. Delbru,
T. Groza, S. Handschuh, S. Decker.
Linking Semantic Desktop Data to the Web of Data.
In Proceedings of the 10th International
Semantic Web Conference (ISWC). 2011.
-
The goal of the Semantic Desktop is to enable better organization of
the personal information on our computers, by applying semantic
technologies on the desktop. However, information on our desktop is
often incomplete, as it is based on our subjective view, or limited
knowledge about an entity. On the other hand, the Web of Data contains
information about virtually everything, generally from multiple sources.
Connecting the desktop to the Web of Data would thus enrich and
complement desktop information. Bringing in information from the Web
of Data automatically would take the burden of searching for
information off the user. In addition, connecting the two networks of
data opens up the possibility of advanced personal services on the desktop.
Our solution tackles the problems raised above by using a semantic
search engine for the Web of Data, such as Sindice, to find and
retrieve a relevant subset of entities from the web. We present a
matching framework, using a combination of configurable heuristics and
rules to compare data graphs, that achieves a high degree of precision
in the linking decision. We evaluate our methodology with real-world
data; create a gold standard from relevance judgements by experts, and
we measure the performance of our system against it. We show that it
is possible to automatically link desktop data with web data in an
effective way.
- S. Campinas,
R. Delbru,
G. Tummarello.
SkipBlock: Self-Indexing for Block-Based Inverted List.
In Proceedings of the 33rd European Conference on
Information Retrieval (ECIR). 2011.
- In large web search engines the performance of
Information Retrieval systems is a key issue. Block-based compression methods are
often used to improve the search performance, but current self-indexing
techniques are not adapted to such data structures and provide suboptimal
performance. In this paper, we present SkipBlock, a self-indexing
model for block-based inverted lists. Based on a cost model, we show that
it is possible to achieve significant improvements on both search performance
and the structure's storage space.
-
R. Delbru,
N. Toupikov, M. Catasta, G. Tummarello.
A Node Indexing Scheme for Web Entity Retrieval.
In Proceedings of the 7th Extended Semantic
Web Conference (ESWC). 2010.
[slides]
- Now motivated also by the partial support of
major search engines, hundreds of millions of documents are being
published on the web embedding semi-structured data in RDF, RDFa and
Microformats. This scenario calls for novel information search systems
which provide effective means of retrieving relevant semi-structured
information. In this paper, we present an "entity retrieval system"
designed to provide entity search capabilities over datasets as large as
the entire Web of Data. Our system supports full-text search,
semi-structural queries and top-k query results while exhibiting a
concise index and efficient incremental updates. We advocate the use of
a node indexing scheme and show that it offers a good compromise between
query expressiveness, query processing time and update complexity in
comparison to three other indexing techniques. We then demonstrate how
such a system can effectively answer queries over 10 billion triples on a
single commodity machine.
-
R. Delbru,
N. Toupikov, M. Catasta, G. Tummarello, S. Decker.
Hierarchical Link Analysis for Ranking Web Data.
In Proceedings of the 7th Extended Semantic
Web Conference (ESWC). 2010.
[slides]
- On the Web of Data, entities are often
interconnected in a way similar to web documents. Previous works have
shown how PageRank can be adapted to achieve entity ranking. In this
paper, we propose to exploit locality on the Web of Data by taking a
layered approach, similar to hierarchical PageRank approaches. We
provide justifications for a two-layer model of the Web of Data, and
introduce DING (Dataset Ranking) a novel ranking methodology based on
this two-layer model. DING uses links between datasets to compute
dataset ranks and combines the resulting values with semantic-dependent
entity ranking strategies. We quantify the effectiveness of the approach
with other link-based algorithms on large datasets coming from the
Sindice search engine. The evaluation which includes a user study
indicates that the resulting rank is better than the other approaches.
Also, the resulting algorithm is shown to have desirable computational
properties such as parallelisation.
- S. Corlosquet,
R. Delbru,
T. Clark, A. Polleres and S. Decker.
Produce and Consume Linked Data with Drupal!.
In Proceedings of the 8th International
Semantic Web Conference (ISWC). 2009.
- Currently a large number of Web sites are
driven by Content Management Systems (CMS) which manage textual and
multimedia content but also - inherently - carry valuable information
about a site's structure and content model. Exposing this structured
information to the Web of Data has so far required considerable expertise
in RDF and OWL modelling and additional programming effort. In this
paper we tackle one of the most popular CMS: Drupal. We enable site
administrators to export their site content model and data to the Web of
Data without requiring extensive knowledge on Semantic Web technologies.
Our modules create RDFa annotations and --- optionally --- a SPARQL
endpoint for any Drupal site out of the box. Likewise, we add the means
to map the site data to existing ontologies on the Web with a search
interface to find commonly used ontology terms. We also allow a Drupal
site administrator to include existing RDF data from remote SPARQL
endpoints on the Web in the site. When brought together, these features
allow networked RDF Drupal sites that reuse and enrich Linked Data. We
finally discuss the adoption of our modules and report on a use case in
the biomedical field and the current status of its deployment.
- X. Bai, R. Delbru and
G. Tummarello.
RDF Snippets for Semantic Web Search Engines. In
Proceedings of the
International Conference on Ontologies, Databases and Applications of
Semantics (ODBASE). 2008.
- There has been interest in ranking the resources and generating
corresponding expressive descriptions from the Semantic Web
recently. This paper proposes an approach for automatically generating
snippets from RDF documents and assisting users in better understanding
the content of RDF documents returned by Semantic Web search engines. A
heuristic method for discovering topics, based on the occurrences of RDF
nodes and the URIs of original RDF documents, is presented and
evaluated in this paper. In order to make the snippets more
understandable, two strategies are proposed and used for ranking the
topic-related statements and the query-related statements respectively.
Finally, the conclusion is drawn based on the discussion about the
performances of our topic discovery and the whole snippet generation
approaches on a test dataset provided by Sindice.
- R. Cyganiak, H. Stenzhorn,
R. Delbru,
S. Decker and G. Tummarello.
Semantic Sitemaps: Efficient and
Flexible Access to Datasets on the Semantic Web.
In Proceedings of
the 5th European Semantic Web Conference (ESWC). 2008.
- Increasing amounts of RDF data are available on the Web for
consumption by Semantic Web browsers and indexing by Semantic Web
search engines. Current Semantic Web publishing practices, however, do
not directly support efficient discovery and high-performance retrieval
by clients and search engines. We propose an extension to the Sitemaps
protocol which provides a simple and effective solution: Data publishers
create Semantic Sitemaps to announce and describe their data so that
clients can choose the most appropriate access method. We show how
this protocol enables an extended notion of authoritative information
across different access methods.
- G. Tummarello,
R. Delbru and
E. Oren.
Sindice.com: Weaving the open linked data.
In Proceedings of the 6th International
Semantic Web Conference (ISWC). 2007.
- Developers of Semantic Web applications face a challenge with
respect to the decentralised publication model: where to find statements
about encountered resources. The "linked data" approach, which mandates
that resource URIs should be de-referenced and yield metadata
about the resource, helps but is only a partial solution. We present Sindice,
a lookup index over resources crawled on the Semantic Web. Our index
allows applications to automatically retrieve sources with information about
a given resource. In addition we allow resource retrieval through
inverse-functional properties, offer full-text search and index SPARQL endpoints.
- E. Oren,
R. Delbru,
S. Gerke, A. Haller and S. Decker.
ActiveRDF: Object-oriented
semantic web programming. In
Proceedings of the 16th International World-Wide Web Conference
(WWW). May 2007.
- Object-oriented programming is the current mainstream programming paradigm but existing RDF
APIs are mostly triple-oriented. Traditional techniques for bridging a similar gap between
relational databases and object-oriented programs cannot be applied directly, given the
different nature of Semantic Web data, as can for example be seen in the semantics of class
membership, inheritance relations, and object conformance to schemas. We present ActiveRDF,
an object-oriented API for managing RDF data that offers full manipulation and querying of
RDF data, does not rely on a schema and fully conforms to RDF(S) semantics. ActiveRDF can
be used with different RDF data stores, adapters have been implemented to generic SPARQL
endpoints, Sesame, Jena, Redland and YARS and new adapters can be added easily. In addition,
integration with the popular Ruby on Rails framework enables fast development of Semantic Web
applications.
- E. Oren,
R. Delbru, and
S. Decker.
Extending faceted navigation for RDF data. In
Proceedings of the 5th International Semantic Web Conference (ISWC).
November 2006.
- Data on the Semantic Web is semi-structured and does not follow one fixed schema.
Faceted browsing is a natural technique for navigating such data, partitioning the
information space into orthogonal conceptual dimensions. Current faceted interfaces
are manually constructed and have limited query expressiveness. We develop an expressive
faceted interface for semi-structured data and formally show the improvement over
existing interfaces. Secondly, we develop metrics for automatic ranking of facet
quality, bypassing the need for manual construction of the interface. We develop a
prototype for faceted navigation of arbitrary RDF data. Experimental evaluation shows
improved usability over current interfaces.
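To make the idea of faceted navigation over schema-less RDF data more concrete, here is a
minimal Python sketch. It treats every predicate as a facet, counts how many entities carry
each value, and narrows the candidate set when a value is selected. This is only an
illustration under stated assumptions: the function names and sample triples are hypothetical,
and the facet-quality metrics of the paper above are not reproduced.

```python
from collections import Counter, defaultdict


def facet_counts(triples, candidates=None):
    """Return {predicate: Counter(value -> number of distinct subjects)}.

    If `candidates` is given, only subjects in that set are counted.
    """
    seen = defaultdict(set)
    for subject, predicate, value in triples:
        if candidates is None or subject in candidates:
            seen[(predicate, value)].add(subject)
    facets = defaultdict(Counter)
    for (predicate, value), subjects in seen.items():
        facets[predicate][value] = len(subjects)
    return facets


def select(triples, predicate, value, candidates=None):
    """Restrict the candidate set to subjects having `predicate` equal to `value`."""
    matched = {s for s, p, o in triples if p == predicate and o == value}
    return matched if candidates is None else matched & candidates


# Hypothetical sample data.
TRIPLES = [
    ("ex:a", "ex:type", "Book"), ("ex:a", "ex:lang", "en"),
    ("ex:b", "ex:type", "Book"), ("ex:b", "ex:lang", "fr"),
    ("ex:c", "ex:type", "Film"), ("ex:c", "ex:lang", "en"),
]

books = select(TRIPLES, "ex:type", "Book")
print(dict(facet_counts(TRIPLES)["ex:lang"]))
print(dict(facet_counts(TRIPLES, books)["ex:lang"]))
```

Recomputing the counts against the narrowed candidate set after each selection is what lets an
interface show only the facet values that still lead to results.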
Workshop papers
- S. Campinas, T. E. Perry,
D. Ceccarelli, R. Delbru and G. Tummarello.
Introducing RDF Graph Summary With Application to Assisted SPARQL Formulation. In
Proceedings of the 23rd International Workshop on Database and
Expert Systems Applications (DEXA). Vienna, 2012.
-
One of the reasons for the slow adoption of SPARQL is the complexity
in query formulation due to data diversity. The principal barrier a
user faces when trying to formulate a query is that he generally has
no information about the underlying structure and vocabulary of the
data. In this paper, we address this problem at the maximum scale we
can think of: providing assistance in formulating SPARQL queries over
the entire Sindice data collection - 15 billion triples and counting
coming from more than 300K datasets. We present a method to help users
in formulating complex SPARQL queries across multiple heterogeneous
data sources. Even if the structure and vocabulary of the data sources
are unknown to the user, the user is able to quickly and easily
formulate his queries. Our method is based on a summary of the data
graph and assists the user during an interactive query formulation by
recommending possible structural query elements.
- N. Toupikov, J. Umbrich,
R. Delbru, M. Hausenblas and G. Tummarello.
DING! Dataset Ranking using Formal Descriptions. In
Proceedings of the WWW-2009 Workshop on Linked Data on the Web
(LDOW-2009). Madrid, Spain, 2009.
- Considering that thousands if not millions of linked datasets will
be published soon, we motivate in this paper the need for an efficient
and effective way to rank interlinked datasets based on formal
descriptions of their characteristics. We propose DING (from
Dataset RankING) as a new approach to rank linked datasets
using information provided by the voiD vocabulary. DING is a
domain-independent link analysis that measures the popularity of
datasets by considering the cardinality and types of the relationships.
We propose also a methodology to automatically assign weights to link
types. We evaluate the proposed ranking algorithm against other well
known ones, such as PageRank or HITS, using synthetic voiD descriptions.
Early results show that DING performs better than the standard Web
ranking algorithms.
- R. Delbru, A. Polleres,
G. Tummarello and S. Decker.
Context Dependent Reasoning for Semantic Documents in Sindice. In
Proceedings of the 4th International Workshop on Scalable Semantic
Web Knowledge Base Systems (SSWS). Karlsruhe, Germany, 2008.
[slides]
- The Sindice Semantic Web index currently provides search capabilities over
more than 30 million documents. A scalable reasoning mechanism for
real-world web data is important in order to increase the precision and
recall of the Sindice index by inferring useful information (e.g. RDF
Schema features, equality, property characteristics such as inverse
functional properties or annotation properties from OWL). In this paper,
we introduce our notion of context dependent reasoning for RDF documents
published on the Web according to the linked data principle. We then
illustrate an efficient methodology to perform context dependent RDFS
and partial OWL inference based on a persistent TBox composed of a
network of web ontologies. Finally we report preliminary evaluation
results of our implementation underlying the Sindice web data index.
- G. Tummarello and R. Delbru.
Entity Coreference Resolution Services in Sindice.com: Identification on
the current Web of Data. In Proceedings
of the 1st international workshop on Identity and Reference on the
Semantic Web (IRSW). Tenerife, Spain. 2008.
- A. Harth, A. Hogan,
R. Delbru, J. Umbrich, S. O'Riain and S. Decker.
SWSE: Answers Before Links!. In Proceedings of the
Semantic Web Challenge (ISWC). Busan, Korea. 2007.
- We present a system that improves on current document-centric Web
search engine technology; adopting an entity-centric
perspective, we are able to integrate data from both static and live sources
into a coherent, interlinked information space. Users can then search and
navigate the integrated information space through relationships, both
existing and newly materialised, for improved knowledge discovery and
understanding.
- E. Oren and R. Delbru.
ActiveRDF: Object-oriented RDF in Ruby. In Proceedings of the
European Semantic Web Conference Workshop on Scripting for the
Semantic Web (ESWC). Budva, Montenegro. June 2006.
- Although most developers are object-oriented, programming RDF is triple-oriented.
Bridging this gap, by developing a truly object-oriented API that uses domain
terminology, is not straightforward, because of the dynamic and semi-structured nature
of RDF and the open-world semantics of RDF Schema. We present ActiveRDF, our
object-oriented library for accessing RDF data. ActiveRDF is completely dynamic,
offers full manipulation and querying of RDF data, does not rely on a schema and can
be used against different data-stores. In addition, the integration with the popular
Rails framework enables very easy development of Semantic Web applications.
- E. Oren and R. Delbru.
A prototype for faceted browsing of RDF data. In Proceedings of the
Workshop on Scripting for Semantic Web (ESWC). Budva, Montenegro.
June 2006.
- E. Oren, R. Delbru,
K. Möller, M. Völkel and S. Handschuh. Annotation and navigation in
semantic wikis. In Proceedings of the European Semantic Web
Conference Workshop on Semantic Wikis. Budva, Montenegro. June 2006.
- Semantic Wikis allow users to semantically annotate their Wiki content. The particular
annotations can differ in expressive power, simplicity, and meaning. We present an
elaborate conceptual model for semantic annotations, introduce a unique and rich Wiki
syntax for these annotations, and discuss how to best formally represent the augmented
Wiki content. We improve existing navigation techniques to automatically construct
faceted browsing for semistructured data. By utilising the Wiki annotations we
provide greatly enhanced information retrieval. Further we report on our ongoing
development of these techniques in our prototype SemperWiki.
Symposium
- Renaud Delbru.
SIREn: Entity Retrieval System for the Web of Data.
In Proceedings of the 3rd Symposium on Future Directions in
Information Access (FDIA). University of Padua, Italy. September 2009.
- We present ongoing work on the Semantic Information Retrieval Engine (SIREn), an
"entity retrieval system" specifically designed to meet the requirements of indexing
and searching a large amount of semi-structured data, e.g. the entire Web of Data.
SIREn supports efficient full text search with semi-structural queries and exhibits
a concise index, constant time updates and inherits Information Retrieval features
such as top-k queries, efficient caching and scalability via distribution over shards.
We demonstrate how SIREn can effectively answer queries over 10 billion triples on
a single commodity machine. The prototype is currently in use in the Sindice search
engine, which currently indexes more than 50 million harvested documents
containing semi-structured data.
- Renaud Delbru.
Methodology for Searching Entities on the Web.
In Proceedings of the European Semantic Web Conference Ph.D
Symposium (ESWC). Tenerife, Spain. June 2008.
Reports
- R. Delbru, S. Campinas,
K. Samp, G. Tummarello.
Adaptive Frame Of Reference for Compressing Inverted Lists, DERI
Technical Report 2010-12-16. December 2010.
-
The performance of Information Retrieval systems is a key issue in large web
search engines. The use of inverted indexes and compression techniques is partially accountable
for the current performance achievement of web search engines. In this paper, we
introduce a new class of compression techniques for inverted indexes, the Adaptive Frame
of Reference, that provides fast query response time, good compression ratio and also fast
indexing time. We compare our approach against a number of state-of-the-art compression
techniques for inverted index based on three factors: compression ratio, indexing and
query processing performance. We show that significant performance improvements can be
achieved.
- Renaud Delbru.
Manipulation and Exploration of Semantic Web Knowledge. Internship Report,
DERI and EPITA France. January to July 2006.
-
Describing web resources with machine-understandable metadata is one of the foundations
of the Semantic Web. The Resource Description Framework (RDF) is the language for
describing and exchanging Semantic Web knowledge. As such data becomes more and more
common, techniques for manipulating and exploring this information become necessary.
However, the manipulation of RDF data is triple-oriented. This kind of representation
is less intuitive and harder to get to grips with than the object-oriented approach. Our
objective was therefore to reconcile the two paradigms by developing an application
programming interface (API) that exposes RDF data as objects. ActiveRDF is a dynamic,
high-level API that abstracts access to different types of RDF databases. This interface
provides access to RDF data in the form of objects using the terminology of the domain.
To navigate through RDF data and search for information, we propose Faceteer, a faceted
navigation technique for semi-structured data. This technique extends the navigation
possibilities compared with existing techniques. It allows very complex queries to be
built visually and easily. The navigation interface is generated automatically for
arbitrary RDF data. A set of metrics lets us order the facets of the browser to improve
navigability.
The results of our research on ActiveRDF and Faceteer save Semantic Web users substantial
time in manipulating and exploring RDF data.
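For readers unfamiliar with the family of techniques behind the Adaptive Frame of Reference
report listed in this section, the following Python sketch shows plain, non-adaptive Frame of
Reference encoding of an integer list. Each block stores its minimum value as a reference and
packs the offsets from that reference with a fixed bit width. This is the textbook baseline,
not the adaptive variant proposed in the report; the function names, block size and sample
values are assumptions for illustration.

```python
def for_encode(values, block_size=4):
    """Encode integers block by block with basic Frame of Reference.

    Each block becomes (reference, bit_width, count, packed), where `packed`
    is a Python int holding the fixed-width offsets from the block minimum.
    """
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        reference = min(block)
        offsets = [v - reference for v in block]
        # Use at least one bit so an all-equal block still decodes correctly.
        width = max(max(offsets).bit_length(), 1)
        packed = 0
        for i, offset in enumerate(offsets):
            packed |= offset << (i * width)
        blocks.append((reference, width, len(block), packed))
    return blocks


def for_decode(blocks):
    """Invert for_encode and return the original list of integers."""
    values = []
    for reference, width, count, packed in blocks:
        mask = (1 << width) - 1
        for i in range(count):
            values.append(reference + ((packed >> (i * width)) & mask))
    return values


# Hypothetical document identifiers from an inverted list.
doc_ids = [100, 103, 104, 110, 5000, 5001, 5002, 5010]
blocks = for_encode(doc_ids)
print(blocks[0][:3])
print(for_decode(blocks) == doc_ids)
```

Because every block carries its own reference, a jump from 110 to 5000 costs nothing extra:
both blocks above need only 4 bits per value.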