This repository has been archived by the owner on May 24, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 2
Versioning
Bo Ferri edited this page Jul 27, 2015
·
3 revisions
Graph matching algorithms and related topics are an issue that is analysed, evaluated and handled since ages in research and industry. This mendeley group gives the current state overview about our research re. (RDF) graph matching (and related topics). The following papers build (especially) the foundation of our graph delta algorithm:
- Matching RDF Graphs (J. J. Carroll, 2001) (1)
- Signing RDF Graphs (J. J. Carroll, 2003) (2)
- A Scale-Out RDF Molecule Store for Improved Co-Identification, Querying and Inferencing (Newman et. al, 2008) (3)
- Blank Node Matching and RDF/S Comparison Functions (Lantzaki et. al, 2012) (4)
Here are our pre-conditions:
- a GDM graph consists of a set of Concise Bounded Description (RDF Molecules, etc.) - these are our resources (i.e. records etc.)
- a CBD of resource consists of a set of statements, i.e., all statements belong to this resource (however, the subject of the statement doesn't always need to be the resource node, i.e., this is only the case for non-hierarchical resources) and this CBD belongs to certain provenance (i.e. sub graph)
- each resource is identified by an URI (i.e. there are no root nodes in a CBD that are bnodes)
- each CBD of a resource is a tree (not a graph)
- some structure information is already available explicitly, e.g.,
order
property at each statement ornode type
label at each node (e.g.RESOURCE
orLITERAL
)
General approach:
- we should compute the delta on basis of an abstract model (instead of a concrete representation (serialisation))
- we should primarily choose a structure-based approach (to be able to identify matching bnodes (sub graphs) in a deeper hierarchy of the CBD)
- we could utilise graph signatures (digests, hashes, ...) as pre-computing step to avoid unnecessary calculations
- we could utilise object identifier information (or other schema information) to address sub objects (i.e. requires the implementation of the 'key definition' feature)
- we could utilise hierarchy level information from the nodes
Concrete proposal:
- we can implement a graph delta algorithm that is based on the signature-based algorithm of (4) (see also BNodeLand) and improve and adapt this algorithm to our GDM and conditions
- the bnode signature algorithm builds the core component
- we can add a property for the hierarchy of a statement in a CBD to reduce the size of the comparison set
- we only need to compare resource by resource disc(i.e. retrieve the existing resource for a concrete provenance from GDBMS and compare it with the new one)
Versioning handling (simplified):
- add new statements
- mark removed statement as deleted
- re-assign versioning pointer (e.g. HEAD)
Currently, versioning is divided into 2 parts:
- one algorithm deals with calculating the delta between two resources in our GDM (see Delta Algorithm)
- and the other algorithm deals with writing these deltas back to the graph, i.e., do the versioning handling (see Versioning Algorithm)
- Overview
- misc
- Graph Data Model
- Server-Installation (Productive Environment)
- Maintenance
- HowTos
- Use Cases