Datasets: Simple and Yago-S

These datasets are used to evaluate the method for computing statistics of knowledge graphs presented in the paper "Statistics of a RDF store for querying knowledge graphs" by Iztok Savnik, Kiyoshi Nitta, Riste Skrekovski and Nikolaus Augsten.

The system epsilon is a prototype for browsing and querying knowledge graphs. It implements the algorithms for computing the statistics of knowledge graphs that are presented in the paper.

Dataset Simple

The dataset Simple serves as the working example in the paper. It describes a simple hierarchy of persons, including scientists and philosophers, a location, and a small set of properties of persons. The dataset consists of 33 triples.

The script that computes the statistics of the Simple dataset can be run in the batch mode of epsilon. Its results are presented in Section 4.2 of the paper.

Dataset Yago-S

The dataset Yago-S is a subset of Yago 2 from the Max Planck Institute for Informatics. It contains 24,743,914 triples taken from the files yagoFacts.tsv, yagoLiteralFacts.tsv, yagoSchema.tsv, yagoTaxonomy.tsv and yagoTypes.tsv.

The scripts that compute the statistics of Yago-S in the batch mode of epsilon are stored in the directory yago-s. Their results are presented in Section 4.3 of the paper.

Last update: Thu Feb 24 21:15:14 CET 2022