Format of Output

Different DGL-KE command-line tools produce different output data. Their dependencies are as follows:

  • dglke_dist_train depends on the output of dglke_partition
  • dglke_eval depends on the output (Trained Embeddings) of the training command dglke_train or dglke_dist_train
  • dglke_predict and dglke_emb_sim depend on the output (Trained Embeddings) of the training command dglke_train or dglke_dist_train, as well as the ID mapping file.

Output format of dglke_partition

dglke_partition partitions a graph into parts. It generates N partition directories according to the input argument -k N. For example, when -k is set to 4, it generates 4 directories: partition_0, partition_1, partition_2, and partition_3.

The detailed format of each partition_n is used by dglke_dist_train only and is out of the current scope. Please refer to the distributed training section for more details.
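The directory naming described above can be checked with a short script before starting distributed training. This is a minimal sketch: the helper names and the `root` argument (the directory in which the partitions were written) are hypothetical and not part of DGL-KE.

```python
from pathlib import Path


def expected_partition_dirs(root, k):
    """Return the k paths partition_0 ... partition_{k-1} under root.

    `root` is an assumption: pass whatever directory dglke_partition
    wrote its output to.
    """
    return [Path(root) / f"partition_{i}" for i in range(k)]


def missing_partitions(root, k):
    """Return the names of expected partition directories that do not exist."""
    return [p.name for p in expected_partition_dirs(root, k) if not p.is_dir()]
```

For a run with -k 4, `missing_partitions(root, 4)` returns an empty list once all four directories are present.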

Output format of dglke_train and dglke_dist_train

The outputs of dglke_train and dglke_dist_train are almost the same. This section explains the output of dglke_train.

Basically there are four outputs:

  • Trained Embeddings: The saved model. For most models, such as TransE, RESCAL, DistMult, ComplEx, and RotatE, there are two files: <dataset_name>_<model>_entity.npy for the entity embeddings and <dataset_name>_<model>_relation.npy for the relation embeddings. Both are saved NumPy arrays. For TransR, there is one additional output file that stores the projection matrix.
  • config.json: The config file records all the details of the training configuration as well as the locations of the ID mapping files generated by dglke_train. The fields of the config file are shown below:

    Field Name                Explanation
    neg_sample_size           int value of param --neg_sample_size
    max_train_step            int value of param --max_step
    double_ent                bool value of param --double_ent
    rmap_file                 relation ID mapping file name
    lr                        float value of param --lr
    neg_adversarial_sampling  bool value of param --neg_adversarial_sampling
    gamma                     float value of param --gamma
    adversarial_temperature   float value of param --adversarial_temperature
    batch_size                int value of param --batch_size
    regularization_coef       float value of param --regularization_coef
    model                     model name
    dataset                   dataset name
    emb_size                  embedding dimension size
    regularization_norm       int value of param --regularization_norm
    double_rel                bool value of param --double_rel
    emap_file                 entity ID mapping file name

  • Training Log: The output log printed to stdout. If --test is set, the final test result (MR, MRR, Hit@1, Hit@3, Hit@10) is also printed.

  • ID Mapping Files (Optional): If the input data is in the Raw User Defined Knowledge Graph format, that is, all triplets use the Raw ID space, the training script performs the ID conversion and generates two ID mapping files:

    • entities.tsv, for the entity ID mapping, in the format KGE_entity_ID\tRaw_entity_Name, for example:

      0\tBeijing

      1\tChina

    • relations.tsv, for the relation ID mapping, in the format KGE_relation_ID\tRaw_relation_name, for example:

      0\tis_capital_of

      1\tlocated_at
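The training outputs listed above can be read back with NumPy and the standard library. The following is a minimal sketch, not DGL-KE's own API: the function names and the `save_dir` argument are hypothetical, and it assumes that row i of the entity embedding array belongs to the entity with KGE ID i, which this section does not state explicitly.

```python
import json
from pathlib import Path

import numpy as np


def load_embeddings(save_dir, dataset_name, model):
    """Load <dataset_name>_<model>_entity.npy and ..._relation.npy from save_dir."""
    save_dir = Path(save_dir)
    entity = np.load(save_dir / f"{dataset_name}_{model}_entity.npy")
    relation = np.load(save_dir / f"{dataset_name}_{model}_relation.npy")
    return entity, relation


def load_config(path):
    """Read config.json into a dict keyed by the field names listed above."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_id_mapping(path):
    """Read a KGE_ID<TAB>Raw_name file into a dict {raw_name: kge_id}."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split only on the first tab so raw names are kept verbatim.
            kge_id, raw_name = line.split("\t", 1)
            mapping[raw_name] = int(kge_id)
    return mapping
```

Under the row-index assumption above, `entity[load_id_mapping("entities.tsv")["Beijing"]]` would be the embedding vector of the entity whose raw name is Beijing.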

Output format of dglke_eval

dglke_eval produces a single output: the test result, including MR, MRR, Hit@1, Hit@3, and Hit@10.
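For readers unfamiliar with these metrics, the sketch below computes them from a list of ranks using their standard definitions: MR is the mean rank of the correct triplet, MRR is the mean of the reciprocal ranks, and Hit@k is the fraction of test triplets ranked within the top k. The function name and its input are hypothetical; how dglke_eval itself produces the ranks (for example, candidate filtering or tie handling) is not covered here.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and Hit@k from 1-based ranks of the correct triplets."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank (lower is better)
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank
    }
    for k in ks:
        # Fraction of test triplets whose correct answer is in the top k.
        metrics[f"Hit@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics
```

A lower MR is better, while higher MRR and Hit@k values are better.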

Output format of dglke_predict

The output of dglke_predict is a list of the top-ranked candidate (h, r, t) triplets together with their prediction scores. By default, the output is written to result.tsv in the format ‘src\trel\tdst\tscore’.

Example output:

src  rel  dst  score
6    0    15   -2.39380
8    0    14   -2.65297
2    0    14   -2.67331
9    0    18   -2.86985
8    0    20   -2.89651

If the input data of dglke_predict uses Raw IDs, dglke_predict also converts the output to Raw IDs.

Example output:

head      rel                           tail      score
08847694  _derivationally_related_form  09440400  -7.41088
08847694  _hyponym                      09440400  -8.99562
02537319  _derivationally_related_form  01490112  -9.08666
02537319  _hyponym                      01490112  -9.44877
00083809  _derivationally_related_form  05940414  -9.88155

Output format of dglke_emb_sim

The output of dglke_emb_sim is a list of the top-ranked candidate (left, right) pairs together with their embedding similarity scores. By default, the output is written to result.tsv in the format ‘left\tright\tscore’.

Example output:

left    right   score
6       15      0.55512
1       12      0.33153
7       20      0.27706
7       19      0.25631
7       13      0.21372

If the input data of dglke_emb_sim uses Raw IDs, dglke_emb_sim also converts the output to Raw IDs.

Example output:

left                          right                           score
_hyponym                      _hyponym                        0.99999
_derivationally_related_form  _derivationally_related_form    0.99999
_hyponym                      _also_see                       0.58408
_hyponym                      _member_of_domain_topic         0.44027
_hyponym                      _member_of_domain_region        0.30975