Partition a Knowledge Graph
For distributed training, the graph must be partitioned beforehand. DGL-KE provides a partition tool, dglke_partition, which partitions a given knowledge graph into N parts with the METIS partition algorithm. METIS reduces the number of edges cut between partitions, which in turn reduces network communication during distributed training. For a cluster of P machines, we usually split a graph into P partitions and assign one partition to each machine, as shown in the figure below.
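To make the notion of an edge cut concrete, the following is a minimal sketch that counts the triplets whose head and tail entities fall into different partitions. It is neither METIS nor DGL-KE code: the toy graph, the two partition assignments, and the helper name count_edge_cuts are all made up for illustration.

```python
def count_edge_cuts(triplets, part_of):
    """Count triplets whose head and tail live in different partitions."""
    return sum(1 for head, _, tail in triplets if part_of[head] != part_of[tail])


# Toy knowledge graph as (head, relation, tail) triplets (illustrative data).
triplets = [
    ("a", "r1", "b"),
    ("b", "r2", "c"),
    ("c", "r1", "a"),
    ("d", "r1", "e"),
    ("e", "r2", "f"),
    ("c", "r3", "d"),
]

# Two hypothetical ways of assigning entities to 2 partitions.
clustered = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
scattered = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

clustered_cuts = count_edge_cuts(triplets, clustered)
scattered_cuts = count_edge_cuts(triplets, scattered)

print("clustered assignment cuts:", clustered_cuts)
print("scattered assignment cuts:", scattered_cuts)
```

Every cut edge connects entities held on different machines, so an assignment with fewer cuts generally needs less cross-machine traffic during training.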
Arguments
The command line provides the following arguments:
--data_path DATA_PATH
The path of the directory where the knowledge graph is stored. If the dataset is one of the builtin knowledge graphs such as FB15k, DGL-KE will automatically download the knowledge graph and keep it under data_path.

--dataset DATA_SET
The name of the knowledge graph stored under data_path. If it is one of the builtin knowledge graphs such as FB15k, FB15k-237, wn18, wn18rr, and Freebase, DGL-KE will automatically download the knowledge graph and keep it under data_path.

--format FORMAT
The format of the dataset. For builtin knowledge graphs, the format is determined automatically. For a user’s own knowledge graph, it needs to be raw_udd_{htr} or udd_{htr}. raw_udd_ indicates that the user’s data uses raw IDs for entities and relations, and udd_ indicates that the user’s data uses KGE IDs. {htr} indicates the positions of the head entity, tail entity and relation in a triplet. For example, htr means the head entity is the first element in the triplet, the tail entity is the second element and the relation is the last element.

--data_files [DATA_FILES ...]
A list of data file names. This is required when users train KGE on their own datasets. If the format is raw_udd_{htr}, users need to provide train_file [valid_file] [test_file]. If the format is udd_{htr}, users need to provide entity_file relation_file train_file [valid_file] [test_file]. In both cases, valid_file and test_file are optional.

--delimiter DELIMITER
The delimiter used in data files. Note that all files should use the same delimiter.

-k NUM_PARTS or --num-parts NUM_PARTS
The number of partitions.
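As a rough illustration of how these arguments fit together, the sketch below interprets a --format value, reorders one line of a data file into (head, relation, tail), and assembles a dglke_partition command line from the flags listed above. The helpers parse_format and split_triplet are hypothetical and not part of DGL-KE. The directory ./my_data, the dataset name my_kg, the file names, the sample triplet, and the tab delimiter are likewise assumptions chosen for the example.

```python
import shlex


def parse_format(fmt):
    """Split a format string such as 'raw_udd_htr' into (id_kind, order)."""
    for prefix, id_kind in (("raw_udd_", "raw"), ("udd_", "kge")):
        if fmt.startswith(prefix):
            order = fmt[len(prefix):]
            # The suffix must be a permutation of the letters h, r, t.
            if sorted(order) != ["h", "r", "t"]:
                raise ValueError(f"bad triplet order in format: {fmt!r}")
            return id_kind, order
    raise ValueError(f"unknown format: {fmt!r}")


def split_triplet(line, order, delimiter="\t"):
    """Return (head, relation, tail) from one line laid out in `order`."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    named = dict(zip(order, fields))
    return named["h"], named["r"], named["t"]


id_kind, order = parse_format("raw_udd_htr")
# With 'htr', the columns are head, tail, relation.
head, relation, tail = split_triplet("Paris\tFrance\tcapital_of", order)

# Assemble a command line for a user-provided dataset split into 4 parts.
# All values below are placeholders for illustration.
cmd = [
    "dglke_partition",
    "--data_path", "./my_data",
    "--dataset", "my_kg",
    "--format", "raw_udd_htr",
    "--data_files", "train.tsv", "valid.tsv", "test.tsv",
    "-k", "4",
]
command_line = shlex.join(cmd)
print(command_line)
```

Building the command as a list and joining it with shlex.join keeps each argument correctly quoted if a path or file name contains spaces.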