Data¶
DataPreprocess¶
- class neuralkg_ind.data.DataPreprocess.KGData(args)[source]¶
Bases: object
Data preprocessing of KG data.
- args¶
Some pre-set parameters, such as dataset path, etc.
- ent2id¶
Encoding the entity in triples, type: dict.
- rel2id¶
Encoding the relation in triples, type: dict.
- id2ent¶
Decoding the entity in triples, type: dict.
- id2rel¶
Decoding the relation in triples, type: dict.
- train_triples¶
Record the triples for training, type: list.
- valid_triples¶
Record the triples for validation, type: list.
- test_triples¶
Record the triples for testing, type: list.
- all_true_triples¶
Record all triples, including train, valid and test, type: list.
- TrainTriples¶
- Relation2Tuple¶
- RelSub2Obj¶
- hr2t_train¶
Record the tails corresponding to the same head and relation, type: defaultdict(set).
- rt2h_train¶
Record the heads corresponding to the same tail and relation, type: defaultdict(set).
- h2rt_train¶
Record the (relation, tail) pairs corresponding to the same head, type: defaultdict(set).
- t2rh_train¶
Record the (relation, head) pairs corresponding to the same tail, type: defaultdict(set).
- get_id()[source]¶
Getting entity/relation id, and entity/relation number.
- Update:
self.ent2id: Entity to id. self.rel2id: Relation to id. self.id2ent: id to Entity. self.id2rel: id to Relation. self.args.num_ent: Entity number. self.args.num_rel: Relation number.
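As a rough illustration of what get_id builds, here is a minimal standalone sketch; the file name train.txt and the tab separator are assumptions for illustration, not necessarily this library's actual dataset layout:

    ent2id, rel2id = {}, {}
    with open("train.txt") as f:                    # assumed file name/format
        for line in f:
            h, r, t = line.strip().split("\t")
            for e in (h, t):
                ent2id.setdefault(e, len(ent2id))   # first occurrence gets the next id
            rel2id.setdefault(r, len(rel2id))
    id2ent = {i: e for e, i in ent2id.items()}
    id2rel = {i: r for r, i in rel2id.items()}
    num_ent, num_rel = len(ent2id), len(rel2id)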
- get_triples_id()[source]¶
Getting triple ids, saved in the format (h, r, t).
- Update:
self.train_triples: Train dataset triples id. self.valid_triples: Valid dataset triples id. self.test_triples: Test dataset triples id.
- get_hr2t_rt2h_from_train()[source]¶
Getting the sets hr2t and rt2h from the train dataset; the data type is numpy.
- Update:
self.hr2t_train: The set of hr2t. self.rt2h_train: The set of rt2h.
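A minimal sketch of this construction, assuming train_triples is a list of (h, r, t) id triples as built above:

    from collections import defaultdict
    import numpy as np

    hr2t_train = defaultdict(set)
    rt2h_train = defaultdict(set)
    for h, r, t in train_triples:
        hr2t_train[(h, r)].add(t)
        rt2h_train[(r, t)].add(h)
    # Freeze each set to a numpy array, matching the "data type is numpy" note.
    hr2t_train = {k: np.array(list(v)) for k, v in hr2t_train.items()}
    rt2h_train = {k: np.array(list(v)) for k, v in rt2h_train.items()}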
- static count_frequency(triples, start=4)[source]¶
Getting frequency of a partial triple like (head, relation) or (relation, tail).
The frequency will be used for subsampling like word2vec.
- Parameters:
triples – Sampled triples.
start – Initial count number.
- Returns:
A dict recording the frequency of each partial triple.
- Return type:
count
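The word2vec-style count typically looks like the sketch below. Keying the tail direction as (t, -r - 1) is an assumption borrowed from the RotatE reference code, not necessarily this library's exact scheme:

    def count_frequency(triples, start=4):
        # `start` is an initial pseudo-count so rare pairs are never weighted at 0.
        count = {}
        for h, r, t in triples:
            count[(h, r)] = count.get((h, r), start) + 1
            count[(t, -r - 1)] = count.get((t, -r - 1), start) + 1  # assumed key scheme
        return count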
- class neuralkg_ind.data.DataPreprocess.GRData(args, db_name_pos, db_name_neg)[source]¶
Bases: Dataset
Data preprocessing of subgraphs (DGL only).
- args¶
Some pre-set parameters, such as dataset path, etc.
- db_name_pos¶
Database name of positive sample, type: str.
- db_name_neg¶
Database name of negative sample, type: str.
- m_h2r¶
The matrix of head to rels, type: NDArray[signedinteger].
- m_t2r¶
The matrix of tail to rels, type: NDArray[signedinteger].
- ssp_graph¶
The collection of head-to-tail csc_matrix adjacency matrices, type: list.
- graph¶
DGL graph of train or test, type: DGLHeteroGraph.
- id2entity¶
Record the id to entity mapping, type: dict.
- id2relation¶
Record the id to relation mapping, type: dict.
- load_data_grail()[source]¶
Load train dataset, adj_list, ent2idx, etc.
- Returns:
The collection of head-to-tail csc_matrix adjacency matrices, type: list. triplets: Triples of train-train and train-validation. train_ent2idx: Entity to idx of train graph. train_rel2idx: Relation to idx of train graph. train_idx2ent: idx to entity of train graph. train_idx2rel: idx to relation of train graph. h2r: Head to relations of train-train triples. m_h2r: The matrix of head to rels. t2r: Tail to relations of train-train triples. m_t2r: The matrix of tail to rels.
- Return type:
adj_list
- load_ind_data_grail()[source]¶
Load test dataset, adj_list, ent2idx, etc.
- Returns:
The collection of head-to-tail csc_matrix adjacency matrices, type: list. triplets: Triples of test-train and test-test. train_ent2idx: Entity to idx of test graph. train_rel2idx: Relation to idx of test graph. train_idx2ent: idx to entity of test graph. train_idx2rel: idx to relation of test graph. h2r: Head to relations of test-train triples. m_h2r: The matrix of head to rels. t2r: Tail to relations of test-train triples. m_t2r: The matrix of tail to rels.
- Return type:
adj_list
- prepare_subgraphs(nodes, r_label, n_labels)[source]¶
Initialize subgraph nodes and relation characteristics.
- Parameters:
nodes – The nodes of subgraph.
r_label – The label of relation in subgraph corresponding triple.
n_labels – The label of node in subgraph.
- Returns:
Subgraph after processing.
- Return type:
subgraph
- prepare_features_new(subgraph, n_labels, r_label=None)[source]¶
Prepare subgraph node features.
- Parameters:
subgraph – The extracted subgraph.
r_label – The label of relation in subgraph corresponding triple.
n_labels – The label of node in subgraph.
- Returns:
Subgraph after node label initialization.
- Return type:
subgraph
- class neuralkg_ind.data.DataPreprocess.MetaTrainGRData(args)[source]¶
Bases: Dataset
Data preprocessing of the meta-train task.
- subgraphs_db¶
Database of train subgraphs.
- class neuralkg_ind.data.DataPreprocess.MetaValidGRData(args)[source]¶
Bases: Dataset
Data preprocessing of the meta-valid task.
- subgraphs_db¶
Database of valid subgraphs.
- class neuralkg_ind.data.DataPreprocess.KGEEvalData(args, eval_triples, num_ent, hr2t, rt2h)[source]¶
Bases: Dataset
Data processing for KGE evaluation.
- triples¶
Triples to evaluate, type: list.
- num_ent¶
The number of entities, type: int.
- hr2t¶
Head and relation to tails, type: dict.
- rt2h¶
Relation and tail to heads, type: dict.
- num_cand¶
The number of candidate entities, type: str or int.
- class neuralkg_ind.data.DataPreprocess.BaseSampler(args)[source]¶
Bases: KGData
Traditional random sampling mode.
- corrupt_head(t, r, num_max=1)[source]¶
Negative sampling of head entities.
- Parameters:
t – Tail entity in triple.
r – Relation in triple.
num_max – The maximum number of negative samples generated.
- Returns:
Negative head-entity samples, with the positive head entities filtered out.
- Return type:
neg
- corrupt_tail(h, r, num_max=1)[source]¶
Negative sampling of tail entities.
- Parameters:
h – Head entity in triple.
r – Relation in triple.
num_max – The maximum number of negative samples generated.
- Returns:
Negative tail-entity samples, with the positive tail entities filtered out.
- Return type:
neg
- head_batch(h, r, t, neg_size=None)[source]¶
Negative sampling of head entities.
- Parameters:
h – Head entity in triple.
t – Tail entity in triple.
r – Relation in triple.
neg_size – The size of negative samples.
- Returns:
Negative head-entity samples, shape [neg_size].
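A standalone sketch of filtered negative sampling along these lines; the free-function signatures are hypothetical, since the real methods live on the sampler and read rt2h_train and args.num_ent from self:

    import numpy as np

    def corrupt_head(t, r, rt2h_train, num_ent, num_max=1):
        # Draw candidate heads, then drop any entity known to be a true head.
        cand = np.random.randint(num_ent, size=num_max)
        return cand[np.isin(cand, rt2h_train[(r, t)], invert=True)]

    def head_batch(t, r, rt2h_train, num_ent, neg_size):
        # Re-sample until enough filtered negatives are collected.
        neg = np.empty(0, dtype=np.int64)
        while neg.size < neg_size:
            more = corrupt_head(t, r, rt2h_train, num_ent, num_max=2 * neg_size)
            neg = np.concatenate([neg, more])
        return neg[:neg_size]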
- class neuralkg_ind.data.DataPreprocess.RevSampler(args)[source]¶
Bases: KGData
Adding reverse triples to the traditional random sampling mode.
For each triple (h, r, t), generate the reverse triple (t, r', h), where r' = r + num_rel.
- hr2t_train¶
Record the tails corresponding to the same head and relation, type: defaultdict(set).
- rt2h_train¶
Record the heads corresponding to the same tail and relation, type: defaultdict(set).
- add_reverse_relation()[source]¶
Getting entity/relation/reverse relation id, and entity/relation number.
- Update:
self.ent2id: Entity id. self.rel2id: Relation id. self.args.num_ent: Entity number. self.args.num_rel: Relation number.
- add_reverse_triples()[source]¶
Generate reverse triples (t, r', h).
- Update:
self.train_triples: Triples for training. self.valid_triples: Triples for validation. self.test_triples: Triples for testing. self.all_true_triples: All triples including train, valid and test.
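A minimal sketch of the reverse-triple construction described above:

    def add_reverse(triples, num_rel):
        # For each (h, r, t), append the reverse triple (t, r + num_rel, h).
        return triples + [(t, r + num_rel, h) for (h, r, t) in triples]

    print(add_reverse([(0, 1, 2)], num_rel=3))  # [(0, 1, 2), (2, 4, 0)]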
- corrupt_head(t, r, num_max=1)[source]¶
Negative sampling of head entities.
- Parameters:
t – Tail entity in triple.
r – Relation in triple.
num_max – The maximum number of negative samples generated.
- Returns:
Negative head-entity samples, with the positive head entities filtered out.
- Return type:
neg
- corrupt_tail(h, r, num_max=1)[source]¶
Negative sampling of tail entities.
- Parameters:
h – Head entity in triple.
r – Relation in triple.
num_max – The maximum number of negative samples generated.
- Returns:
Negative tail-entity samples, with the positive tail entities filtered out.
- Return type:
neg
- class neuralkg_ind.data.DataPreprocess.BaseGraph(args)[source]¶
Bases: object
Base subgraph class.
Collect train, valid and test datasets for the inductive setting.
- generate_ind_test()[source]¶
Generate inductive test triples.
- Returns:
Negative triplets.
- Return type:
neg_triplets
- load_data_grail_ind()[source]¶
Load train dataset, adj_list, ent2idx, etc.
- Returns:
The collection of head-to-tail csc_matrix adjacency matrices. dgl_adj_list: The collection of undirected head-to-tail csc_matrix adjacency matrices. triplets: Triples of test-train and test-test. m_h2r: The matrix of head to rels. m_t2r: The matrix of tail to rels.
- Return type:
adj_list
- get_neg_samples_replacing_head_tail(test_links, adj_list, num_samples=50)[source]¶
Sample negative triplets by replacing the head or tail.
- Parameters:
test_links – test-test triplets.
adj_list – The collection of head-to-tail csc_matrix adjacency matrices.
num_samples – The number of candidates.
- Returns:
Sampled negative triplets.
- Return type:
neg_triplets
Grounding¶
KGDataModule¶
Base DataModule class.
- class neuralkg_ind.data.KGDataModule.KGDataModule(*args: Any, **kwargs: Any)[source]¶
Bases: BaseDataModule
Base DataModule. Learn more at https://pytorch-lightning.readthedocs.io/en/stable/datamodules.html
- get_data_config()[source]¶
Return important settings of the dataset, which will be passed to instantiate models.
- prepare_data()[source]¶
Use this method to do things that might write to disk or that need to be done only from a single GPU in distributed settings (so don’t set state self.x = y).
- setup(stage=None)[source]¶
Split into train, val, test, and set dims. Should assign torch Dataset objects to self.data_train, self.data_val, and optionally self.data_test.
- get_train_bs()[source]¶
Get batch size for training.
If num_batches isn't zero, the batch size is obtained by dividing the size of data_train by num_batches. If the user doesn't give a batch size and num_batches is 0, a ValueError is raised.
- Returns:
The batch size for training.
- Return type:
self.args.train_bs
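A standalone sketch of that rule; the argument names are illustrative, since the real method reads these values from self.args and self.data_train:

    def get_train_bs(num_train, train_bs=None, num_batches=0):
        if num_batches > 0:
            return max(1, num_train // num_batches)  # derive batch size from batch count
        if train_bs is None:
            raise ValueError("either train_bs or a positive num_batches is required")
        return train_bs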
- train_dataloader()[source]¶
Implement one or more PyTorch DataLoaders for training.
- Returns:
A collection of torch.utils.data.DataLoader specifying training samples. In the case of multiple dataloaders, please see this page.
The dataloader you return will not be reloaded unless you set :paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to a positive integer.
For data processing use the following pattern:
download in prepare_data()
process and split in setup()
However, the above are only necessary for distributed processing.
Warning
do not assign state in prepare_data
fit()…
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Example:
    # single dataloader
    def train_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (1.0,))])
        dataset = MNIST(root='/path/to/mnist/', train=True,
                        transform=transform, download=True)
        loader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=self.batch_size,
            shuffle=True
        )
        return loader

    # multiple dataloaders, return as list
    def train_dataloader(self):
        mnist = MNIST(...)
        cifar = CIFAR(...)
        mnist_loader = torch.utils.data.DataLoader(
            dataset=mnist,
            batch_size=self.batch_size,
            shuffle=True
        )
        cifar_loader = torch.utils.data.DataLoader(
            dataset=cifar,
            batch_size=self.batch_size,
            shuffle=True
        )
        # each batch will be a list of tensors: [batch_mnist, batch_cifar]
        return [mnist_loader, cifar_loader]

    # multiple dataloaders, return as dict
    def train_dataloader(self):
        mnist = MNIST(...)
        cifar = CIFAR(...)
        mnist_loader = torch.utils.data.DataLoader(
            dataset=mnist,
            batch_size=self.batch_size,
            shuffle=True
        )
        cifar_loader = torch.utils.data.DataLoader(
            dataset=cifar,
            batch_size=self.batch_size,
            shuffle=True
        )
        # each batch will be a dict of tensors: {'mnist': batch_mnist, 'cifar': batch_cifar}
        return {'mnist': mnist_loader, 'cifar': cifar_loader}
- val_dataloader()[source]¶
Implement one or multiple PyTorch DataLoaders for validation.
The dataloader you return will not be reloaded unless you set :paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to a positive integer.
It’s recommended that all data downloads and preparation happen in prepare_data().
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Returns:
A torch.utils.data.DataLoader or a sequence of them specifying validation samples.
Examples:
    def val_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (1.0,))])
        dataset = MNIST(root='/path/to/mnist/', train=False,
                        transform=transform, download=True)
        loader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=self.batch_size,
            shuffle=False
        )
        return loader

    # can also return multiple dataloaders
    def val_dataloader(self):
        return [loader_a, loader_b, ..., loader_n]
Note
If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.
Note
In the case where you return multiple validation dataloaders, the validation_step() will have an argument dataloader_idx which matches the order here.
- test_dataloader()[source]¶
Implement one or multiple PyTorch DataLoaders for testing.
The dataloader you return will not be reloaded unless you set :paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to a positive integer.
For data processing use the following pattern:
download in prepare_data()
process and split in setup()
However, the above are only necessary for distributed processing.
Warning
do not assign state in prepare_data
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Returns:
A torch.utils.data.DataLoader or a sequence of them specifying testing samples.
Example:
    def test_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (1.0,))])
        dataset = MNIST(root='/path/to/mnist/', train=False,
                        transform=transform, download=True)
        loader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=self.batch_size,
            shuffle=False
        )
        return loader

    # can also return multiple dataloaders
    def test_dataloader(self):
        return [loader_a, loader_b, ..., loader_n]
Note
If you don’t need a test dataset and a test_step(), you don’t need to implement this method.
Note
In the case where you return multiple test dataloaders, the test_step() will have an argument dataloader_idx which matches the order here.
RuleDataLoader¶
Sampler¶
- class neuralkg_ind.data.Sampler.SubSampler(args)[source]¶
Bases: BaseGraph
Sampling subgraphs.
Prepare subgraphs and collect batches of subgraphs.
- class neuralkg_ind.data.Sampler.RMPISampler(args)[source]¶
Bases: BaseGraph
Sampling subgraphs for RMPI training, which adds disclosing subgraphs.
- class neuralkg_ind.data.Sampler.UniSampler(args)[source]¶
Bases: BaseSampler
Random negative sampling: filtering out positive samples and randomly selecting some samples as negatives.
- cross_sampling_flag¶
The flag of cross sampling head and tail negative samples.
- class neuralkg_ind.data.Sampler.BernSampler(args)[source]¶
Bases: BaseSampler
Using a Bernoulli distribution to select whether to replace the head entity or the tail entity.
- lef_mean¶
Record the mean number of head entities per relation.
- rig_mean¶
Record the mean number of tail entities per relation.
- sampling(data)[source]¶
Using a Bernoulli distribution to select whether to replace the head entity or the tail entity.
- Parameters:
data – The triples to be sampled.
- Returns:
The training data.
- Return type:
batch_data
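The Bernoulli trick (from TransH) replaces the head with probability tph / (tph + hpt) per relation, where tph and hpt are the average number of tails per head and heads per tail; mapping these onto lef_mean/rig_mean above is an assumption. A self-contained sketch:

    import random
    from collections import defaultdict

    def bern_prob(train_triples):
        # Per-relation probability of corrupting the head (TransH's Bernoulli trick).
        hr2t, rt2h = defaultdict(set), defaultdict(set)
        for h, r, t in train_triples:
            hr2t[(h, r)].add(t)
            rt2h[(r, t)].add(h)
        tph, hpt = defaultdict(list), defaultdict(list)
        for (h, r), ts in hr2t.items():
            tph[r].append(len(ts))
        for (r, t), hs in rt2h.items():
            hpt[r].append(len(hs))
        return {r: (sum(tph[r]) / len(tph[r]))
                   / (sum(tph[r]) / len(tph[r]) + sum(hpt[r]) / len(hpt[r]))
                for r in tph}

    def corrupt(h, r, t, prob, num_ent):
        # Replace the head with probability prob[r], otherwise replace the tail.
        if random.random() < prob[r]:
            return random.randrange(num_ent), r, t
        return h, r, random.randrange(num_ent)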
- class neuralkg_ind.data.Sampler.AdvSampler(args)[source]¶
Bases: BaseSampler
Self-adversarial negative sampling, in math:
$$p\left(h_{j}^{\prime}, r, t_{j}^{\prime} \mid \left\{\left(h_{i}, r_{i}, t_{i}\right)\right\}\right) = \frac{\exp \alpha f_{r}\left(\mathbf{h}_{j}^{\prime}, \mathbf{t}_{j}^{\prime}\right)}{\sum_{i} \exp \alpha f_{r}\left(\mathbf{h}_{i}^{\prime}, \mathbf{t}_{i}^{\prime}\right)}$$
- Attributes:
freq_hr: The count of (h, r) pairs. freq_tr: The count of (t, r) pairs.
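A sketch of how the weights defined by the formula above are typically used in the RotatE-style objective (the tensor shapes are assumptions):

    import torch
    import torch.nn.functional as F

    neg_scores = torch.randn(2, 5)  # [batch, num_neg] scores f_r(h', t')
    alpha = 1.0                     # adversarial temperature

    # Softmax over the negatives gives the p(...) weights above; detach so they
    # are treated as constants rather than backpropagated through.
    w = F.softmax(alpha * neg_scores, dim=-1).detach()
    neg_loss = -(w * F.logsigmoid(-neg_scores)).sum(dim=-1).mean()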
- class neuralkg_ind.data.Sampler.AllSampler(args)[source]¶
Bases: RevSampler
Merging triples which have the same head and relation; all false tail entities are taken as negative samples.
- class neuralkg_ind.data.Sampler.CrossESampler(args)[source]¶
Bases: BaseSampler
- class neuralkg_ind.data.Sampler.ConvSampler(args)[source]¶
Bases: RevSampler
Merging triples which have the same head and relation; all false tail entities are taken as negative samples.
The triples which have the same head and relation are treated as one triple.
- label¶
Mask marking the false tails as negative samples.
- triples¶
The triples to be sampled.
- class neuralkg_ind.data.Sampler.XTransESampler(args)[source]¶
Bases:
RevSamplerRandom negative sampling and recording neighbor entities.
- triples¶
The triples to be sampled.
- neg_sample¶
The negative samples.
- h_neighbor¶
The neighbors of the sampled entities.
- h_mask¶
The tag of effective neighbors.
- max_neighbor¶
The maximum number of neighbor entities.
- class neuralkg_ind.data.Sampler.GraphSampler(args)[source]¶
Bases: RevSampler
Graph-based sampling in neural network.
- entity¶
The entities of sampled triples.
- relation¶
The relation of sampled triples.
- triples¶
The sampled triples.
- graph¶
The graph built from the sampled triples by dgl.graph in DGL.
- norm¶
The edge norm in the graph.
- label¶
Mask marking the false tails as negative samples.
- sampling(pos_triples)[source]¶
Graph-based sampling in neural network.
- Parameters:
pos_triples – The triples to be sampled.
- Returns:
The training data.
- Return type:
batch_data
- sampling_negative(mode, pos_triples, num_neg)[source]¶
Random negative sampling without filtering.
- Parameters:
mode – The mode of negative sampling.
pos_triples – The positive triples.
num_neg – The number of negative samples corresponding to each triple.
- Returns:
neg_samples: The negative triples.
- build_graph(num_ent, triples, power)[source]¶
Using sampled triples to build a graph by dgl.graph in DGL.
- Parameters:
num_ent – The number of entities.
triples – The positive sampled triples.
power – The power index for normalization.
- Returns:
The relations of the sampled triples. graph: The graph built from the sampled triples by dgl.graph in DGL. edge_norm: The edge norm in the graph.
- Return type:
rela
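A minimal DGL sketch covering build_graph and the comp_deg_norm helper documented next; taking each edge's norm from its destination node is a common RGCN-style choice and an assumption here:

    import dgl
    import torch

    def comp_deg_norm(graph, power=-1):
        # Node weight = in_degree ** power, clamping zero degrees to 1.
        deg = torch.clamp(graph.in_degrees().float(), min=1)
        return deg.pow(power)

    def build_graph(num_ent, triples, power=-1):
        src, rela, dst = torch.as_tensor(triples, dtype=torch.long).t()
        graph = dgl.graph((src, dst), num_nodes=num_ent)
        edge_norm = comp_deg_norm(graph, power)[dst]
        return rela, graph, edge_norm

    rela, graph, edge_norm = build_graph(4, [(0, 0, 1), (1, 1, 2), (3, 0, 2)])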
- comp_deg_norm(graph, power=-1)[source]¶
Calculating the node normalization weights.
- Parameters:
graph – The graph built from the sampled triples by dgl.graph in DGL.
power – The power index for normalization.
- Returns:
The normalized node weights.
- Return type:
tensor
- class neuralkg_ind.data.Sampler.KBATSampler(args)[source]¶
Bases: BaseSampler
Graph-based n_hop neighbours in neural network.
- n_hop¶
The graph of n_hop neighbours.
- graph¶
The adjacency graph.
- neighbours¶
The neighbours of sampled triples.
- adj_matrix¶
The adjacency matrix of the sampled triples.
- triples¶
The sampled triples.
- triples_GAT_pos¶
Positive triples.
- triples_GAT_neg¶
Negative triples.
- triples_Con¶
All triples including positive triples and negative triples.
- label¶
Mask marking the false tails as negative samples.
- sampling(pos_triples)[source]¶
Graph-based n_hop neighbours in neural network.
- Parameters:
pos_triples – The triples to be sampled.
- Returns:
The training data.
- Return type:
batch_data
- bfs(graph, source, nbd_size=2)[source]¶
Using the breadth-first search algorithm to generate the n_hop neighbor graph.
- Parameters:
graph – The adjacency graph.
source – Head node.
nbd_size – The number of hops.
- Returns:
N_hop neighbor graph.
- Return type:
neighbors
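Despite the spelled-out name, bfs performs a breadth-first traversal. A standalone sketch, assuming graph is an adjacency dict {node: neighbors}:

    from collections import deque

    def bfs(graph, source, nbd_size=2):
        # Breadth-first expansion up to nbd_size hops from source.
        visited = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if visited[node] == nbd_size:
                continue
            for nb in graph.get(node, ()):
                if nb not in visited:
                    visited[nb] = visited[node] + 1
                    queue.append(nb)
        return visited

    print(bfs({0: [1], 1: [0, 2], 2: [3], 3: []}, source=0))  # {0: 0, 1: 1, 2: 2}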
- get_neighbors(nbd_size=2)[source]¶
Getting the relation and entity of the source in the n_hop neighborhood.
- Parameters:
nbd_size – The number of hops.
- Returns:
Record the relation and entity of the source in the n_hop neighborhood.
- Return type:
self.neighbours
- get_unique_entity(triples)[source]¶
Getting the set of entities.
- Parameters:
triples – The sampled triples.
- Returns:
The set of entities.
- Return type:
numpy.array
- get_batch_nhop_neighbors_all(nbd_size=2)[source]¶
Getting n_hop neighbors of all entities in batch.
- Parameters:
nbd_size – The number of hops.
- Returns:
The set of n_hop neighbors.
- class neuralkg_ind.data.Sampler.CompGCNSampler(args)[source]¶
Bases: GraphSampler
Graph-based sampling in neural network.
- relation¶
The relation of sampled triples.
- triples¶
The sampled triples.
- graph¶
The graph built from the sampled triples by dgl.graph in DGL.
- norm¶
The edge norm in the graph.
- label¶
Mask marking the false tails as negative samples.
- class neuralkg_ind.data.Sampler.TestSampler(sampler)[source]¶
Bases: object
Sampling triples and recording positive triples for testing.
- sampler¶
The function of training sampler.
- hr2t_all¶
Record the tails corresponding to the same head and relation.
- rt2h_all¶
Record the heads corresponding to the same tail and relation.
- num_ent¶
The count of entities.
- get_hr2t_rt2h_from_all()[source]¶
Get the sets hr2t and rt2h from all datasets (train, valid, and test); the data type is tensor.
- Update:
self.hr2t_all: The set of hr2t. self.rt2h_all: The set of rt2h.
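These filtered sets drive filtered ranking: every known true tail except the one under evaluation is masked out before computing the rank. A sketch with illustrative names:

    import torch

    def filtered_rank(scores, hr2t_all, h, r, true_t):
        target = scores[true_t].clone()
        masked = scores.clone()
        masked[hr2t_all[(h, r)]] = -float("inf")  # drop all known positives
        masked[true_t] = target                   # keep the triple under evaluation
        # rank = 1 + number of entities scoring strictly higher than the target
        return int((masked > target).sum().item()) + 1

    scores = torch.tensor([0.1, 0.9, 0.5, 0.7])
    hr2t_all = {(0, 0): torch.tensor([1, 2])}
    print(filtered_rank(scores, hr2t_all, h=0, r=0, true_t=2))  # 2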
- class neuralkg_ind.data.Sampler.ValidSampler(sampler)[source]¶
Bases: object
Sampling subgraphs for validation.
- sampler¶
The function of training sampler.
- args¶
Model configuration parameters.
- class neuralkg_ind.data.Sampler.ValidRMPISampler(sampler)[source]¶
Bases: object
Sampling subgraphs for RMPI validation.
- sampler¶
The function of training sampler.
- args¶
Model configuration parameters.
- class neuralkg_ind.data.Sampler.TestSampler_hit(sampler)[source]¶
Bases: object
Sampling subgraphs for testing link prediction.
- sampler¶
The function of training sampler.
- args¶
Model configuration parameters.
- m_h2r¶
The matrix of head to rels.
- m_t2r¶
The matrix of tail to rels.
- sampling(data)[source]¶
Sampling function to collect a batch of subgraphs for testing MRR and Hit@1,5,10.
- Parameters:
data – List of subgraph data for testing.
- Returns:
The batch of testing data.
- Return type:
batch_data
- get_subgraphs(all_links, adj_list, dgl_adj_list, max_node_label_value, m_h2r, m_t2r)[source]¶
Extracting and labeling subgraphs.
- Parameters:
all_links – All head or tail entities linked to the corresponding triples.
adj_list – List of adjacency matrices.
dgl_adj_list – List of undirected head-to-tail matrices.
max_node_label_value – Max value of node label.
m_h2r – The matrix of head to rels.
m_t2r – The matrix of tail to rels.
- Returns:
Subgraphs for testing. r_labels: Labels of relation.
- Return type:
subgraphs
- prepare_features(subgraph, n_labels, max_n_label, n_feats=None)[source]¶
One-hot encode the node label feature and concatenate it to n_feats.
- Parameters:
subgraph – Subgraph for processing.
n_labels – Node labels.
max_n_label – Max value of node label.
n_feats – Node features.
- Returns:
Subgraph after processing.
- Return type:
subgraph
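A sketch of GraIL-style double-radius one-hot labeling, which is what this step typically amounts to; the exact array layout is an assumption:

    import numpy as np

    def one_hot_node_labels(n_labels, max_n_label):
        # n_labels: [num_nodes, 2] distances to the head and tail of the target link.
        n = n_labels.shape[0]
        feats = np.zeros((n, max_n_label[0] + 1 + max_n_label[1] + 1))
        feats[np.arange(n), n_labels[:, 0]] = 1                       # head-distance slot
        feats[np.arange(n), max_n_label[0] + 1 + n_labels[:, 1]] = 1  # tail-distance slot
        return feats

    print(one_hot_node_labels(np.array([[0, 1], [1, 0]]), max_n_label=(1, 1)))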
- class neuralkg_ind.data.Sampler.TestRMPISampler_hit(sampler)[source]¶
Bases: object
Sampling subgraphs for RMPI testing link prediction.
- sampler¶
The function of training sampler.
- args¶
Model configuration parameters.
- sampling(data)[source]¶
Sampling function to collect a batch of subgraphs for RMPI testing.
- Parameters:
data – List of subgraph data for RMPI testing.
- Returns:
The batch of RMPI testing data.
- Return type:
batch_data
- prepare_subgraph(dgl_adj_list, nodes, rel, node_labels, max_node_label_value)[source]¶
Prepare enclosing or disclosing subgraph.
- Parameters:
dgl_adj_list – List of undirected head-to-tail matrices.
nodes – Nodes of subgraph.
rel – Relation idx.
node_labels – Node labels.
max_node_label_value – Max value of node label.
- Returns:
Subgraph for testing.
- Return type:
subgraph
- get_subgraphs(all_links, adj_list, dgl_adj_list, max_node_label_value)[source]¶
Extracting and labeling subgraphs.
- Parameters:
all_links – All head or tail entities linked to the corresponding triples.
adj_list – List of adjacency matrices.
dgl_adj_list – List of undirected head-to-tail matrices.
max_node_label_value – Max value of node label.
- Returns:
Subgraphs for testing. r_labels: Labels of relation.
- Return type:
subgraphs
- prepare_features(subgraph, n_labels, max_n_label, n_feats=None)[source]¶
One-hot encode the node label feature and concatenate it to n_feats, for RMPI.
- Parameters:
subgraph – Subgraph for processing.
n_labels – Node labels.
max_n_label – Max value of node label.
n_feats – Node features.
- Returns:
Subgraph after processing.
- Return type:
subgraph
- class neuralkg_ind.data.Sampler.TestSampler_auc(sampler)[source]¶
Bases: object
Sampling subgraphs for testing triple classification.
- sampler¶
The function of training sampler.
- args¶
Model configuration parameters.
- class neuralkg_ind.data.Sampler.TestRMPISampler_auc(sampler)[source]¶
Bases: object
Sampling subgraphs for testing RMPI triple classification.
- sampler¶
The function of training sampler.
- args¶
Model configuration parameters.
- class neuralkg_ind.data.Sampler.MetaSampler(args)[source]¶
Bases: BaseMeta
Sampling meta tasks and collecting train data for training.
- class neuralkg_ind.data.Sampler.ValidMetaSampler(sampler)[source]¶
Bases: object
Collecting tasks for validation.
- class neuralkg_ind.data.Sampler.TestMetaSampler_hit(sampler)[source]¶
Bases: object
Collecting tasks for testing.
- class neuralkg_ind.data.Sampler.TestMetaSampler_auc(sampler)[source]¶
Bases: object
Collecting tasks for testing.
- class neuralkg_ind.data.Sampler.GraphTestSampler(sampler)[source]¶
Bases: object
Sampling graphs for testing.
- sampler¶
The function of training sampler.
- hr2t_all¶
Record the tails corresponding to the same head and relation.
- rt2h_all¶
Record the heads corresponding to the same tail and relation.
- num_ent¶
The count of entities.
- triples¶
The training triples.
- get_hr2t_rt2h_from_all()[source]¶
Get the sets hr2t and rt2h from all datasets (train, valid, and test); the data type is tensor.
- Update:
self.hr2t_all: The set of hr2t. self.rt2h_all: The set of rt2h.
- class neuralkg_ind.data.Sampler.CompGCNTestSampler(sampler)[source]¶
Bases: object
Sampling graphs for testing.
- sampler¶
The function of training sampler.
- hr2t_all¶
Record the tails corresponding to the same head and relation.
- rt2h_all¶
Record the heads corresponding to the same tail and relation.
- num_ent¶
The count of entities.
- triples¶
The training triples.
- get_hr2t_rt2h_from_all()[source]¶
Get the sets hr2t and rt2h from all datasets (train, valid, and test); the data type is tensor.
- Update:
self.hr2t_all: The set of hr2t. self.rt2h_all: The set of rt2h.
- class neuralkg_ind.data.Sampler.SEGNNTrainProcess(args)[source]¶
Bases: RevSampler
- class neuralkg_ind.data.Sampler.SEGNNTestSampler(sampler)[source]¶
Bases: Dataset
- get_hr2t_rt2h_from_all()[source]¶
Get the sets hr2t and rt2h from all datasets (train, valid, and test); the data type is tensor.
- Update:
self.hr2t_all: The set of hr2t. self.rt2h_all: The set of rt2h.
base_data_module¶
Base DataModule class.
- class neuralkg_ind.data.base_data_module.BaseDataModule(*args: Any, **kwargs: Any)[source]¶
Bases: LightningDataModule
Base DataModule. Learn more at https://pytorch-lightning.readthedocs.io/en/stable/datamodules.html
- prepare_data()[source]¶
Use this method to do things that might write to disk or that need to be done only from a single GPU in distributed settings (so don’t set state self.x = y).
- setup(stage=None)[source]¶
Split into train, val, test, and set dims. Should assign torch Dataset objects to self.data_train, self.data_val, and optionally self.data_test.
- train_dataloader()[source]¶
Implement one or more PyTorch DataLoaders for training.
- Returns:
A collection of torch.utils.data.DataLoader specifying training samples. In the case of multiple dataloaders, please see this page.
The dataloader you return will not be reloaded unless you set :paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to a positive integer.
For data processing use the following pattern:
download in prepare_data()
process and split in setup()
However, the above are only necessary for distributed processing.
Warning
do not assign state in prepare_data
fit()…
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
Example:
    # single dataloader
    def train_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (1.0,))])
        dataset = MNIST(root='/path/to/mnist/', train=True,
                        transform=transform, download=True)
        loader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=self.batch_size,
            shuffle=True
        )
        return loader

    # multiple dataloaders, return as list
    def train_dataloader(self):
        mnist = MNIST(...)
        cifar = CIFAR(...)
        mnist_loader = torch.utils.data.DataLoader(
            dataset=mnist,
            batch_size=self.batch_size,
            shuffle=True
        )
        cifar_loader = torch.utils.data.DataLoader(
            dataset=cifar,
            batch_size=self.batch_size,
            shuffle=True
        )
        # each batch will be a list of tensors: [batch_mnist, batch_cifar]
        return [mnist_loader, cifar_loader]

    # multiple dataloaders, return as dict
    def train_dataloader(self):
        mnist = MNIST(...)
        cifar = CIFAR(...)
        mnist_loader = torch.utils.data.DataLoader(
            dataset=mnist,
            batch_size=self.batch_size,
            shuffle=True
        )
        cifar_loader = torch.utils.data.DataLoader(
            dataset=cifar,
            batch_size=self.batch_size,
            shuffle=True
        )
        # each batch will be a dict of tensors: {'mnist': batch_mnist, 'cifar': batch_cifar}
        return {'mnist': mnist_loader, 'cifar': cifar_loader}
- val_dataloader()[source]¶
Implement one or multiple PyTorch DataLoaders for validation.
The dataloader you return will not be reloaded unless you set :paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to a positive integer.
It’s recommended that all data downloads and preparation happen in prepare_data().
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Returns:
A torch.utils.data.DataLoader or a sequence of them specifying validation samples.
Examples:
    def val_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (1.0,))])
        dataset = MNIST(root='/path/to/mnist/', train=False,
                        transform=transform, download=True)
        loader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=self.batch_size,
            shuffle=False
        )
        return loader

    # can also return multiple dataloaders
    def val_dataloader(self):
        return [loader_a, loader_b, ..., loader_n]
Note
If you don’t need a validation dataset and a validation_step(), you don’t need to implement this method.
Note
In the case where you return multiple validation dataloaders, the validation_step() will have an argument dataloader_idx which matches the order here.
- test_dataloader()[source]¶
Implement one or multiple PyTorch DataLoaders for testing.
The dataloader you return will not be reloaded unless you set :paramref:`~pytorch_lightning.trainer.Trainer.reload_dataloaders_every_n_epochs` to a positive integer.
For data processing use the following pattern:
download in prepare_data()
process and split in setup()
However, the above are only necessary for distributed processing.
Warning
do not assign state in prepare_data
Note
Lightning adds the correct sampler for distributed and arbitrary hardware. There is no need to set it yourself.
- Returns:
A torch.utils.data.DataLoader or a sequence of them specifying testing samples.
Example:
    def test_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (1.0,))])
        dataset = MNIST(root='/path/to/mnist/', train=False,
                        transform=transform, download=True)
        loader = torch.utils.data.DataLoader(
            dataset=dataset,
            batch_size=self.batch_size,
            shuffle=False
        )
        return loader

    # can also return multiple dataloaders
    def test_dataloader(self):
        return [loader_a, loader_b, ..., loader_n]
Note
If you don’t need a test dataset and a test_step(), you don’t need to implement this method.
Note
In the case where you return multiple test dataloaders, the test_step() will have an argument dataloader_idx which matches the order here.