Node Classification using Graph Convolutional Neural Network
Node Classification on Cora Dataset in PyTorch using GCN
Dataset: This article uses the Cora dataset, which consists of 2708 scientific publications, each classified into one of seven classes. The citation network contains 5429 links.
Objective: Use a GCN, implemented with PyTorch Geometric, to predict the subject of a paper from its words and its position in the citation network.
A Graph Convolutional Neural Network (GCN) is a framework built on spectral graph convolutions, which generalize the convolution operation to non-Euclidean data.
GCNs are analogous to the convolutions applied to images: they generalize the convolution operation to graph data, and the filter parameters are shared across all locations in the graph. A GCN is built by stacking multiple graph convolutional layers, each followed by a point-wise non-linearity.
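To make the layer stacking concrete, a single graph convolutional layer is often written with the renormalized propagation rule below. This formulation is an assumption here: the article does not state which variant it uses. In this sketch, $A$ is the adjacency matrix, $\tilde{A} = A + I$ adds self-loops, $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$, $H^{(l)}$ holds the node representations at layer $l$ (with $H^{(0)} = X$, the input features), $W^{(l)}$ is the weight matrix shared by all nodes, and $\sigma$ is the point-wise non-linearity.

```latex
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2} \, \tilde{A} \, \tilde{D}^{-1/2} \, H^{(l)} \, W^{(l)} \right),
\qquad \tilde{A} = A + I, \quad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Because $W^{(l)}$ does not depend on the node index, the same filter parameters are applied at every location in the graph.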
In this example, you will classify the scientific papers in a citation graph where labels are available for only a small subset of nodes, and the GCN must predict the correct labels for the remaining nodes.
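When only some nodes are labeled, the training loss is typically computed over the labeled nodes alone. The following is a minimal, framework-free sketch of that idea. The function name `masked_nll`, the toy probabilities, and the boolean `train_mask` are hypothetical illustrations, not part of this article's code.

```python
import math

def masked_nll(probabilities, labels, train_mask):
    """Average negative log-likelihood over the nodes where train_mask is True.

    probabilities: one list of class probabilities per node (hypothetical model output)
    labels:        class index per node, or None where the label is unknown
    train_mask:    True for nodes whose label may be used during training
    """
    losses = [
        -math.log(probs[label])
        for probs, label, keep in zip(probabilities, labels, train_mask)
        if keep  # unlabeled nodes are skipped, so their None label is never used
    ]
    return sum(losses) / len(losses)

# Toy example: three nodes, two classes, only the first node is labeled.
probabilities = [[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]]
labels = [0, None, None]
train_mask = [True, False, False]
loss = masked_nll(probabilities, labels, train_mask)
```

The unlabeled nodes still take part in the graph, so they can influence the predictions of their neighbors. They just do not contribute to the loss.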
The key idea of a GCN is to generate node embeddings based on local network neighborhoods. Nodes aggregate information from their neighbors using neural networks. As a result, every node defines its own computation graph based on its neighborhood: it averages the messages from its neighbors and then applies a neural network to the result.
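The aggregation step described above can be sketched in plain Python. This is an illustrative toy, not the article's implementation: the helper names `mean_aggregate` and `dense_relu`, the three-node graph, and the weight values are all assumptions chosen for the example.

```python
def mean_aggregate(features, neighbors):
    """For each node, return the element-wise mean of its neighbors' feature vectors."""
    aggregated = {}
    for node, feat in features.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            # An isolated node has no messages to average, so it keeps its own features.
            aggregated[node] = list(feat)
            continue
        sums = [0.0] * len(feat)
        for nbr in nbrs:
            for i, value in enumerate(features[nbr]):
                sums[i] += value
        aggregated[node] = [s / len(nbrs) for s in sums]
    return aggregated

def dense_relu(vector, weights):
    """Apply a linear map (one list of weights per output unit) followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, vector))) for row in weights]

# Toy graph (hypothetical): node "a" is linked to "b" and "c".
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
weights = [[1.0, -1.0], [0.5, 0.5]]  # shared by every node

embeddings = mean_aggregate(features, neighbors)
updated = {node: dense_relu(vec, weights) for node, vec in embeddings.items()}
```

The same `weights` are applied to every node, which mirrors the parameter sharing of a graph convolution. Stacking this step several times lets each node draw on information from progressively larger neighborhoods.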
Exploring the Dataset
Download the .tgz archive and open the tar file inside it:
import tarfile
import urllib.request

# Download the Cora archive and read it as a gzip-compressed tar stream
coraTarFile = 'https://linqs-data.soe.ucsc.edu/public/lbc/cora.tgz'
tarfiles = urllib.request.urlopen(coraTarFile)
zip_file = tarfile.open(fileobj=tarfiles, mode="r|gz")
for tarinfo in zip_file…