Multilabel User Classification Using the Community Structure of Online Networks



We study the problem of semi-supervised, multi-label user classification in online social networks. We propose a framework that combines unsupervised community extraction with supervised, community-based feature weighting before training a classifier. We introduce Approximate Regularized Commute-Time Embedding (ARCTE), an algorithm that projects the users of a social graph onto a latent space. Instead of packing the global structure into a matrix of predefined rank, as many spectral and neural representation learning methods do, it extracts local communities for all users in the graph. To this end, we employ an improved personalized PageRank algorithm that searches locally in each user's graph structure. We then perform supervised community feature weighting to boost the importance of highly predictive communities. We assess our method's performance on user classification through an extensive comparative study against recent methods based on graph embeddings. The comparison shows that ARCTE significantly outperforms the competition in almost all cases, achieving up to a 35% relative improvement in F1-score over the second-best competing method.
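To make the idea of per-user local communities concrete, here is a minimal sketch in Python. It is not the ARCTE implementation: it uses plain power-iteration personalized PageRank rather than the improved approximate variant described above, and it omits the supervised weighting step. The function names, the restart probability `alpha`, the top-`size` cut on degree-normalized scores, and the toy graph are all assumptions made for illustration.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Personalized PageRank by power iteration (illustrative only).

    adj:   dict mapping each node to a list of its neighbours
           (undirected graph, every node has at least one neighbour).
    alpha: probability of restarting the walk at the seed (assumed value).
    """
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u, nbrs in adj.items():
            # Each node spreads its non-restarting mass evenly to neighbours.
            share = (1.0 - alpha) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        nxt[seed] += alpha  # restart mass always returns to the seed
        p = nxt
    return p


def local_community(adj, seed, size):
    """Keep the `size` nodes with the highest degree-normalized score."""
    scores = personalized_pagerank(adj, seed)
    ranked = sorted(adj, key=lambda v: scores[v] / len(adj[v]), reverse=True)
    return frozenset(ranked[:size])


def community_features(adj, size):
    """One binary feature per distinct community found around any user."""
    communities = sorted({local_community(adj, s, size) for s in adj}, key=sorted)
    features = {u: [1 if u in c else 0 for c in communities] for u in adj}
    return communities, features


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
communities, features = community_features(adj, size=3)
print(communities)
print(features)
```

On this toy graph each triangle is recovered as a community, so every user ends up with a two-dimensional membership vector. A weighting step in the spirit of the framework would then rescale these columns according to how well each community predicts the known labels.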



Results on ASU-Flickr

Results on ASU-YouTube

Results on IRMV-PoliticsUK

Results on SNOW2014G


The implementation of the ARCTE algorithm is available on GitHub.


This research has been supported by the EC-funded project REVEAL under contract number 610928.