Konstantin Avratchenkov

INRIA, France

Graph-based semi-supervised learning methods

Abstract. Semi-supervised learning methods constitute a category of machine learning methods that use labelled points together with a similarity graph to classify data points into predefined classes. For each class, a semi-supervised method provides a classification function. The main idea of semi-supervised methods rests on the assumption that the classification function should change smoothly over the similarity graph. This idea can be formulated as an optimization problem. Two particularly well-known semi-supervised learning methods are the Standard Laplacian (or transductive learning) method and the Normalized Laplacian (or diffusion kernel) method. Different semi-supervised learning methods have different kernels, which reflect how the underlying similarity graph influences the values of the classification functions. In the present work, we analyse a general family of semi-supervised methods, explain the differences between the methods, and provide recommendations for the choice of the kernel parameters and labelled points. In particular, it appears preferable to choose a method and a kernel based on the properties of the labelled points. Our general framework yields a particularly promising PageRank-based method. We illustrate our general theoretical conclusions with a typical benchmark example, a clustered preferential attachment model, and two applications. One application concerns the classification of Wikipedia pages; the other concerns the classification of content in P2P networks. (This talk is based on joint work with P. Goncalves, A. Mishenin and M. Sokol.)
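The abstract does not state the closed form of the classification functions, so the following is only a minimal sketch under stated assumptions. It assumes the family can be parameterized as F = (1 - alpha) (I - alpha D^(-sigma) W D^(sigma-1))^(-1) Y, where W is the symmetric similarity matrix, D the diagonal degree matrix, and Y the matrix of labelled points (one column per class). Under this assumed parameterization, sigma = 1 would correspond to the Standard Laplacian method, sigma = 1/2 to the Normalized Laplacian method, and sigma = 0 to the PageRank-based method. The names `classification_functions`, `classify`, `sigma` and `alpha`, and the toy graph, are illustrative and not taken from the talk.

```python
import numpy as np


def classification_functions(W, Y, sigma, alpha=0.9):
    """Assumed closed form: F = (1 - alpha) (I - alpha D^-sigma W D^(sigma-1))^-1 Y."""
    d = W.sum(axis=1)                      # node degrees (all positive here)
    left = np.diag(d ** (-sigma))          # D^(-sigma)
    right = np.diag(d ** (sigma - 1.0))    # D^(sigma - 1)
    n = W.shape[0]
    A = np.eye(n) - alpha * (left @ W @ right)
    # Solve the linear system instead of forming the inverse explicitly.
    return (1.0 - alpha) * np.linalg.solve(A, Y)


def classify(W, Y, sigma, alpha=0.9):
    """Assign each node to the class with the largest classification function."""
    F = classification_functions(W, Y, sigma, alpha)
    return F.argmax(axis=1)


# Toy similarity graph (hypothetical): two triangles {0,1,2} and {3,4,5}
# joined by the single edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

# One labelled point per class: node 0 for class 0, node 5 for class 1.
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[5, 1] = 1.0

for sigma in (1.0, 0.5, 0.0):
    print(sigma, classify(W, Y, sigma))
```

On this small symmetric example all three values of sigma recover the two triangles. Differences between the methods would be expected to show up mainly on graphs with heterogeneous degrees or with unevenly placed labelled points, which is the setting the abstract's recommendations address.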

Laboratory of Algorithms and Technologies for Networks Analysis (Nizhny Novgorod)
