# Publications

In this paper, a deep learning method study is conducted to solve a new multiclass text classification problem, identifying user interests by text messages. We used an original dataset of almost 90 thousand forum text messages, labeled for ten interests. We experimented with different modern neural network architectures: recurrent and convolutional, as well as simpler feedforward networks. Classification accuracy was evaluated for different architectures, text representations, and sets of miscellaneous parameters.

This book constitutes the proceedings of the 8th International Conference on Analysis of Images, Social Networks and Texts, AIST 2019, held in Kazan, Russia, in July 2019.

The 24 full papers and 10 short papers were carefully reviewed and selected from 134 submissions (of which 21 papers were rejected without being reviewed). The papers are organized in topical sections on general topics of data analysis; natural language processing; social network analysis; analysis of images and video; optimization problems on graphs and network structures; analysis of dynamic behaviour through event data.

This volume contains the refereed proceedings of the 8th International Conference on Analysis of Images, Social Networks, and Texts (AIST 2019). The previous conferences during 2012–2018 attracted a significant number of data scientists – students, researchers, academics, and engineers working on interdisciplinary data analysis of images, texts, and social networks.

For a graph *G* and a positive integer *k*, a subset *C* of vertices of *G* is called a *k*-path vertex cover if *C* intersects all paths of *k* vertices in *G*. The cardinality of a minimum *k*-path vertex cover is denoted by *β_{**P**_k*}(*G*). For a graph *G* and a positive integer *k*, a subset *M* of pairwise vertex-disjoint paths of *k* vertices in *G* is called a *k*-path packing. The cardinality of a maximum *k*-path packing is denoted by *μ**_{P**_k}*(*G*). In this paper, we describe some graphs, having equal values of *β_{**P**_k*} and *μ_{**P**_k*}, for *k*≥5, and present polynomial-time algorithms of finding a minimum *k*-path vertex cover and a maximum *k*-path packing in such graphs.

The goal of the study is to increase the computation efficiency of the face recognition that uses feature vectors to describe facial images on photos and videos. These high-dimensional feature vectors are nowadays produced by convolutional neural networks. The methods to aggregate the features generated for each video frame are used to process the video sequences. A novel hierarchical recognition algorithm is proposed. In contrast to known approaches our algorithm seeks the nearest neighbors only among reference images of most reliable classes selected at the preceding stage to carry out the sequential analysis of a more detailed description level (with a greater dimensionality of the feature vector). At each stage principal components are compared, the number of the components being chosen according to a given portion of explained variations. Datasets like Labeled Faces in the Wild, YouTubeFaces, IAPRA, Jenus Benchmark C and different neural-net face descriptors are used to compare the algorithm with other methods. In contrast with the conventional nearest-neighbor method, the proposed approach is shown to allow a 2- to 16-times reduction of the classifier running time.

It is proved that sign network with elliptical distribution with known shift parameter is equivalent to sign network with elliptical distribution with unknown shift parameter estimated as sample mean. This result is proved for the case of independent identically distributed observation and for the case of sample from matrix elliptically contoured distribution with any dependence between observations.

In this paper, we consider the problem of event recognition on single images. In contrast to conventional fine-tuning of convolutional neural networks (CNN), we proposed to use image captioning, i.e., a generative model that converts images to textual descriptions. The motivation here is the possibility to combine conventional CNNs with a completely different approach in an ensemble with high diversity. As event recognition task has nothing serial or temporal, obtained captions are one-hot encoded and summarized into a sparse feature vector suitable for the learning of an arbitrary classifier. We provide the experimental study of several feature extractors for Photo Event Collection, Web Image Dataset for Event Recognition and Multi-Label Curation of Flickr Events Dataset. It is shown that the image captions trained on the Conceptual Captions dataset can be classified more accurately than the features from an object detector, though they both are obviously not as rich as the CNN-based features. However, an ensemble of CNN and our approach provides state-of-the-art results for several event datasets.

Independent domination is one of the rare problems for which the complexity of weighted and unweighted versions is known to be different in some classes of graphs. Trying to better understand the gap between the two versions of the problem, in the present paper we prove two complexity results. First, we extend NP-hardness of the weighted version in a certain class to the unweighted case. Second, we strengthen polynomial-time solvability of the unweighted version in the class of P2+P3-free graphs to the weighted case. This result is tight in the sense that both versions are NP-hard in the class of P3+P3-free graphs, i.e. P3+P3 is a minimal graph forbidding of which produces an NP-hard case for both versions of the problem.

The problem of automated quality control of audio recordings containing voice samples of individuals is considered. It is shown that in solving this problem the most acute impediment is the problem of small samples of observations. To overcome the problem, a new, high-speed method of acoustic measurements is proposed, based on the principle of relative stability of the frequency of the fundamental tone of a speech signal within a voice sample of short duration. An example of practical implementation of the developed method according to a scheme with inter-period signal accumulation is considered. Using proprietary software, a full-scale experiment was conducted in which statistical estimates of the effectiveness of the method in noise conditions were obtained. It is shown that when using the proposed method, if the signal-to-noise ratio is lower than 15 dB, an audio recording is rejected with a probability of 0.95 or more as unsuitable for biometric identification of a person. The results obtained are intended for use in the development of new systems and modernization of existing systems and technologies for collection and automated quality control of biometric personal data. The article is intended for a wide circle of specialists in the field of acoustic measurements and digital processing of speech signals, as well as for practitioners who organize the work of authorized organizations in preparing samples of biometric personal data for registration with the unified biometric system (UBS).

Under consideration is the Successive Minima Problem for the 2-dimensional lattice with respect to the order given by some conic function f.We propose an algorithm with complexity of 3.32*log_2(R)+O(1) calls to the comparison oracle of f, where R is the radius of the circular searching area, while the best known lower oracle complexity bound is 3*log_2(R) + O(1).We give an efficient criterion for checking that given vectors of a 2-dimensional lattice are successive minima and form a basis for the lattice.Moreover, we show that the similar Successive Minima Problem for dimension n can be solved by an algorithm with at most (O(n))^{2n}*log_2(R) calls to the comparison oracle. The results of the article can be applied to searching successive minima with respect to arbitrary convex functions defined by the comparison oracle.

This proceedings presents the result of the 8th International Conference in Network Analysis, held at the Higher School of Economics, Moscow, in May 2018. The conference brought together scientists, engineers, and researchers from academia, industry, and government. Contributions in this book focus on the development of network algorithms for data mining and its applications. Researchers and students in mathematics, economics, statistics, computer science, and engineering find this collection a valuable resource filled with the latest research in network analysis. Computational aspects and applications of large-scale networks in market models, neural networks, social networks, power transmission grids, maximum clique problem, telecommunication networks, and complexity graphs are included with new tools for efficient network analysis of large-scale networks. Machine learning techniques in network settings including community detection, clustering, and biclustering algorithms are presented with applications to social network analysis.

In the current paper we provide a proof of NP-completeness for the Cell Formation Problem (CFP) with the fractional grouping efficacy objective function. First the CFP with a linear objective function is considered. Following the ideas of Pinheiro et al. (2016) we show that it is equivalent to the Bicluster Graph Editing Problem (BGEP), which is known to be NP-complete due to the reduction from the 3-Exact 3-Cover Problem – 3E3CP (Amit, 2004). Then we suggest a polynomial reduction of the CFP with the linear objective to the CFP with the grouping efficacy objective. It proves the NP-completeness of this fractional CFP formulation. Along with the NP-status our paper presents important connections of the CFP with the BGEP and 3E3CP. Such connections could be used for ”transferring” of known theoretical properties, efficient algorithms, polynomial cases, and other features of well-studied graph editing and exact covering problems to the CFP.

In the present paper we prove that frames of one-dimensional separatrices in basins of sources of structurally stable 3-diffeomorhisms with two-dimensional expanding attractor are trivially embedded. This result plays an important part in the classification of such systems. The classification was given by V. Grines and E. Zhuzhoma with assumption that all one-dimensional separatrices are trivially embedded into the ambient manifold but the proof of the assumption was never given. Thus, the present paper is a necessary and nontrivial element of the classification of structurally stable diffeomorphisms with codimension one expanding attractors.

We consider a class of sequential network interdiction problem settings where the interdictor has incomplete initial information about the network while the evader has complete knowledge of the network including its structure and arc costs. In each decision epoch, the interdictor can block (for the duration of the epoch) at most k arcs known to him. By observing the evader's actions, the interdictor learns about the network structure and costs and thus, can adjust his actions in subsequent decision epochs. It is known from the literature that if the evader is greedy (i.e., she uses the shortest available path in each decision epochs), then under some assumptions the greedy interdiction policies that block k-most vital arcs in each epoch are efficient and have a finite regret. In this paper, we consider the evader's perspective and explore strategic evasion policies under the assumption that the interdictor is greedy. We first study the theoretical computational complexity of the evader's problem. Then we derive basic constructive properties of optimal evasion policies for two decision epochs when the interdictor has no initial information about the network structure. These properties are then exploited for the design of a heuristic algorithm for a strategic evader in a general setting with an arbitrary time horizon and any initial information available to the interdictor. Our computational experiments demonstrate that the proposed heuristic consistently outperforms the greedy evasion policy on several classes of synthetic network instances under perfect and noisy information feedbacks. Finally, some interesting insights from our theoretical and computational results conclude the paper.

If the training data set in image recognition task is not very large, the feature extraction with a convolutional neural network is usually applied. Here, we focus on the nonparametric classification of extracted feature vectors using the probabilistic neural network (PNN). The latter is characterized by the high runtime and memory space complexity. We propose to overcome these drawbacks by replacing the exponential activation function in the Gaussian kernel to the complex exponential functions. Such complex nonlinearities make it possible to accurately approximate the unknown density function using the network with the number of neurons proportional to only cubic root of the database size. As a result, the proposed approach decreases the runtime and memory complexities of the PNN without losing its main advantages, namely, fast training and convergence to the Bayesian decision. In the experimental study, we describe a protocol for comparing recognition methods using the well-known visual object category data sets in the context of the small sample size problem. It has been experimentally shown that our approach rapidly obtains accurate decisions when compared to the known classifiers including the baseline PNN.

In this paper, we show that the weighted vertex coloring problem can be solved in polynomial on the sum of vertex weights time for {P_5, K_{2,3}, K_{2,3}^+}-free graphs. As a corollary, this fact implies polynomial-time solvability of the unweighted vertex coloring problem for {P_5, K_{2,3}, K_{2,3}^+}-free graphs. As usual, P_5 and K_{2,3} stands, respectively, for the simple path on 5 vertices and for the biclique with the parts of 2 and 3 vertices, K_{2,3} denotes the graph, obtained from a K_{2,3} by joining its degree 3 vertices with an edge.

For any *n*, in the set of *n*-vertex trees such that any two leaves have no common adjacent vertex, we describe the trees with the smallest number of maximal independent sets.