# School Program

**Tuesday, May 1st**

11:00 – 11:30: Meeting at the main HSE building – 25/12, B. Pecherskaya Str.

11:30 – 12:30: Departure for the school venue

13:00 – 14:00: Accommodation

14:00 – 15:00: Lunch

15:00 – 16:00: Registration

19:00 – 20:00: Dinner

**Wednesday, May 2nd**

8:30 – 9:30: Breakfast

9:30 – 10:50: Pando G. Georgiev Optimization methods for data analysis

10:50 – 11:10: Coffee Break

11:10 – 12:30: Pando G. Georgiev Optimization methods for data analysis

12:30 – 14:00: Lunch Break

14:00 – 15:20: Petar Momcilovic Queueing Models in Service Engineering

15:30 – 16:50: Pando G. Georgiev Optimization methods for data analysis

17:00 – 17:40: Seminars Themistocles Rassias, Pando G. Georgiev

19:00 – 20:00: Dinner

**Thursday, May 3rd**

8:30 – 9:30: Breakfast

9:30 – 10:50: Pando G. Georgiev Optimization methods for data analysis

10:50 – 11:10: Coffee Break

11:10 – 12:30: Pando G. Georgiev Optimization methods for data analysis

12:30 – 13:50: Lunch Break

13:50 – 15:10: Themistocles Rassias On some old and new results and problems in nonlinear analysis and major trends in mathematics

15:20 – 17:20: Themistocles Rassias On some old and new results and problems in nonlinear analysis and major trends in mathematics

17:30 – 18:50: Seminars Petar Momcilovic, Themistocles Rassias, Pando G. Georgiev

19:00 – 20:00: Dinner

**Friday, May 4th**

8:30 – 9:30: Breakfast

9:30 – 10:50: Themistocles Rassias On some old and new results and problems in nonlinear analysis and major trends in mathematics

10:50 – 11:10: Coffee Break

11:10 – 12:30: Petar Momcilovic Queueing Models in Service Engineering

12:30 – 13:50: Lunch Break

13:50 – 15:50: Themistocles Rassias On some old and new results and problems in nonlinear analysis and major trends in mathematics

16:00 – 18:50: Seminars Christodoulos A. Floudas

19:00 – 20:00: Dinner

**Saturday, May 5th**

8:30 – 9:30: Breakfast

9:30 – 10:50: Boris Mirkin Clustering for Similarity and Network Data

10:50 – 11:10: Coffee Break

11:10 – 12:30: Boris Mirkin Clustering for Similarity and Network Data

12:30 – 13:50: Lunch Break

13:50 – 15:10: Sergiy Butenko Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

15:20 – 16:40: Christodoulos A. Floudas Biclustering methods and their applications. De novo protein design. Hybrid feedstock energy processes using biomass, coal and natural gas for the production of liquid transportation fuels

16:50 – 18:10: Christodoulos A. Floudas Biclustering methods and their applications. De novo protein design. Hybrid feedstock energy processes using biomass, coal and natural gas for the production of liquid transportation fuels

19:00 – 20:00: Dinner

**Sunday, May 6th**

8:30 – 9:30: Breakfast

9:30 – 10:50: Christodoulos A. Floudas Biclustering methods and their applications. De novo protein design. Hybrid feedstock energy processes using biomass, coal and natural gas for the production of liquid transportation fuels

10:50 – 11:10: Coffee Break

11:10 – 12:30: Christodoulos A. Floudas Biclustering methods and their applications. De novo protein design. Hybrid feedstock energy processes using biomass, coal and natural gas for the production of liquid transportation fuels

12:30 – 13:50: Lunch Break

13:50 – 15:10: Sergiy Butenko Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

15:20 – 16:40: Christodoulos A. Floudas Biclustering methods and their applications. De novo protein design. Hybrid feedstock energy processes using biomass, coal and natural gas for the production of liquid transportation fuels

16:50 – 18:10: Christodoulos A. Floudas Biclustering methods and their applications. De novo protein design. Hybrid feedstock energy processes using biomass, coal and natural gas for the production of liquid transportation fuels

19:00 – 20:00: Dinner

**Monday, May 7th**

8:30 – 9:30: Breakfast

9:30 – 10:50: Seminars Christodoulos A. Floudas, Sergiy Butenko

12:00 – 13:00: Departure for the main HSE building – 25/12, B. Pecherskaya Str.

Abstracts

Pando G. Georgiev

Optimization methods for data analysis

The proposed course provides a concise and rigorous introduction to some rapidly expanding methods for data analysis, develops conceptual and mathematical tools for them, and creates a basis for independent research in interdisciplinary fields, combining concepts and techniques from operations research, optimization algorithms, machine learning theory in statistics, and kernel methods from mathematical analysis. The course includes the following topics in data mining and machine learning:

1. Kernel methods: an overview, Reproducing Kernel Hilbert Spaces, Reproducing Kernel Banach Spaces. Implicitly mapping the data into a Reproducing Kernel Hilbert Space (RKHS) allows algorithms that are nonlinear in the initial space to run as linear ones in the RKHS. Many key algorithms for data analysis exploit the advantages of kernel methods. We extend the idea of Reproducing Kernel Hilbert Spaces to Banach spaces (and beyond), developing a theory of Reproducing Kernel Banach Spaces (RKBS) that does not require the existence of a semi-inner product (a requirement already explored in another construction of RKBS).

2. Applications of Kernel Methods: Kernel Principal Component Analysis, Kernel k-Means, Kernel Discriminant Analysis, Spectral Clustering.

3. Support Vector Machines and variants, Kernel Regression and its generalization - adaptive multiclass learning.

4. Nonlinear skeletons of data sets and skeleton classifiers. A particular case of multiclass learning problems is the problem of subspace clustering, which we extend to RKBS, thereby defining the concept of a nonlinear skeleton and its derivative, the Skeleton Classifier.

5. Data decomposition methods based on sparsity: Sparse Component Analysis, Non-negative Matrix Factorization, Compressive Sensing.

6. Data decomposition methods based on statistical independence: Independent Component Analysis and measures for statistical dependence.

7. Data decomposition methods based on high-order structures: tensor decompositions.

8. Iterative roots of multidimensional operators and dynamic system trajectory reconstruction. Square roots, or more generally iterative roots, of operators are of interest in dynamical systems, chaos and complexity theory, and also in the modeling of certain industrial and financial processes. An operator f acting from a set X to X and satisfying the functional equation f(f(x)) = F(x) for every x in X is called a "square root" of the given operator F acting from X to X. Computing square roots of operators (or their approximations) remains a hard task. While the theory of functional equations provides some insight into the iterative roots of real- and complex-valued functions, iterative roots of mappings in high-dimensional spaces have hardly been studied, and there are few contributions to numerical algorithms for their computation. We extend some results on the existence of square roots from the scalar case to a certain class of monotone mappings in Hilbert spaces. We demonstrate how methods based on neural networks and statistical learning theory can find square roots of trajectories of certain dynamical systems.
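The defining equation f(f(x)) = F(x) from topic 8 can be checked numerically on simple scalar cases where a square root is known in closed form. The sketch below only illustrates the definition and is not the course's method; the helper names `compose_twice` and `is_square_root`, the sample points, and the two example maps are assumptions chosen for this illustration.

```python
import math

def compose_twice(f):
    """Return the map x -> f(f(x))."""
    return lambda x: f(f(x))

def is_square_root(f, F, points, tol=1e-9):
    """Check f(f(x)) == F(x) on the given sample points, up to a tolerance."""
    ff = compose_twice(f)
    return all(math.isclose(ff(x), F(x), rel_tol=tol, abs_tol=tol) for x in points)

# Example 1 (assumed for illustration): F(x) = 4x + 3 has the affine
# square root f(x) = 2x + 1, since f(f(x)) = 2(2x + 1) + 1 = 4x + 3.
F_affine = lambda x: 4 * x + 3
f_affine = lambda x: 2 * x + 1

# Example 2 (assumed for illustration): on x > 0, F(x) = x**2 has the
# square root f(x) = x**sqrt(2), since (x**sqrt(2))**sqrt(2) = x**2.
F_power = lambda x: x ** 2
f_power = lambda x: x ** math.sqrt(2)

samples = [0.5, 1.0, 2.0, 3.5]
print(is_square_root(f_affine, F_affine, samples))
print(is_square_root(f_power, F_power, samples))
```

Checking on sample points can only refute a candidate root, not prove it; the closed-form identities in the comments are what justify these two examples.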

Although some topics of the course are already classical (such as Support Vector Machines), the presentation will emphasize new points of view and ideas for extensions. Other topics are very new and subject to recent development (such as Reproducing Kernel Banach Spaces, dynamic system trajectory reconstruction, sparse component analysis, and nonlinear skeletons). The presentation will be illustrated with algorithms implemented in Matlab.
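The implicit mapping described in topic 1 can be illustrated with a small kernel for which the feature map is known explicitly. The sketch below is in Python rather than the Matlab used in the course, and it assumes a homogeneous degree-2 polynomial kernel on the plane; the function names are hypothetical. It shows that the kernel value equals an inner product in feature space, and that feature-space distances (the quantity a method such as Kernel k-Means works with) can be computed from kernel values alone.

```python
import math

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)**2 on R^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def poly2_features(x):
    """Explicit feature map phi with k(x, y) = <phi(x), phi(y)>."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Euclidean inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

def kernel_sq_distance(kernel, x, y):
    """Squared feature-space distance computed from kernel values only."""
    return kernel(x, x) - 2 * kernel(x, y) + kernel(y, y)

x, y = (1.0, 2.0), (3.0, -1.0)

# Inner product: implicit (kernel) versus explicit (feature map).
print(poly2_kernel(x, y), dot(poly2_features(x), poly2_features(y)))

# Squared distance: implicit versus explicit.
diff = tuple(a - b for a, b in zip(poly2_features(x), poly2_features(y)))
print(kernel_sq_distance(poly2_kernel, x, y), dot(diff, diff))
```

The two values in each printed pair agree up to floating-point rounding, although the kernel side never constructs the three-dimensional feature vectors.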

Petar Momcilovic

Queueing Models in Service Engineering

Four out of five workers in the developed world are employed in the service sector (as opposed to manufacturing or agriculture). The goal of Service Engineering is to develop scientifically based design principles and tools that support and balance service quality, efficiency and profitability, from the perspectives of customers, servers and managers. In this course, we review relevant queueing models along with some mathematical background. In particular, our focus is on asymptotic analyses that identify various operational regimes and yield practically useful rules of thumb.
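The abstract does not say which models or rules of thumb are covered. As one possible illustration, the sketch below assumes the classical M/M/n (Erlang C) queue and the well-known square-root staffing rule, n ≈ R + β√R, where R is the offered load and β is a service-grade parameter. The function names and the numerical values are hypothetical.

```python
import math

def erlang_b(offered_load, servers):
    """Blocking probability of the M/M/n/n loss system (stable recursion)."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b

def erlang_c(offered_load, servers):
    """Probability that an arriving customer has to wait in an M/M/n queue."""
    if servers <= offered_load:
        return 1.0  # unstable system: the queue grows without bound
    b = erlang_b(offered_load, servers)
    return servers * b / (servers - offered_load * (1.0 - b))

def square_root_staffing(offered_load, beta):
    """Rule of thumb n = R + beta * sqrt(R), rounded up to an integer."""
    return math.ceil(offered_load + beta * math.sqrt(offered_load))

R = 100.0  # offered load = arrival rate / service rate (hypothetical value)
for beta in (0.5, 1.0, 2.0):
    n = square_root_staffing(R, beta)
    print(beta, n, round(erlang_c(R, n), 3))
```

Larger values of β add more servers beyond the offered load and reduce the probability of waiting; intermediate values trade off utilization against delay.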

Themistocles Rassias

On some old and new results and problems in nonlinear analysis and major trends in mathematics

An attempt will be made to present in a unified manner some results and problems as well as new directions for further research in both pure and interdisciplinary research. Furthermore we will present some ideas regarding the present state and the near future of Mathematics.

Boris Mirkin

Clustering for Similarity and Network Data

1. Introduction: Sources of Similarity Data and Clustering Problems.

2. Summary similarity clustering: min cut and normalized cut. Uniform clustering. Modularity clustering. Laplacian Transformation. Spectral clustering. Illustrative examples.

3. Additive crisp and spectral clustering. One by one fitting approach and spectral clustering. Comparing different approaches to fuzzy clustering. Examples.
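Two of the criteria named in topic 2, the normalized cut and modularity, can be computed by hand on a small graph. The sketch below assumes an unweighted, undirected graph without self-loops, given as an edge list, and uses the standard textbook definitions; the function names and the example graph (two triangles joined by one edge) are assumptions made for illustration.

```python
def degrees(edges):
    """Node degrees of an undirected graph given as a list of (u, v) edges."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def cut_size(edges, part):
    """Number of edges with exactly one endpoint in `part`."""
    part = set(part)
    return sum(1 for u, v in edges if (u in part) != (v in part))

def volume(edges, part):
    """Sum of the degrees of the nodes in `part`."""
    deg = degrees(edges)
    return sum(deg[v] for v in part)

def normalized_cut(edges, part_a, part_b):
    """cut(A, B) / vol(A) + cut(A, B) / vol(B) for a two-way partition."""
    cut = cut_size(edges, part_a)
    return cut / volume(edges, part_a) + cut / volume(edges, part_b)

def modularity(edges, communities):
    """Sum over communities of (internal edge share) - (volume share)**2."""
    m = len(edges)
    q = 0.0
    for community in communities:
        members = set(community)
        internal = sum(1 for u, v in edges if u in members and v in members)
        q += internal / m - (volume(edges, members) / (2 * m)) ** 2
    return q

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
left, right = [0, 1, 2], [3, 4, 5]
print(cut_size(edges, left), normalized_cut(edges, left, right),
      modularity(edges, [left, right]))
```

For this graph the natural split cuts one edge, giving a normalized cut of 2/7 and a modularity of 5/14.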

Sergiy Butenko

Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

Clique relaxation models that were originally introduced in the literature on social network analysis are not only gaining increasing popularity in a wide spectrum of complex network applications, but also keep garnering the attention of mathematicians, computer scientists, and operations researchers as a promising avenue for fruitful theoretical investigations. This short course discusses the origins of clique relaxation concepts and provides an overview of recent developments in the theory behind them, algorithms for solving the corresponding optimization problems, and selected real-life applications of the models of interest.
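The abstract does not list the specific models covered. One standard clique relaxation from social network analysis is the k-plex: a vertex set S in which every vertex is adjacent to at least |S| − k other vertices of S, so that a 1-plex is an ordinary clique. The sketch below assumes this definition; the function names and the example graph are hypothetical.

```python
def is_k_plex(adjacency, subset, k):
    """True if every vertex of `subset` has at least |subset| - k neighbors in it.

    `adjacency` maps each vertex to the set of its neighbors (no self-loops).
    """
    members = set(subset)
    needed = len(members) - k
    return all(len(adjacency[v] & members) >= needed for v in members)

def is_clique(adjacency, subset):
    """A clique is exactly a 1-plex."""
    return is_k_plex(adjacency, subset, 1)

# A 4-cycle 0-1-2-3-0: each vertex misses exactly one other vertex.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}

print(is_clique(cycle, [0, 1, 2, 3]))     # the whole cycle is not a clique
print(is_k_plex(cycle, [0, 1, 2, 3], 2))  # but it satisfies the 2-plex condition
print(is_clique(cycle, [0, 1]))           # a single edge is a clique
```

Verifying the condition for a given set is easy; the optimization problems mentioned in the abstract concern finding the largest such set, which this check does not attempt.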

Christodoulos A. Floudas

Biclustering methods and their applications. De novo protein design. Hybrid feedstock energy processes using biomass, coal and natural gas for the production of liquid transportation fuels

This short course of 4 hours' duration will focus on the following research domains and provide a state-of-the-art description of the most recent advances: (a) Biclustering methods and their applications; (b) De novo protein design; and (c) Hybrid feedstock energy processes using biomass, coal and natural gas for the production of liquid transportation fuels. The titles and abstracts of the three subjects are presented below.

Advances in Biclustering Methods for Re-Ordering Data Matrices in Systems Biology, Drug Discovery, and Prediction of In Vivo Toxicities from In Vitro Experimental Data

Biclustering has emerged as an important problem in the analysis of gene expression data since genes may only jointly respond over a subset of conditions. Many of the methods for biclustering, and clustering algorithms in general, utilize simplified models or heuristic strategies for identifying the "best" grouping of elements according to some metric and cluster definition and thus result in suboptimal clusters.

In the first part of the presentation, we present a rigorous approach to biclustering, OREO, which is based on the Optimal RE-Ordering of the rows and columns of a data matrix so as to globally minimize the dissimilarity metric [1,2]. The physical permutations of the rows and columns of the data matrix can be modeled as either a network flow problem or a traveling salesman problem. The performance of OREO is tested on several important data matrices arising in systems biology to validate the ability of the proposed method and compare it to existing biclustering and clustering methods.
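The re-ordering objective can be made concrete on a toy matrix. The sketch below is not OREO: it replaces the network flow or traveling salesman formulation with exhaustive search over row permutations, which is only feasible for a handful of rows, and it assumes a squared-difference dissimilarity between adjacent rows. All names and the example data are hypothetical.

```python
from itertools import permutations

def row_dissimilarity(a, b):
    """Squared Euclidean difference between two rows (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ordering_cost(matrix, order):
    """Total dissimilarity between rows that are adjacent in `order`."""
    return sum(row_dissimilarity(matrix[i], matrix[j])
               for i, j in zip(order, order[1:]))

def best_row_order(matrix):
    """Exhaustive search for the row order with minimum adjacent dissimilarity."""
    return min(permutations(range(len(matrix))),
               key=lambda order: ordering_cost(matrix, order))

# Rows 0 and 2 are similar to each other, as are rows 1 and 3.
matrix = [[0, 0], [10, 10], [1, 1], [9, 9]]
order = best_row_order(matrix)
print(order, ordering_cost(matrix, order))
```

The optimal order places similar rows next to each other. Finding it is a shortest Hamiltonian path problem over the rows, which is why a traveling salesman model is a natural fit for larger matrices.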

In the second part of the talk, we will focus on novel methods for clustering of data matrices that are very sparse [3]. These types of data matrices arise in drug discovery, where the x- and y-axes of a data matrix can correspond to different functional groups for two distinct substituent sites on a molecular scaffold. Each possible x and y pair corresponds to a single molecule which can be synthesized and tested for a certain property, such as percent inhibition of a protein function. For even moderate-size matrices, synthesizing and testing a small fraction of the molecules is labor-intensive and not economically feasible. Thus, it is of paramount importance to have a reliable method for guiding the synthesis process to select molecules that have a high probability of success. We introduce a new strategy to enable efficient substituent reordering and descriptor-free property estimation. Our approach casts substituent reordering as a special high-dimensional rearrangement clustering problem, eliminating the need for functional approximation and enhancing computational efficiency [4, 5]. Deterministic optimization approaches based on mixed-integer linear programming can provide guaranteed convergence to the optimal substituent ordering. The proposed approach is demonstrated on a sparse data matrix (about 29% dense) of inhibition values for 14,043 unknown compounds provided by Pfizer Inc. It is shown that an iterative synthesis strategy is able to uncover a significant percentage of the lead molecules while using only a fraction of the total compound library, even when starting from a mere 3% of the total library space.

In the third part of the presentation, we combine the strengths of integer linear optimization and machine learning to predict in vivo toxicities for a library of pesticide chemicals using only in vitro data. Our approach utilizes a biclustering method based on iterative optimal re-ordering [1,2] to identify biclusters corresponding to subsets of chemicals that have similar responses over distinct subsets of the in vitro assays. This enables us to determine subsets of experimental assays that are most likely to be correlated with toxicity, according to the in vivo data set. An optimal method based on integer linear optimization (ILP) for re-ordering sparse data matrices [3] is also applied to the in vivo dataset (21.3% sparse) in order to cluster endpoints that have similar lowest effect level (LEL) values, where it is observed that endpoints are grouped according to similar physiological attributes. Based upon the clustering results of the in vitro and in vivo data sets, logistic regression is then utilized to (a) learn the correlation between the subsets of in vitro data and the in vivo responses, and (b) subsequently predict the toxicity signatures of the chemicals. Our approach aims to find the highest prediction accuracy using the minimum number of in vitro descriptors.

De Novo Design of Proteins and Protein-Peptide Complexes

Proteins serve as vital components in our cellular makeup and perform many biological functions that are essential for sustaining life. An important feature which determines the functionality of a protein is the form of its three-dimensional structure. Elucidated protein structures can also serve as templates for de novo protein design, which is of major importance in structure-based drug design and plays a pivotal role in the scientific challenge/roadmap: sequence to structure to function.

The primary objective in de novo protein design is to determine the amino acid sequences which are compatible with specific template backbone structures that may be rigid or flexible. It is of fundamental importance since it addresses the mapping of the space of amino acid sequences to the space of known protein folds or postulated/putative protein folds. It is also of significant practical importance since it can lead to the improved design of inhibitors, design of novel sequences with better stability, design of catalytic sites of enzymes, and drug discovery.

The first part of this presentation will provide a motivation for the de novo protein design problem with flexible backbone template structures and an overview of the advances and limitations of the existing approaches. The second part will introduce a novel two-stage approach which explicitly takes into account the flexibility of the templates. The first stage addresses the in silico sequence selection problem. The second stage addresses the fold specificity by performing structure prediction calculations using atomistic-level force fields and the first principles approach, Astro-Fold. The probability that each sequence folds specifically to the flexible templates is calculated. The third part will introduce an approach for the prediction of approximate binding affinities in protein-peptide complexes. Computational prediction results and experimental validation will be presented for proteins and protein-peptide complexes that include variants of Compstatin, human beta defensins, C3a, inhibitors for Complement C3c, entry inhibitors for HIV-1, inhibitors for histone methyltransferase EZH2 and inhibitors for histone demethylases LSD1/2.

Novel Hybrid Biomass, Coal, and Natural Gas to Liquids (CBGTL) Systems: Process Design, Process Synthesis, Global Optimization and Supply Chain Optimization

Heavy dependence on petroleum and high greenhouse gas (GHG) emissions from the production, distribution, and consumption of hydrocarbon fuels pose serious challenges for the United States (US) transportation sector. Depletion of domestic petroleum sources combined with a volatile global oil market prompts the need to discover alternative fuel-producing technologies that utilize domestically abundant sources. The primary aim in the discovery of hybrid energy processes is to combine coal, biomass, and natural gas to meet the United States transportation fuel demand.

The first part of this presentation will outline the needs and introduce novel hybrid feedstock coal, biomass, and natural gas to liquids (CBGTL) process alternatives which employ the reverse water-gas-shift reaction and a plethora of process alternatives through a superstructure-based optimization framework.

The second part of the presentation will address important decisions at the process design and process synthesis level. Mathematical models for biomass and coal gasification are developed to model the nonequilibrium effluent conditions using a stoichiometry-based method. A thermochemical-based process superstructure, its mixed-integer nonlinear optimization (MINLP) model, and systematic approaches for its global optimization will be discussed. Simultaneous heat, power and water integration takes place at the process synthesis stage. Case studies will be presented along with their techno-economic analysis, which determines the break-even price of crude oil (BEOP) and suggests that the CBGTL process is competitive with existing petroleum-based processes while at the same time attaining at least a 50% reduction of GHG emissions.

The third part will present a novel framework for the optimal energy supply chain of CBGTL processes. A mathematical model will be introduced that minimizes the total network cost while simultaneously evaluating the environmental performance through a life cycle analysis of each individual plant. The optimal network topology provides information on (i) the optimal plant locations throughout the country, (ii) the locations of feedstock sources, (iii) the interconnectivity between the feedstock source locations, CBGTL plant locations, and the demand locations, (iv) the modes of transportation used in each connection, and (v) the flow rates of each feedstock and product type. Life cycle analysis on the nationwide energy supply chain shows that at least a 50% reduction of GHG emissions is attainable.
