Dept. of Mathematics and Computer Science

D. Borza, T. Ileni, A. Marinescu, A. Dărăbant, Teacher or supervisor? Effective online knowledge distillation via guided collaborative learning, Elsevier ScienceDirect, Computer Vision and Image Understanding, 2023

Knowledge distillation is a widely used and effective technique to boost the performance of a lightweight student network by having it mimic the behavior of a more powerful teacher network. This paper presents an end-to-end online knowledge distillation strategy in which several peer students are trained together and their predictions are aggregated into a powerful teacher ensemble. The ensembling technique uses an online supervisor network to determine the optimal way of combining the student logits. Intuitively, this supervisor network learns the area of expertise of each student and assigns a weight to each student accordingly; it has knowledge of the input image, the ground truth data, and the predictions of each individual student, and tries to answer the following question: “how much can we rely on each student’s prediction, given the current input image with this ground truth class?”. The proposed technique can be thought of as an inference optimization mechanism, as it improves the overall accuracy for the same number of parameters. The experiments we performed show that the proposed knowledge distillation consistently improves the performance of the knowledge-distilled students compared with the independently trained students.
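The weighting idea described in this abstract can be illustrated with a minimal sketch. It assumes the supervisor has already produced one raw score per student; the function names, the softmax normalization and the sample numbers are illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_logits(student_logits, supervisor_scores):
    """Combine per-student logits into a single set of teacher logits.

    student_logits: list of S lists, each holding C class logits.
    supervisor_scores: list of S raw scores (hypothetical supervisor output).
    """
    # One non-negative weight per student; the weights sum to 1.
    weights = softmax(supervisor_scores)
    num_classes = len(student_logits[0])
    # Weighted sum of the student logits, class by class.
    teacher = [
        sum(w * logits[c] for w, logits in zip(weights, student_logits))
        for c in range(num_classes)
    ]
    return weights, teacher


# Three students, three classes (made-up numbers).
student_logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3], [1.0, 1.0, 1.0]]
supervisor_scores = [1.2, -0.4, 0.0]
weights, teacher_logits = ensemble_logits(student_logits, supervisor_scores)
```

In an online distillation setup the resulting teacher logits would serve as the soft target for each student; how the supervisor itself is trained is not covered by this sketch.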

T. Ileni, A. Dărăbant, D. Borza, A. Marinescu, DynK-hydra: improved dynamic architecture ensembling for efficient inference, Complex & Intelligent Systems, Springer-Link 2022

Accessibility on edge devices and the trade-off between latency and accuracy are areas of interest in deploying deep learning models. This paper explores a Mixture of Experts system, namely DynK-Hydra, which allows training of an ensemble formed of multiple similar branches on data sets with a high number of classes, but uses only a subset of necessary branches during inference. We achieve this by training a cohort of specialized branches (deep networks of reduced size) and a gater/supervisor that decides dynamically what branch to use for any specific input. An original contribution is that the number of chosen models is dynamically set, based on how confident the gater is (similar works use a static parameter for this). Another contribution is the way we ensure the branches’ specialization. We divide the data set classes into multiple clusters, and we assign a cluster to each branch while enforcing its specialization on this cluster by a separate loss function. We evaluate DynK-Hydra on CIFAR-100, Food-101, CUB-200, and ImageNet32 data sets and we obtain improvements of up to 4.3% accuracy compared with state-of-the-art ResNet, while reducing the number of inference FLOPs by a factor of 2–5.5. Compared to a similar work (HydraRes), we obtain marginal accuracy improvements of up to 1.2% on the pairwise inference time architectures. However, we improve the inference times by up to 2.8 times compared to HydraRes.
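One way to picture a dynamically chosen number of branches is a cumulative-confidence rule: keep adding the gater's highest-scoring branches until their combined probability reaches a threshold. The rule, the `confidence` and `max_k` parameters and the function name below are assumptions made for illustration; the abstract does not specify the exact criterion used by DynK-Hydra.

```python
def select_branches(gate_probs, confidence=0.8, max_k=None):
    """Return the indices of the branches to run for one input.

    gate_probs: per-branch probabilities produced by the gater.
    confidence: cumulative probability mass to reach (assumed rule).
    max_k: optional hard cap on the number of branches.
    """
    # Branch indices sorted from most to least probable.
    order = sorted(range(len(gate_probs)), key=lambda i: gate_probs[i], reverse=True)
    limit = len(order) if max_k is None else min(max_k, len(order))
    chosen, mass = [], 0.0
    for idx in order[:limit]:
        chosen.append(idx)
        mass += gate_probs[idx]
        if mass >= confidence:
            break  # the gater is confident enough; stop adding branches
    return chosen


confident_choice = select_branches([0.9, 0.05, 0.03, 0.02])
uncertain_choice = select_branches([0.4, 0.35, 0.15, 0.1])
capped_choice = select_branches([0.4, 0.35, 0.15, 0.1], max_k=2)
```

A confident gater triggers a single branch, while a flatter distribution triggers several, which is where the inference savings over a fixed k would come from.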

D. Borza, A. Dărăbant, T. Ileni, A. Marinescu, Effective Online Knowledge Distillation via Attention-Based Model Ensembling, Mathematics 2022

Large-scale deep learning models have achieved impressive results on a variety of tasks; however, their deployment on edge or mobile devices is still a challenge due to the limited available memory and computational capability. Knowledge distillation is an effective model compression technique, which can boost the performance of a lightweight student network by transferring the knowledge from a more complex model or an ensemble of models. Due to its reduced size, this lightweight model is more suitable for deployment on edge devices. In this paper, we introduce an online knowledge distillation framework, which relies on an original attention mechanism to effectively combine the predictions of a cohort of lightweight (student) networks into a powerful ensemble, and use this as a distillation signal. The proposed aggregation strategy uses the predictions of the individual students as well as ground truth data to determine a set of weights needed for ensembling these predictions. This mechanism is solely used during system training. When testing or at inference time, a single, lightweight student is extracted and used. The extensive experiments we performed on several image classification benchmarks, both by training models from scratch (on CIFAR-10, CIFAR-100, and Tiny ImageNet datasets) and using transfer learning (on Oxford Pets and Oxford Flowers datasets), showed that the proposed framework always leads to an improvement in the accuracy of knowledge-distilled students and demonstrates the effectiveness of the proposed solution. Moreover, in the case of ResNet architecture, we observed that the knowledge-distilled model achieves a higher accuracy than a deeper, individually trained ResNet model.

A. Marinescu, H. Mureșan, A. Călin, A. Coroiu, M. Talla, FREIDA – Fracture Risk Evaluation using Highly Efficient Information Retrieval and Analysis of Large Healthcare Datasets, IEEE BigData 2021, 2021 IEEE International Conference on Big Data, Multi-Modal Medical Data Analysis Workshop

This paper presents an approach based on optimized data search and data analysis of high-dimensional clinical health records, for the purpose of identifying patients with a significant risk of fracture so that they can be referred to the Fracture Liaison Service (FLS) for further investigation and treatment. The main objective of the work is to speed up and automate the process of identifying the patients suitable for FLS and to provide real decision support for the clinical team through fast information retrieval and structuring. The performed analysis and the current results are promising and support our research aim of optimizing the process in a real-life scenario.

A. Marinescu, Automatic Face Shape Classification via Facial Landmark Measurements, Studia Universitatis Babeș-Bolyai Informatica

This paper tackles the sensitive subject of face shape identification via near neutral-pose 2D images of human subjects. The possibility of extending to 3D facial models is also proposed, and would alleviate the need for the neutral stance. Accurate face shape classification serves as a vital building block of any hairstyle and eye-wear recommender system. Our approach is based on extracting relevant facial landmark measurements and passing them through a naive Bayes classifier unit in order to yield the final decision. The literature on this subject is particularly scarce owing to the very subjective nature of human face shape classification. We wish to contribute a robust and automatic system that performs this task and highlight future development directions on this matter.

A. Marinescu, A. Dărăbant, T. Ileni, Optimal Stereo Camera Calibration via Genetic Algorithms, IJCAI 2021 AI4AD Workshop, Artificial Intelligence for Autonomous Driving

“Stereo camera real world measurement accuracy is strongly dependent on the camera calibration process, which, in turn, relies on the set of input images and camera model. There is a strong relationship between the quality of the input, the calibration process and the accuracy of the 3D reconstruction. The total set of calibration images has a strong influence on the calibration outcome and selecting the optimal subset of images and parameters for calibration estimation is a problem of combinatorial complexity. In this paper we propose a genetic algorithm calibration image selection process, driven by multiple metrics, that eliminates the image and parameter selection bias and avoids exploratory search over the entire solution space. We numerically compare the contribution of our image and parameter selection algorithm to the overall reconstruction accuracy with the state of the art methods in stereo and mono calibration and show that we can bring substantial improvements to the 3D restitution accuracy or mono calibration parameter estimation.”

A. Marinescu, H. Mureșan, A. Călin, A. Coroiu, FREIDA – Fracture Risk Evaluation using Intelligent Data Analysis, Small Business Research Initiative (SBRI) Competition Funded by NHS Scotland, Challenge C: Deliver safer and better care for people in Scotland with diabetes

Over the last decade, the entire paradigm of diabetes management has been transformed due to the integration of new technologies such as continuous glucose monitoring devices and the development of the artificial pancreas, along with the exploitation of data acquired by applying these novel tools. Artificial intelligence (AI) is attracting increased attention in this field because the amount of data acquired electronically from patients suffering from diabetes has grown exponentially. By means of complex and refined methods, AI has been shown to provide useful management tools to deal with these growing repositories of data. In this project we propose the development of an innovative machine learning (ML) software solution, FREIDA, based on the latest developments in big data mining and ML, designed to have a key role in the delivery of care for patients with diabetes by predicting or identifying the development of patient risks early and proposing therapeutic interventions to prevent further deterioration, injury or hospitalization. We envision that our product will have a key role in the therapeutic routine of the future medical system, in both the private and public sectors, and among health insurers, which possess valuable patient data that can be used to infer new knowledge without additional screening. It therefore does not overburden the system, but simplifies decision making and supports best practices in routine checks and interventions.

A. Marinescu, A musical similarity metric based on Symbolic Aggregate Approximation, SoftCOM 2020, 28th International Conference on Software, Telecommunications and Computer Networks

We have continued our work in the field of AI-driven music synthesis and have improved upon our previous 3-layer gated recurrent unit neural network, with results confirming higher accuracy and much smaller validation loss. In order to achieve this, we have designed a recurrent neural architecture that is more suited to learning the musical style of J. S. Bach from an enhanced database of partitas and sonatas. This architecture is based on four independent channels, each having two gated recurrent units, with a bi-directional long short-term memory unit in between. However, the main incentive of this paper is finding a metric which is able to measure on a normalized scale the similarity between an artificial musical composition and the style of the famous composer. This measure is heavily based on a signal processing technique known as symbolic aggregate approximation. As a final note, we perform a statistical analysis of the recurring motifs in the works of J. S. Bach, and hint at the possibility of exploiting them to further increase the quality of the output.
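For readers unfamiliar with symbolic aggregate approximation (SAX), the sketch below shows the standard three steps: z-normalization, piecewise aggregate approximation, and discretization with equiprobable Gaussian breakpoints. It illustrates the general technique only; the similarity metric built on top of it in the paper is not reproduced, and the function name, the alphabet and the sample pitch sequence are assumptions.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word of length `segments`.

    Assumes len(series) is divisible by `segments`.
    """
    # Step 1: z-normalize (a constant series maps to all zeros).
    mu, sigma = mean(series), pstdev(series)
    normalized = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # Step 2: piecewise aggregate approximation (mean of each segment).
    size = len(series) // segments
    paa = [mean(normalized[i * size:(i + 1) * size]) for i in range(segments)]

    # Step 3: breakpoints that split the standard normal into equiprobable bins.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # Each segment mean becomes the symbol of the bin it falls into.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


# A rising sequence of MIDI-like pitch values (made-up data).
rising_word = sax([60, 61, 62, 63, 64, 65, 66, 67], segments=4)
falling_word = sax([67, 66, 65, 64, 63, 62, 61, 60], segments=4)
```

Two pieces can then be compared through their SAX words, for example with a distance defined over the symbols, which is the kind of building block a normalized similarity measure could rest on.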

A. Marinescu, A. Dărăbant, T. Ileni, A Fast and Robust, Forehead-Augmented 3D Face Reconstruction from Multiple Images using Geometrical Methods, SoftCOM 2020, 28th International Conference on Software, Telecommunications and Computer Networks

“3D Face Reconstruction is a complex problem in Computer Vision. Until recently the existing methods were based on multiple image captures and solving complex dense correspondences between different face poses. Recent methods are based on volumetric CNNs and try to reconstruct the 3D face model from a single image. Accurate 3D face reconstructions are used nowadays for avatar modelling and in eyewear and hairstyle recommendation systems. All these require accurate face shape determination, which requires at least a fully frontalized 2D projection of the face or, even better, an accurate 3D volumetric reconstruction of the face. Most of the existing methods for 3D reconstruction stop somewhere in the middle of the forehead, thus limiting the obtained 3D model. We propose a mostly geometric method for facial reconstruction, based on structure from motion techniques on uncalibrated cameras, augmented with forehead surface modelling for added realism. We present a full section for proving that we are on par with state of the art deep learning techniques.”

A. Marinescu, T. Ileni, A. Dărăbant, A Versatile 3D Face Reconstruction from Multiple Images for Face Shape Classification, SoftCOM 2019, 27th International Conference on Software, Telecommunications and Computer Networks

“In this paper we present a 3D facial reconstruction algorithm for facial shape classification. We propose a reconstruction method that preserves facial ratios independent of the captured poses, thus allowing a precise facial shape classification/determination. We use an Active Appearance Model on a set of facial image captures with different poses on which we auto-calibrate the camera parameters (intrinsic and extrinsic). We then fit/morph the statistical model to a set of facial landmarks in order to obtain an accurate 3D face reconstruction. We show the obtained results and compare them with other approaches in the literature.”

A. Marinescu, Bach 2.0 – Generating Classical Music using Recurrent Neural Networks, KES 2019, 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems

“The main incentive of this paper is to approach the sensitive subject of classical music synthesis in the form of musical scores by providing an analysis of different Recurrent Neural Network architectures. We will be discussing in a side-by-side comparison two of the most common neural network layers, namely Long Short-Term Memory and Gated Recurrent Unit, and study the effect of altering the global architecture meta-parameters, such as number of hidden neurons, layer count and number of epochs, on the categorical accuracy and loss. A case study is performed on musical pieces composed by Johann Sebastian Bach and a method for estimating the repetition stride in a given musical piece is introduced. This is identified as the primary factor in optimizing the input length that must be fed during the training process.”

A. Marinescu, A. Andreica, Evolving Mathematical Formulas using LINQ Expression Trees and Direct Applications to Credit Scoring, SYNASC 2018, 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing

“Credit scoring is a well established and scrutinized domain within the artificial intelligence field of research and has direct implications in the functioning of financial institutions, by evaluating the risk of approving loans for different clients, which may or may not reimburse them in due time. It is the clients who fail to repay their debt that we are interested in predicting, which makes it a much more difficult task, since they form only a small minority of the total client count. From an input-output perspective, the problem can be stated as: given a set of client properties, such as age, marital status, loan duration, one must yield a 0-1 response variable, with 0 meaning “good” and 1, “bad” clients. Many techniques with high accuracy exist, such as artificial neural networks, but they behave as black box units. We add to this whole context the constraint that the output must be a concrete, tractable mathematical formula, which provides significant added value for a financial analyst. To this end, we present a means for evolving mathematical formulas using genetic programming coupled with Language Integrated Query expression trees, a feature present in the C# programming language.”

A. Marinescu, Z. Bálint, L. Dioșan, A. Andreica, Unsupervised and Fully Autonomous 3D Medical Image Segmentation based on Grow Cut, SYNASC 2018, 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing

“Extending and optimizing cellular automata to handle 3D volume segmentation is a non-trivial task. First, it does not suffice to simply alter the cell neighborhood (be it von Neumann or Moore), and second, going from 2D to 3D means that the number of operations increases by an order of magnitude, thus GPU acceleration becomes a necessity, an advantage inherent to cellular automata approaches. When discussing 3D medical imaging, we mean that the entire stack of slices from a certain sequence within an acquisition is stored as a single entity. This, in turn, enables us to accurately segment whole volumes in a single run, which would otherwise need per-slice segmentation followed by a stitching post-process. This paper focuses mainly on a thorough benchmark analysis of the 3D Unsupervised Grow Cut technique. We discuss algorithm speed of convergence, stability and behavior with respect to global meta-parameters such as segmentation threshold, keeping track of output quality metrics as the algorithm unfolds. Our end goal is to segment the heart cavities from cardiac MRI and to yield an interactive 3D reconstruction which can be easily handled and analyzed by the radiologist.”

A. Marinescu, Z. Bálint, L. Dioșan, A. Andreica, Dynamic autonomous image segmentation based on Grow Cut, ESANN 2018, 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning

“The main incentive of this paper is to provide an enhanced approach for 2D medical image segmentation based on the Unsupervised Grow Cut algorithm, a method that requires no prior training. This paper assumes that the reader is, to some extent, familiar with cellular automata and their function as they make up the core of this technique. The benchmarks were performed on 2D MRI images of the heart and chest cavity. We obtained a significant increase in the output quality as compared to classical Unsupervised Grow Cut by using standard measures, based on the existence of accurate ground truth. This increase was obtained by dynamically altering the local threshold parameter. In conclusion, our approach provides the opportunity to become a building block of a computer aided diagnostic system.”

A. Marinescu, A Genetic Algorithm Approach for Evolving Neural Networks, Studia Universitatis Babeș-Bolyai Informatica

“We present an alternative approach for training feed-forward neural networks (abbrev. NN) by means of a genetic algorithm (abbrev. GA) that alters the network’s hidden weights and biases. We by no means rule out the back-propagation training algorithm, but instead use it to train the evolved NNs for a much smaller number of generations and focus more on the mutation and crossover operators and how they can be applied. The basic principle involved is that each and every NN can be treated as a chromosome for a GA and, as a consequence, is subject to the mutation and crossover operators. A notable advantage of our approach is that we not only avoid over-fitting the NN, but are also able to alter the number of hidden neurons that make up the hidden layer of the NN, effectively removing the need for the user to specify them explicitly. We manage to outperform plain-vanilla NNs by 1 to 10 percent on well known data sets. At first this may not seem significant, but it becomes crucial when dealing with applications where accuracy is critical and training time is not an issue (such as disease diagnosis).”
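Treating a network as a chromosome can be sketched by flattening its weights and biases into a list of floats and applying the two genetic operators to that list. The operators below (uniform crossover and Gaussian mutation), their parameters and the sample weights are assumptions chosen for illustration; the abstract does not state which variants the paper uses, and changing the number of hidden neurons is not covered here.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def mutate(chromosome, rng, rate=0.1, scale=0.5):
    """Add Gaussian noise to each gene with probability `rate`."""
    return [
        g + rng.gauss(0.0, scale) if rng.random() < rate else g
        for g in chromosome
    ]


rng = random.Random(0)  # seeded for reproducibility
# Flattened weights and biases of two small networks (made-up values).
net_a = [0.5, -1.2, 0.3, 0.8, -0.1, 0.0]
net_b = [-0.4, 0.9, 1.1, -0.7, 0.2, 0.6]

child = crossover(net_a, net_b, rng)
mutant = mutate(child, rng, rate=0.5)
```

In a full loop, each offspring would be evaluated (optionally after a few back-propagation steps, as the abstract describes) and the fittest networks kept for the next generation.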

A. Marinescu, C. Mihălceanu, D. Lapsanschi, M. Albert, MoodFlux – a NUI-based music player, Intel RealSense App Challenge 2014 Pioneer Track Winners

“Intel RealSense App Challenge is a worldwide developer challenge which aims to change how users interact with their devices, and the world around them. With 3D capabilities, hand and finger tracking, facial analysis and voice command, new collaboration and sharing technologies, and more, the computing experience will be taken to a whole new level. Developers in over 30 different countries competed for a pool of $1,000,000 (USD) in cash prizes. A great team of developers from Evozon competed in the Intel® RealSense™ App Challenge in 2014 and won 2nd place worldwide with an innovative app called MoodFlux, a music player with futuristic controls and an impressive, colorful UI. The sensor it relies on is capable of face and hand tracking, following dozens of facial landmarks and recognizing basic emotions (happy, scared, neutral, surprised, nervous, angry, etc.).”

A. Marinescu, Optimizations in Perlin Noise-Generated Procedural Terrain, Studia Universitatis Babeș-Bolyai Informatica

“The following article wishes to be the first of a series focused on different aspects involving procedural generation, with its ultimate goal being that of building an entire realistic 3D world from a single number (known in literature as the “seed”). It should allow for free exploration and render at interactive frame rates on mid to high-end graphics hardware. To begin with, we will discuss the manner in which we have generated procedural landscapes and some techniques we have borrowed from rendering algorithms in order to optimize the terrain generation and drawing process. Regarding the rendering framework, we have gone for Microsoft XNA Game Studio, the key factors of our choice being its simplicity plus the fact that we can focus more on the structure and realization of the algorithms and less on implementation and API details. As in our previous work, we have considered that this outweighs its limitations and that the concepts presented here should very well fit any programming language/drawing API.”

A. Marinescu, Achieving Real-Time Soft Shadows Using Layered Variance Shadow Maps (LVSM) in a Real-Time Strategy (RTS) Game, Studia Universitatis Babeș-Bolyai Informatica

“While building a game engine in Microsoft XNA 4 that powered an RTS (real-time strategy) tower defense type game, we were faced with the issue of increasing the amount of visual feedback received by the player and adding value to the gameplay by creating a more immersive atmosphere. This is a common goal shared by all games, and with the recent advancements in graphics hardware and APIs (namely OpenGL, DirectX and the advent of programmable shaders) it has become a necessity. In this paper we will build upon the shadowing techniques known as VSM (variance shadow map) and LVSM (layered variance shadow map) and discuss some of the issues and optimizations we employed in order to add real-time soft shadowing capabilities to our game engine.”
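The core of a variance shadow map lookup is Chebyshev's one-sided inequality applied to the filtered depth moments. The sketch below shows that standard bound in plain Python rather than shader code; the function names, the `min_variance` clamp value and the sample depths are illustrative assumptions, and the layering step that distinguishes LVSM is not shown.

```python
def depth_moments(depths):
    """Filtered first and second depth moments over a shadow-map region."""
    n = len(depths)
    return sum(depths) / n, sum(d * d for d in depths) / n


def vsm_visibility(mean_depth, mean_depth_sq, receiver_depth, min_variance=1e-4):
    """Upper bound on the fraction of the region that is lit.

    Uses p_max = variance / (variance + (t - mean)^2) for t > mean.
    """
    if receiver_depth <= mean_depth:
        return 1.0  # receiver is in front of the average occluder: fully lit
    # Clamp the variance to limit numerical artifacts (assumed value).
    variance = max(mean_depth_sq - mean_depth * mean_depth, min_variance)
    delta = receiver_depth - mean_depth
    return variance / (variance + delta * delta)


# A filter region straddling a shadow edge: half near, half far occluders.
m1, m2 = depth_moments([0.2, 0.8])
edge_visibility = vsm_visibility(m1, m2, receiver_depth=0.8)
front_visibility = vsm_visibility(m1, m2, receiver_depth=0.4)
```

Because the two moments can be blurred or mipmapped like any texture, the bound yields a soft penumbra from a single filtered lookup.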