ARIA | The Computational Learning and Visual Perception Research Group

The Computational Learning and Visual Perception (CLVP) Research Group, led by Prof. Cristian Sminchisescu, Ph.D., focuses on theoretical and applied aspects of computational learning and visual perception.

We study low-level, mid-level, and high-level vision problems, including edge detection, image segmentation, graph matching, motion estimation, and 3D reconstruction. We are also interested in computational models that combine top-down object-specific priors, mid-level generic shape regularities, and bottom-up information. Our goal is to design new visual routines that add feedback and context-awareness mechanisms to the predominantly monolithic, single-task, single-representation models in the existing state of the art. We also study the design of content-based indexing techniques applicable to very large image, video, and text collections, such as those currently available on the Web.

Our work in machine learning focuses on optimization techniques and the learning of structured models based on a variety of supervision signals. We currently pursue several supervised, unsupervised, and semi-supervised directions, including large-scale kernel approximations, manifold learning methods, and multiple-instance learning formulations. We also derive new methods for spatial and temporal inference, including variational approximations and Markov chain Monte Carlo sampling. Our emphasis is on designing efficient algorithms and on methods with provable convergence and asymptotic correctness guarantees.

We work on a variety of applications, including computer vision, computer graphics, and sensor fusion. In computer vision, we study human pose reconstruction, object and action recognition, and scene understanding. In computer graphics, we work on deriving novel 3D representations, with emphasis on modeling deformable and articulated objects as well as motion patterns and interactions in complex environments. We also study optimal sensor fusion from a variety of uncertain signals extracted from images, depth maps (time-of-flight sensors, laser scanners), sound, human body sensors, or motion capture data.

Bucharest, Computer Vision and Image Processing, Data Fusion, Machine Learning