Randomised Decision Forests and Tree-structured Algorithms in Computer Vision
Many computer vision tasks can be cast as large-scale classification problems, for which extremely efficient and powerful classification methods are sought to achieve real-time performance. Randomised Decision Forests are an emerging technique in the field and have been highly successful in various real-time vision applications. The technique has its roots in tree and ensemble learning. A hierarchical tree structure yields many short paths, reducing evaluation time, while ensemble learning with randomisation ensures smooth decision regions for good generalisation to unseen data. Boosting, on the other hand, is a representative ensemble learning technique that has been a standard method for computationally demanding tasks, e.g. object detection. A boosting algorithm with simple weak learners can be seen as a flat structure, and many later developments, including the Boosting cascade for time-efficient classification, can be seen as tree structures. In this talk, we review Randomised Decision Forests and tree-structured methods with comparative and insightful discussions. Following the concepts and principles, their various applications are also presented. More information is found at http://www.iis.ee.ic.ac.uk/icvl.
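As a rough illustration of the two ideas in the abstract (short evaluation paths in a tree, and randomisation plus voting across an ensemble), the following is a minimal sketch in plain Python. It is not the speaker's implementation: the function names, the Gini split criterion, the toy two-cluster data and all parameter values are assumptions chosen for brevity.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def grow_tree(X, y, rng, depth=0, max_depth=4, n_candidates=8):
    """Return a leaf (an int label) or a node (feature, threshold, left, right)."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    for _ in range(n_candidates):          # randomisation: only a few random splits are tried
        f = rng.randrange(len(X[0]))       # random feature
        t = rng.choice(X)[f]               # random threshold taken from the data
        left = [i for i in range(len(X)) if X[i][f] < t]
        right = [i for i in range(len(X)) if X[i][f] >= t]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right]))
        if best is None or score < best[0]:
            best = (score, f, t, left, right)
    if best is None:                       # no usable split was sampled
        return majority(y)
    _, f, t, left, right = best
    grow = lambda idx: grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                 rng, depth + 1, max_depth, n_candidates)
    return (f, t, grow(left), grow(right))

def tree_predict(node, x):
    """Follow one root-to-leaf path; return (label, number of tests evaluated)."""
    steps = 0
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] < t else right
        steps += 1
    return node, steps

def grow_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample per tree
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def forest_predict(forest, x):
    """Majority vote over the trees."""
    return majority([tree_predict(tree, x)[0] for tree in forest])

def make_toy_data(seed=0, n=40):
    """Hypothetical toy data: two Gaussian clusters centred at (0, 0) and (4, 4)."""
    rng = random.Random(seed)
    X = [[rng.gauss(c, 0.5), rng.gauss(c, 0.5)] for c in (0.0, 4.0) for _ in range(n)]
    y = [0] * n + [1] * n
    return X, y

X, y = make_toy_data()
forest = grow_forest(X, y)
print(forest_predict(forest, [0.0, 0.0]), forest_predict(forest, [4.0, 4.0]))
```

Each tree evaluates at most `max_depth` threshold tests per sample, whereas a flat boosted classifier with the same number of weak learners would evaluate all of them; this is the speed argument the abstract makes.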
Professor Tae-Kyun Kim
Intelligent Systems and Networks Group, Imperial College London
- T-K. Kim and R. Cipolla, MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features, In Advances in Neural Information Processing Systems (NIPS), Vancouver, Canada, 2008.
- T-K. Kim, I. Budvytis, R. Cipolla, Making a Shallow Network Deep: Conversion of a Boosting Classifier into a Decision Tree by Boolean Optimisation, Int. Journal of Computer Vision, 100(2):203-215, 2012.
- D. Tang, T.H. Yu and T-K. Kim, Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests, Proc. of IEEE Int. Conf. on Computer Vision (ICCV), Sydney, Australia, 2013.
- X. Zhao, T-K. Kim, W. Luo, Unified Face Analysis by Iterative Multi-Output Random Forests, Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Columbus, Ohio, USA, 2014.
Local Features Extraction and Description
Local features have been exploited in competitive or state-of-the-art approaches addressing a range of problems such as multi-view matching, 3D reconstruction, augmented reality, SLAM and texture recognition. In general, each approach places different requirements on the properties of the features: the speed of extraction, location accuracy, robustness to illumination changes, covariance to a certain group of geometric transformations, etc.
Local features will be reviewed from this perspective, focusing on methods that provide the best trade-off for certain classes of problems and applications.
Professor Jiri Matas
Center for Machine Perception, Czech Technical University, Prague
Document Image Analysis
Document Image Analysis and Recognition (DIAR) is an important field in Pattern Recognition, whose aim is the automatic analysis of the contents of document images, towards their recognition and understanding. Traditionally, DIAR has focused on the analysis of scanned document images, and has been instrumental in the development of key technologies such as Optical Character Recognition (OCR) engines, and in the introduction of key pattern recognition and machine learning concepts. In the last decades, the discipline has grown exponentially, extending to camera-based DIAR, on-line pen-based interfaces and contemporary text containers other than paper documents (real scenes, digital-born images, etc.). In parallel, DIAR has become a cornerstone technology for the preservation of cultural heritage, especially in Europe. The DIAR field is represented in the International Association for Pattern Recognition through two Technical Committees: TC10 (“Graphics Recognition”) and TC11 (“Reading Systems”). The main conference of the field is the biennial International Conference on Document Analysis and Recognition (ICDAR), which attracts about 400 participants.
The lecture will cover the typical DIAR processes, including document enhancement, layout analysis, Optical Character Recognition, handwriting recognition, document classification, information spotting and retrieval, graphics recognition and writer identification.
Doctor Alicia Fornés
Computer Vision Center, Universitat Autònoma de Barcelona
- Handbook of Document Image Processing and Recognition (D.Doermann, K.Tombre – editors). Springer-Verlag London, ISBN: 978-0-85729-860-7, (DOI: 10.1007/978-0-85729-859-1), 2014.
3D Scene understanding
Automatic Facial Expression Recognition – A Practical Introduction
Facial Expression Recognition has reached a state where it can make its first tentative steps out in the wild, the most notable example being the smile detection on digital cameras. In this hands-on tutorial we will give a practical introduction to facial expression analysis, guiding you through a number of essential steps resulting in a working version of a smile detector. Resources for the tutorial can be downloaded from here.
Professor Michel Valstar
University of Nottingham
The advent of the Microsoft Kinect and other RGB-D sensors has resulted in great progress in dense mapping, object recognition and SLAM in recent years. Given the low cost of the sensor coupled with the high resolution visual and depth information provided at video frame rate, methods relying on RGB-D sensors are becoming more popular in tackling some of the key perception problems in robotics and computer vision. This course will feature an overview of many of the recent advances in RGB-D camera-based research, including methods which specifically exploit the wide-scale availability of general-purpose computing on graphics processing units. The following topics will be covered in the course:
- Sensor technology and calibration
- Dense tracking & GPGPU approaches
- 3D reconstruction
- Large scale dense SLAM
- Applications of RGB-D data
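To make the sensor and calibration topic above concrete, here is a minimal sketch of how a calibrated depth image can be back-projected into 3D points with the standard pinhole camera model. The function name, the intrinsic values (`fx`, `fy`, `cx`, `cy`) and the convention that a depth of 0 marks a missing reading are assumptions for illustration; real values come from calibrating the particular sensor.

```python
def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth image into 3D points in the camera frame.

    depth: list of rows, each a list of depth values in metres.
    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    Pixels with depth <= 0 are treated as missing (an assumption of this sketch).
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column
            if z <= 0.0:
                continue
            x = (u - cx) * z / fx          # pinhole model: u = fx * x / z + cx
            y = (v - cy) * z / fy          # pinhole model: v = fy * y / z + cy
            points.append((x, y, z))
    return points

# Hypothetical 2x2 depth image and intrinsics, for illustration only.
toy_depth = [[0.0, 2.0],
             [1.0, 0.0]]
print(backproject(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5))
```

Dense mapping systems typically run this per-pixel computation on the GPU, since every pixel can be processed independently.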
- “Kintinuous: Spatially Extended KinectFusion” by T. Whelan, M. Kaess, M.F. Fallon, H. Johannsson, J.J. Leonard and J.B. McDonald. In RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, (Sydney, Australia), July 2012.
- “Robust Real-Time Visual Odometry for Dense RGB-D Mapping” by T. Whelan, H. Johannsson, M. Kaess, J.J. Leonard, and J.B. McDonald. In IEEE Intl. Conf. on Robotics and Automation, ICRA, (Karlsruhe, Germany), May 2013.
- “Deformation-based Loop Closure for Large Scale Dense RGB-D SLAM” by T. Whelan, M. Kaess, J.J. Leonard, and J.B. McDonald. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, IROS, (Tokyo, Japan), November 2013.
- Porto Tour
- Palácio da Bolsa
- Museu Romântico da Quinta da Macieirinha
Welcome drink will take place in the Sheraton Hotel
VISUM 2015 dinner will be at the Restaurant Além-Mar at Hotel Dom Henrique
Randomised Decision Forests And Tree-Structured Algorithms In Computer Vision
Tae-Kyun Kim has been a Lecturer in computer vision and learning at Imperial College London, UK, since 2010. He obtained his PhD from the Univ. of Cambridge in 2008 and was a Junior Research Fellow (governing body) of Sidney Sussex College in Cambridge during 2007-2010. His research interests span various topics including object recognition, tracking, face recognition and surveillance, action/gesture recognition, semantic image segmentation and reconstruction, and novel man-machine interfaces. He has co-authored over 40 academic papers in top-tier conferences and journals in the field, 6 MPEG-7 standard documents and 17 international patents. An algorithm he co-authored is an international standard of MPEG-7 ISO/IEC for face image retrieval.
Local Features Extraction And Description
Jiri Matas is a Professor at the Center for Machine Perception of the Czech Technical University in Prague, Czech Republic. Jiri received the MSc degree in cybernetics (with honors) from the Czech Technical University in 1987 and the PhD degree from the University of Surrey, UK, in 1995. His research interests include object recognition, image retrieval, tracking, sequential pattern recognition, invariant feature detection, and Hough Transform and RANSAC-type optimization.
Jiri has published more than 200 papers in refereed journals and conferences. His publications have approximately 18,000 citations in Google Scholar, and his h-index is 50. He received the best paper prize at the British Machine Vision Conference in 2002 and 2005 and at the Asian Conference on Computer Vision in 2007. Jiri Matas has served in various roles at major international conferences (e.g. ICCV, CVPR, ICPR, NIPS, ECCV), co-chairing ECCV 2004 and CVPR 2007. He is a Programme Chair of ECCV 2016. He is on the editorial board of IJCV and was the Associate Editor-in-Chief of IEEE T. PAMI.
Document Image Analysis
Alicia Fornés received the B.S. degree in Computer Science in 2003 from the Universitat de les Illes Balears (UIB) and the Ph.D. degree in 2009 from the Universitat Autònoma de Barcelona (UAB). She also completed her piano studies at the Conservatory Superior of Music of the Illes Balears in 2001. Her Ph.D. work focused on writer identification of old music scores. She was the recipient of the AERFAI (Image Analysis and Pattern Recognition Spanish Association) best thesis award 2009-2010. She is currently a researcher at the Computer Vision Center.
She has participated in several research projects (including European projects). She has published more than 60 papers in international conferences and journals. Since 2011, she has been the newsletter editor of the IAPR TC-10 (“Technical Committee 10 on Graphics Recognition”). She has completed several research stays abroad, including at the University of Bern (Switzerland), the University of La Rochelle (France) and the Osaka Prefecture University (Japan). Her research interests include document analysis, historical documents, handwriting recognition, symbol recognition, optical music recognition, and writer identification.
Martial Hebert is a Professor of Robotics at Carnegie Mellon University, where he is Director of the Robotics Institute. His interests include computer vision, especially recognition in images and video data, model building and object recognition from 3D data, and perception for mobile robots and for intelligent vehicles. His group has developed approaches for object recognition and scene analysis in images, 3D point clouds, and video sequences.
In the area of machine perception for robotics, his group has developed techniques for people detection, tracking, and prediction, and for understanding the environment of ground vehicles from sensor data. He has served on the editorial boards of the IEEE Transactions on Robotics and Automation, the IEEE Transactions on Pattern Analysis and Machine Intelligence, and the International Journal of Computer Vision (for which he currently serves as Editor-in-Chief). He was Program Chair of the 2009 International Conference on Computer Vision, General Chair of the 2005 IEEE Conference on Computer Vision and Pattern Recognition and Program Chair of the 2013 edition of this conference.
Automatic Facial Expression Recognition – A Practical Introduction
Professor Michel Valstar (http://www.cs.nott.ac.uk/~mfv) is a lecturer at the University of Nottingham. He was a Visiting Researcher at MIT’s Media Lab, and a Research Associate in the intelligent Behaviour Understanding Group (iBUG) at Imperial College London. He received his master’s degree in Electrical Engineering at Delft University of Technology in 2005 and his PhD in computer science at Imperial College London in 2008. Currently he is working in the fields of computer vision and pattern recognition, where his main interest is in automatic recognition of human behaviour, specialising in the analysis of facial expressions. He was the main organiser of the first facial expression recognition challenge, FERA2011, and of the first three Audio-Visual Emotion recognition Challenges, AVEC2011, AVEC2012, and AVEC2013. In 2007 he won the BCS British Machine Intelligence Prize for part of his PhD work.
He has published technical papers in authoritative journals and conferences including SMC-B, TAC, CVPR and ICCV, and his work has received popular press coverage in New Scientist and on BBC Radio. He has recently proposed a new area of research called Behaviomedics, which aims to revolutionise diagnosis, monitoring, and treatment of medical conditions that alter human behaviour by using affective computing and social signal processing to quantify people’s expressive behaviour (see reference).
Thomas Whelan received his B.Sc. (Hons) in Computer Science & Software Engineering from the National University of Ireland Maynooth in 2011. He recently finished his Ph.D. at the National University of Ireland Maynooth under a 3-year post-graduate scholarship from the Irish Research Council. In 2012 he spent 3 months as a visiting researcher in Prof. John Leonard’s group in CSAIL, MIT, funded by a Science Foundation Ireland Short-Term Travel Fellowship. As the principal author of the Kintinuous RGB-D SLAM system for dense mapping over large scale environments, his research focuses on developing methods for dense real-time perception and its applications in SLAM and robotics.
In August 2014 he started a new post as a Dyson Research Fellow at the new Dyson Robotics Laboratory led by Prof. Andrew Davison at Imperial College London. The lab was established in 2014 in partnership with Dyson to develop computer vision programs that will enable robots to move beyond their controlled environments and successfully navigate the real world.
PhD in Electrical and Computer Eng. – IST (1995), Visiting Scientist – Robotics Institute – Carnegie Mellon University (1992-1995), Researcher at the Signal and Image Processing Group, Coordinator of Thematic Area D – ISR-Associated Laboratory, Group Coordinator – Signal and Image Processing Group (SIPg) and Co-director of the Dual PhD program in ECE and Robotics, Carnegie Mellon|Portugal
- From Healthcare Imaging to shape interpretation
Doctor Jacek Kustra
Jacek Kustra is passionate about science and technology. He follows his creativity and out-of-the-box thinking to make the best use of his skills to improve lives. Born in the city of Czestochowa in Poland, he moved to Portugal at an early age, where he developed his ongoing drive to learn. He completed a 5-year Licenciatura Degree in Electronics and Telecommunications, followed by a Masters (pre-Bologna) in Biomedical Engineering, both at the University of Aveiro, and a PhD in Computational Geometry at the University of Groningen, Netherlands. Software freelancer, University Lecturer and System Architect are some of the professional positions Jacek has held, in which he worked on a broad spectrum of technology and applications, ranging from electronics design to novel algorithm development. He currently holds a Senior Scientist position at Philips Research in the Oncology Solutions Group.
- Doctor Daniel Heesch
Daniel Heesch’s professional life spans both academia and private enterprise. After a Ph.D. in computer science, he spent two years as a post-doctoral researcher with Prof Petrou at Imperial College London working on a variety of computer vision problems. He has published in a wide range of disciplines, from theoretical biology and information retrieval to computer vision and machine learning.
Daniel took the entrepreneurial turn in 2006 when founding Pixsta, the first company in Europe to apply techniques for image analysis and retrieval to the online fashion market. As cofounder he helped the company raise multi-million pound investment rounds, secured a key technology patent, and won a £1.5M research grant. Daniel has a deep understanding of many aspects of running and growing a company, both operationally and strategically. Through his London-based advisory firm Hirschblau Consulting, Daniel helps exciting new technology ventures get off the ground, including ASAP54 where he currently acts as Chief Scientific Officer.
Daniel holds a B.A. in Biology from the University of Oxford, a B.Sc. in Mathematics from the Open University and a Ph.D. in Computer Science from Imperial College London.
- Doctor Manuel João Ferreira
Manuel J. Ferreira received the B.Sc. degree in electronics and telecommunications from the University of Aveiro, Aveiro, Portugal, in 1992, and the M.Sc. degree in industrial informatics and the Ph.D. degree in industrial electronics from the University of Minho, Guimarães, Portugal, in 1996 and 2004, respectively. Between 1992 and 1999 he was a researcher at INESC Porto and Professor at the Universidade Lusíada. In this period he developed his Master's thesis, resulting in a system installed at several companies in four different countries (Portugal, Spain, Brazil and Australia). From 1999 to 2001 he was the technical coordinator of the computer vision area at the IDITE-Minho research institute. From 2001 to 2012 he was Professor in the Industrial Electronics Department, University of Minho, mainly teaching in the fields of image and signal processing, and working with several companies, helping them to specify and develop a large number of computer vision solutions for different industrial sectors, such as plastic, textile, leather, automotive, beverage, agro-food, robotics and multimedia. From 2008 to 2012 he was the coordinator of the computer vision group of the CCG research institute. Currently he is the R&D coordinator for computer vision at ENERMETER. His main research interests include computer vision, image processing, and image analysis. Since 1992 he has been working mainly on the development of computer vision technologies based on advanced algorithms, applied to industrial applications and medical imaging.
Kelwin Fernandes, INESC TEC, Faculdade de Engenharia da Universidade do Porto