The Data Science and Statistics group organises internal meetings, frequently in the form of seminars, with the aim of presenting and discussing the ongoing research of its members. Seminar topics mainly lie in (but are not limited to) statistics, data mining, machine learning, bioinformatics, mathematical modeling, and optimization.
The calendar below lists upcoming and past DSS meetings and seminars.
Pinar Karagöz (Middle East Technical University)
Thu 05 Sep 2019 at 14:15, IMADA seminar room
The increasing use of social media and mobile devices leads to the accumulation of ever more evidence about where people go, which paths they follow, where they are, and so on. This evolution has given rise to Location-Based Social Networks (LBSNs), which enable users to share and comment on locations. Data obtained from LBSNs make it possible to extract patterns about different dimensions of locations and about the interactions between people and locations. In this talk, I will focus on generating recommendations for LBSN users, in particular context-aware recommendation using random walks. Additionally, I will talk about recommendations for groups of LBSN users, especially tour recommendations.
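The random-walk approach mentioned in the abstract can be illustrated with a generic random walk with restart (RWR) on a user–location graph. The graph, function name, and parameters below are illustrative assumptions for a minimal sketch, not the speaker's actual method:

```python
import numpy as np

def random_walk_with_restart(A, seed, alpha=0.15, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on adjacency matrix A.

    A     : (n, n) nonnegative adjacency matrix (here: a toy user-location
            check-in graph; an illustrative assumption)
    seed  : node to personalise for (e.g. the target user)
    alpha : restart probability
    Returns stationary visiting probabilities; high-scoring, not-yet-visited
    location nodes can then be ranked as recommendations.
    """
    n = A.shape[0]
    # Column-normalise A to obtain transition probabilities.
    col_sums = A.sum(axis=0)
    P = A / np.where(col_sums == 0, 1, col_sums)
    e = np.zeros(n)
    e[seed] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        # With prob. (1 - alpha) follow an edge, with prob. alpha restart at seed.
        p_next = (1 - alpha) * P @ p + alpha * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy graph: nodes 0-1 are users, 2-4 are locations; edges are check-ins.
A = np.array([
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
], dtype=float)
scores = random_walk_with_restart(A, seed=0)
```

In this toy example, location 4 was never visited by user 0 but still receives a positive score through the shared check-in at location 3, which is the basic mechanism by which RWR surfaces recommendations; context-aware and group variants build on such a walk.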
Michael E. Houle (National Institute of Informatics, Japan)
Tue 16 Jul 2019 at 14:15, IMADA seminar room
Researchers have long considered the analysis of similarity applications in terms of the intrinsic dimensionality (ID) of the data. This presentation is concerned with a generalization of a discrete measure of ID, the expansion dimension, to the case of smooth functions in general, and distance distributions in particular. A local model of the ID of smooth functions is first proposed and then explained within the well-established statistical framework of extreme value theory (EVT). Moreover, it is shown that under appropriate smoothness conditions, the cumulative distribution function of a distance distribution can be completely characterized by an equivalent notion of data discriminability. As the local ID model makes no assumptions on the nature of the function (or distribution) other than continuous differentiability, its generality makes it ideally suited for the learning tasks that often arise in data mining, machine learning, and other AI applications that depend on the interplay of similarity measures and feature representations. An extension of the local ID model to a multivariate form will also be presented that can account for the contributions of different distributional components towards the intrinsic dimensionality of the entire feature set, or equivalently towards the discriminability of distance measures defined in terms of these feature combinations. The talk will conclude with a discussion of recent applications of local ID to deep learning.
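A commonly cited form of the local ID model for distance distributions (a sketch, assuming the cumulative distribution function F of distances is continuously differentiable and positive at distance r):

```latex
\mathrm{ID}_F(r)
\;=\;
\lim_{\epsilon \to 0^{+}}
\frac{\ln\!\big( F((1+\epsilon)\,r) \,/\, F(r) \big)}{\ln (1+\epsilon)}
\;=\;
\frac{r\,F'(r)}{F(r)}
```

Intuitively, ID_F(r) measures how quickly the probability mass of the distance distribution grows in a neighbourhood of r, which is also the sense in which local ID characterizes the discriminability of the distance measure.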
Ricardo J. G. B. Campello (University of Newcastle)
Tue 11 Jun 2019 at 14:15, DIAS conference room
Non-parametric density estimates are a useful tool for tackling different problems in statistical learning and data mining, most notably in the unsupervised and semi-supervised learning scenarios. In this talk, I elaborate on HDBSCAN*, a density-based framework for hierarchical and partitioning clustering, outlier detection, and data visualisation. Since its introduction in 2015, HDBSCAN* has gained increasing attention from both researchers and practitioners in data mining, with computationally efficient third-party implementations already available in major open-source software distributions such as R/CRAN and Python/scikit-learn, as well as successful real-world applications reported in different fields. I will discuss the core HDBSCAN* algorithm and its interpretation from a non-parametric modelling perspective as well as from the perspective of graph theory. I will also discuss post-processing routines to perform hierarchy simplification, cluster evaluation, optimal cluster selection, visualisation, and outlier detection. Finally, I briefly survey a number of unsupervised and semi-supervised extensions of the HDBSCAN* framework currently under development along with students and collaborators, as well as some topics for future research.
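The graph-theoretic view mentioned in the abstract rests on the mutual-reachability distance, one core quantity of HDBSCAN*. The sketch below computes it with plain NumPy for illustration only; production implementations (e.g. the `hdbscan` package) should be used in practice, and the function name and parameter defaults here are assumptions:

```python
import numpy as np

def mutual_reachability(X, k=5):
    """Mutual-reachability distances as used by HDBSCAN*.

    core_k(a) is the distance from a to its k-th nearest neighbour;
    d_mreach(a, b) = max(core_k(a), core_k(b), d(a, b)).
    HDBSCAN* builds a minimum spanning tree of this graph and extracts
    a density-based cluster hierarchy from it.
    """
    # Pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # k-th nearest neighbour distance: column k of each sorted row
    # (column 0 is the point itself, at distance 0).
    core = np.sort(D, axis=1)[:, k]
    # Elementwise max of the two core distances and the raw distance.
    return np.maximum(np.maximum(core[:, None], core[None, :]), D)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
M = mutual_reachability(X, k=3)
```

Replacing raw distances by mutual-reachability distances pushes points in sparse regions further apart, which is what makes the subsequent spanning-tree hierarchy robust to noise.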
Georgios Kaiafas (University of Luxembourg)
Thu 23 May 2019 at 13:00, IMADA Methods Lab
Yuri Goegebeur: research presentation
Agenda: Networking, problem solving
Sangramsing Nathusing Kayte: Overview on NLP-related research
Peter Schneider-Kamp: Research Issues with Drones
Jonatan Møller Gøttcke: Class Imbalance and Probabilistic Learning in k-Nearest-Neighbor Classification