Research at DM&PG

Research at DM&PG focuses on privacy preservation, record linkage, and parallel scientific computation. The following sections give an overview of each area. More information is available through the links under the "Useful links" tab of our website.


Privacy Preserving Data Mining

In the real world, we often face the challenge of extracting high-level knowledge from data while maintaining privacy. Common data mining tasks, such as rule-based categorization and association rule mining, are usually capable of eliciting small pieces of knowledge that are well hidden in huge piles of data. Depending on the dataset, such mining may lead to a breach of fundamental privacy, a problem that has raised considerable concern in recent years and led to the emergence of the field of Privacy Preserving Data Mining (PPDM). Financial, corporate, health, and clinical datasets are only a few examples of vulnerable collections whose contents must be mined cautiously so that privacy is preserved. Apart from concealing vulnerable data, PPDM aims to produce high-quality, valid results that are as close as possible to those generated from a non-sensitive version of the same dataset. Several approaches have been proposed in the literature that aim at closing the privacy gap in the mining of sensitive information. A number of these approaches are presented and discussed in the papers collected under the "Publications" tab.


Record Linkage and Privacy Preservation

The enormous volume of information collected over the past decades by different organizations and authorities, in various multipurpose datasets, did not comply with any commonly accepted quality standards. This gave rise to the issue of record matching, also known as the record linkage problem, in which records that are believed to refer to the same entity are stored as distinct instead of being treated as identical. Several methodologies have been proposed in the literature that aim at bringing such records together (after which they are considered linked) and at addressing the special requirements of record matching, both for tuples that relate to the same entity (e.g., an individual or a company) and for duplicates that exist within a single database. Record linkage techniques are applied in various disciplines, such as healthcare, government, demographic studies, and medical research. It is evident, however, that integrating information across different sources poses risks to the privacy of the individuals described by the linked information. Along these lines, a new area of research has emerged, known as privacy preserving record linkage, which primarily aims at preserving the privacy of individuals during the linkage process.
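To make the record matching problem concrete, the following is a minimal sketch in Python, not a method taken from our publications. It links records from two small, made-up datasets by averaging a string similarity score over the compared fields. The function names, the field names, the sample records, and the 0.85 threshold are all assumptions chosen for illustration. The sketch compares values in the clear, so it shows plain record linkage only; a privacy preserving variant would first have to encode or otherwise protect the values before comparison.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase a value and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def similarity(a, b):
    """Return a similarity score in [0, 1] for two field values."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, fields, threshold=0.85):
    """Compare every pair of records and keep the pairs whose average
    field similarity reaches the (assumed) threshold."""
    links = []
    for i, rec_a in enumerate(left):
        for j, rec_b in enumerate(right):
            score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
            if score >= threshold:
                links.append((i, j, round(score, 3)))
    return links


# Hypothetical sample data: the first record of each list describes the
# same person, but the name is spelled differently.
hospital = [
    {"name": "Jonathan Smith", "city": "Athens"},
    {"name": "Maria Papadopoulou", "city": "Volos"},
]
registry = [
    {"name": "Jonathon  Smith", "city": "athens"},
    {"name": "Peter Brown", "city": "Larissa"},
]

links = link_records(hospital, registry, ["name", "city"])
for i, j, score in links:
    print(hospital[i]["name"], "<->", registry[j]["name"], score)
```

Comparing every pair of records is quadratic in the dataset sizes, so this brute-force form is only suitable for small inputs; it is used here purely to keep the example short.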


Parallel and Scientific Computation

Wide-area, distributed high-performance resources are expected to play a significant role in the future of computing. Computational grids attempt to give individuals access to computer clusters spread across the globe in a trivial amount of time. In this new era, supercomputers will be joined under a unified framework and used to solve large-scale, data-intensive, and extremely demanding scientific applications. In our opinion, grid computing is a very interesting research field with many possible extensions and broad applicability to society.