KInIT offers PhD study in partnership with the Faculty of Information Technology, Brno University of Technology (FIT VUT).

KInIT doctoral students will be full-time KInIT employees and will devote their time to research and study toward their PhD degree. At the same time, they will be enrolled as external students of FIT VUT, and graduates will receive their degree from FIT VUT.

Supervising team: Jakub Šimko (supervisor, KInIT), Peter Brusilovsky (University of Pittsburgh), Jana Kosecka (George Mason University)

Keywords: machine learning, crowdsourcing, human computation, data annotation, human-in-the-loop, collective intelligence

Machine learning models can only be as good as the data on which they are trained. Researchers and practitioners thus strive to provide their training processes with the best data possible. It is not uncommon to spend considerable human effort up front to achieve good overall data quality (e.g. through annotation). Yet sometimes, upfront dataset preparation cannot be done properly, sufficiently or at all.

In such cases, solutions colloquially called human-in-the-loop employ human effort to improve machine-learned models through actions taken during the training process and/or during the deployment of the models (e.g. user feedback on automated translations). They are particularly useful for surgical improvements of training data through the identification and resolution of border cases. This is also directly related to the explainability and interpretability of models.
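
As an illustration of how border cases might be identified, the following minimal sketch uses uncertainty sampling, one common strategy from active learning. It is not a method prescribed by this topic. The item names, the probabilities and the helper `select_border_cases` are hypothetical; the sketch assumes only that a model outputs class probabilities for unlabeled items.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_border_cases(predictions, k):
    """Return the ids of the k items with the most uncertain predictions.

    predictions: dict mapping an item id to a list of class probabilities
    (hypothetical model output, assumed to sum to 1).
    """
    ranked = sorted(
        predictions,
        key=lambda item: entropy(predictions[item]),
        reverse=True,  # highest entropy (least confident) first
    )
    return ranked[:k]


# Hypothetical model outputs for four unlabeled items and two classes.
predictions = {
    "post_a": [0.98, 0.02],
    "post_b": [0.55, 0.45],
    "post_c": [0.80, 0.20],
    "post_d": [0.50, 0.50],
}

# Items the model is least sure about are sent to a human annotator.
to_annotate = select_border_cases(predictions, k=2)
print(to_annotate)  # ['post_d', 'post_b']
```

In a full loop, the human-provided labels for the selected items would be added to the training data and the model retrained, so that annotation effort concentrates where the model is least certain.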

Human-in-the-loop approaches draw from a wide palette of techniques, including active and interactive learning, human computation and crowdsourcing (also with motivation schemes such as gamification and serious games), and collective intelligence. Each of these fields (or a combination thereof) presents opportunities for new discoveries. They also border on computer science disciplines such as data visualization, user experience (usability in particular) and software engineering.

The application domains of machine learning with human-in-the-loop are predominantly those with highly heterogeneous and volatile data. Such domains include the detection of false information online, the study of how information spreads online (including the spreading of narratives or memes), support for manual and automated fact-checking, and more.

Read more at: https://kinit.sk/education-training/doctoral-studies/