BIRDS 2021

Bridging the Gap between Information Science, Information Retrieval and Data Science

An interdisciplinary CHIIR 2021 workshop for students, practitioners and researchers in Data Science, Information Retrieval, Information Science and Human-Computer Interaction.

BIRDS offers a range of invited talks and accepted peer-reviewed papers

March 19, 2021


Like last year, we will be running a workshop called BIRDS - Bridging the Gap between Information Science, Information Retrieval and Data Science - which aims to foster the cross-fertilization of Information Science (IS), Information Retrieval (IR), Data Science (DS) and Human-Computer Interaction (HCI). BIRDS is an interdisciplinary workshop for students, practitioners and researchers in the aforementioned disciplines. Recognising the commonalities and differences between these communities, we will bring together experts and researchers in IS, IR, DS and HCI to discuss how they can learn from each other to provide more user-driven data and information exploration and retrieval solutions. Therefore, we welcome submissions conveying interdisciplinary ideas on how to utilise, for instance, IS concepts and theories in IR and/or DS approaches to support users in data and information access. BIRDS will be online and co-located with the 6th ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2021).


The overarching theme of the BIRDS workshop is to look at how Data Science, Information Retrieval, Information Science and Human-Computer Interaction can complement each other by applying a more holistic approach to these disciplines that goes beyond traditional DS, IR or IS alone.

BIRDS aims at extending the scope of current research to provide a view on data and information in all its quantity and variety through investigating user preferences and interaction. The cross-fertilization of DS, IR and IS that we want to address in this workshop goes three ways.

  1. BIRDS will focus on the utilisation of DS methodologies in interactive IR and IS, e.g. by integrating data mining, database concepts, heterogeneous data, data analysis, exploration and visualization techniques into interactive IR and IS.
  2. In addition, we will look at how user-oriented concepts and theories from IS (for instance, human-centric information seeking and searching, or cognitive models such as Information Foraging Theory or the Principle of Polyrepresentation) can be applied to enhance and complement the data-driven approaches in DS and IR.
  3. Finally, we will also examine how IR models and theory can be applied to IS and DS, e.g. by introducing the IR concepts of vagueness and uncertainty to DS and IS.

To this end, relevant topics of the workshop include, but are not limited to:

IS models and theory applied to IR and DS

  • User modelling
  • Information foraging
  • Cognitive models
  • Interactive information access and retrieval
  • User preference and behaviour analysis
  • User-centric exploratory data analysis
  • Evaluation methods for exploratory search and data analysis
  • Social and collaborative information seeking
  • Bibliometric-enhanced scholarly IR and data search

DS models and theory applied to IR and IS

  • Interpretability and explainability in their user-centric application
  • User-oriented machine learning and deep learning
  • User-centric data visualization
  • User-centric data exploration and mining
  • Data analysis for evaluation

IR models and theory applied to IS and DS

  • Multimodal information discovery
  • Database-IR integration
  • Query languages
  • Formal models for interactive heterogeneous data and information discovery
  • Conversational data exploration

The target audience of the workshop is students, practitioners and researchers in DS, IR, IS and HCI, from academia and industry alike.

Confirmed Talks and Program

Invited Talks

  • Ed Fox, Virginia Tech, USA: User Discovery and Exploration in Future Digital Libraries (slides)

    Since the early 1990s, digital libraries have been devised to support particular communities (societies) engaged in varied activities (scenarios) with a focus on specialized types of content (structured streams of data, locatable and presentable using spaces – vector/probability/topological and 1D/2D/3D). Whether these efforts involve already known items, or finding new/additional items (often through searching/browsing/visualizing), it is typical for the digital libraries to support such discovery and exploration. The 5S (societies/scenarios/spaces/structures/streams) framework facilitates devising, populating, and using such digital libraries, with appropriate mixes of data and information, guided by various types of knowledge. We explain how key personas, such as curators, data scientists, and those with information needs, can work with future digital libraries. We describe a new approach to devising such extensible information systems that integrate information retrieval, information science, and data science approaches and requirements, involving teams of UX, subject matter experts, data scientists, and DevOps personnel. We also discuss technology transfer methods, with customer discovery of the ecosystem of users (societies), to help ensure the relevance and utility of such digital libraries.

  • Tony Russell-Rose, 2Dsearch and Goldsmiths, University of London, UK: Searching, fast and slow

    Knowledge workers such as information professionals, legal researchers and librarians need to create and execute search strategies that are comprehensive, transparent, and reproducible. The traditional solution is to use command-line query builders offered by proprietary database vendors. However, these are based on a paradigm that dates from the days when databases could be accessed only via text-based terminals and command-line syntax. In this talk, we explore alternative approaches based on a visual paradigm in which users express concepts as objects on an interactive canvas. This offers a more intuitive UX that eliminates error, makes the query semantics more transparent, and offers new ways to collaborate and share best practices.

  • Emanuele Di Buccio, University of Padua, Italy: Data Science and Information Access for Social Research on Technoscientific Issues in the Media

    Social Science Research can benefit from the massive amount of digitized content and the heterogeneous online sources available nowadays. An example is research activities in Science and Technology Studies, e.g., those investigating the presence and the perception of Science and Technology issues in the Media. Methodologies rooted in Data Science and Information Access can play a crucial role in supporting these research activities. In this context, users are specialists in Social Sciences, and their task is to investigate research hypotheses. This talk is about some of the challenges arising when supporting social scientists in their investigations on the media's discourse on technoscientific issues. The presentation will focus on a methodology and a system designed to support access and exploration of longitudinal corpora through diverse representations and diverse forms of user-system interaction.

  • Martin White, Intranet Focus Ltd and University of Sheffield, UK: Understanding and solving the complex IIR challenges of searching enterprise content

    It is not unusual to find that organisations need to search across 200 million+ files in multiple formats and in perhaps nine languages, and yet rarely is any training provided for what is almost always a business-critical search. There is also the need to support both known-item and exploratory search, and often groups of professional searchers within the organisation need to undertake high-recall searches. Very little research has been carried out into what is commonly described as ‘enterprise search’. This paper will examine the reasons behind this lack of research, summarise the emerging appreciation of how search is undertaken inside an organisation and suggest areas where it would be of value to the IR community and to the organisation to undertake research.

  • Lorraine Goeuriot, Univ. Grenoble Alpes, France: Exploiting clinical data to build patients trajectories

    Medical data is generated and stored throughout a patient’s life. Exploiting such an amount of heterogeneous information can hardly be done by humans, but recent advances in artificial intelligence now allow patient trajectories to be modelled and used to predict certain factors in the future. In this talk, I will present recent work on prediction from patient trajectories with two use cases: sleep apnea disorders and hospitalization of GPs’ patients.

  • Tobias Eljasik-Swoboda, Fernuniversität Hagen, Germany: Querying by Example Using Bootstrapped Explainable Text Categorization in Emergent Knowledge-Domains

    Text Categorization (TC) is the act of assigning text documents to predefined categories, for instance to distinguish between pro and contra arguments for a specific topic. TC can be automated either with fixed rules or with machine learning. The difference between machine learning and programming is that in machine learning, the machine creates its own program based on sample data. In the context of TC, these sample data are example assignments of documents to categories, called the Target Function.
    Machine Learning based Text Categorization (MLTC) can be used for many different applications. One such application is Argument Mining (AM), the finding of pro and contra arguments in large text corpora. Other examples include the assignment of news articles to specific categories, spam filtering, detection of offensive language in internet communications or the detection of user intent when interacting with a voice assistant like Amazon’s Alexa or Apple’s Siri. MLTC is already widely applied. However, whenever a new application is developed that requires MLTC features, four fundamental problem fields arise. Firstly, the technical integration effort is high. This means that multiple prerequisites must be available, and programmers need to be familiar with details about the MLTC process. Secondly, a high effort is required to collect examples for the MLTC process to learn from and to provide manually crafted resources such as lists of relevant words for specific topics. Thirdly, according to the GDPR, MLTC systems operated in the EU that impact European citizens must be explainable. Generating explanations for the behavior of machine learning is no trivial task and an area of active research. A fourth problem field is semantic shift and the emergence of new knowledge. Previous resources and examples can become obsolete with future developments.
    To overcome these problem fields, our previous work combined two research frameworks, the Design-oriented Information Systems Research methodology (DIRS) and the Research Framework for Information Systems Research (RFISR), to create insight into the problem fields and create artifacts that can overcome these problems. After assessing the state of the art in relevant areas of science and technology, a formal problem model was constructed. Capitalizing on recent trends in information technology, such as Big Data and Cloud Computing, a microservice-oriented application that quickly provides explainable MLTC was designed and prototypically implemented. This prototype can even function without a target function by using word embeddings and other recently emerged technologies. The created suite of microservices has already been evaluated in five different applications that apply MLTC. Even though the evaluation shows slightly inferior effectiveness to technologies that are fine-tuned for their specific problems, the created system can be applied to these different problems in two different natural languages in a matter of minutes. Unlike the most effective existing applications, the created system also generates explanations for its decisions. A qualitative evaluation and subsequent survey have already shown that the explanations are of a high quality and understood by a majority of survey participants. The developed prototype also possesses the ability to create new categories to organize documents when new knowledge emerges.
    This capability of requiring few or no examples and other manually provided resources is well suited for scenarios in which such examples are hard to obtain. One such scenario is query-by-example information retrieval. This paper investigates how the developed MLTC microservices can be used for on-the-fly query construction that supports result sets including relevance feedback and faceted browsing. A unique characteristic of this information retrieval approach is its ability to generate explanations stating reasons why certain results are elements of the result set. Its applicability for faceted browsing also means that the approach is well suited for information filtering applications.

  • Kevin Berwind, Fernuniversität Hagen, Germany: Design of use case diagrams and personas based on the CRISP4BigData process / Conceptual Design and Implementation of a graphical user interface for CRISP4BigData

    We give an overview of the current state of research regarding the persistence and reproduction of meta information and artifacts used during a Big Data analysis process. The collection of meta information and artifacts as well as the business modeling of the analysis process are performed via appropriate user interfaces based on the Business Process Model and Notation and provided as packages via an automation interface. The work builds on requirements for data scientists from industry, which were discussed and debated in an expert interview at the EGI Community Forum 2015. These requirements were converted into various user stereotypes and serve as the basis for creating initial use case diagrams that also distinguish the technical perspectives of the respective user stereotypes with the help of personas. The use case diagrams are enriched by the individual phases and tasks of the Cross Industry Standard Process for Big Data Reference Model (CRISP4BigData), which was already presented at the Collaborative European Research Conference 2016 (CERC2016). Based on these use case diagrams and personas, concrete proposals for a software architecture will be derived in further work.
    In the second talk, we present current research work on the design and implementation of a graphical user interface based on the CRISP4BigData use cases, developed with respect to the technical requirements of the model-view-controller approach and the Symfony PHP framework. At the same time, the CRISP4BigData UI is to be integrated into the Knowledge Management Ecosystem Portal in order to extend the portal's functionalities and thus provide added value. The Knowledge Management Ecosystem Portal (KM-EP) is based on the approach of Virtual Research Environments to solve problems around content and knowledge management application scenarios. The KM-EP already provides solutions for Centralized Digital Library and Media Archive, Unified Access Rights, Detailed Metadata, Faceted Search with Categorization, and Open Contribution. The CRISP4BigData UI is intended to take up this idea and provide an automated and guided documentation process for Big Data analyses and their (process) meta information and artifacts. The collection of meta information and artifacts as well as the business modeling of the analysis process are performed via appropriate user interfaces based on the Business Process Model and Notation and provided as packages via an automation interface.

Accepted Papers

  • Morshed Adnan, Matthias Hemmje and Michael Alexander Kaufmann. Social Media Mining to Study Social User Groups by Visualizing Tweet Clusters using Word2Vec, PCA and K-Means
  • Mahmoud Artemi and Haiming Liu. A User Study on User's Attention for an Interactive Content-based Image Search System
  • Stefan Wagenpfeil, Felix Engel, Paul McKevitt and Matthias Hemmje. Semantic Query Construction and Result Representation based on Graph Codes
  • Nicholas Collis and Ingo Frommholz. AQUACOLD: A Novel Crowdsourced Linked Data Question Answering System
  • Ghadeer Abuoda, Chad Hendrix and Stuart Campo. Automatic Tag Recommendation for the UN Humanitarian Data Exchange
  • Thoralf Reis, Sebastian Bruchhaus, Binh Vu, Marco X. Bornschlegl and Matthias L. Hemmje. Towards Modeling AI-based User Empowerment for Visual Big Data Analysis

Schedule and Registration

BIRDS will take place in two sessions on March 19, 2021:

  • Session 1 Begin: 0:00 (PDT), 2:00 (CDT), 3:00 (EDT), 7:00 (UTC/GMT), 8:00 (CET), 15:00 (CST), 16:00 (JST), 18:00 (AEDT), 20:00 (NZDT)
  • Session 1 End: 5:00 (PDT), 7:00 (CDT), 8:00 (EDT), 12:00 (UTC/GMT), 13:00 (CET), 20:00 (CST), 21:00 (JST), 23:00 (AEDT), 1:00 (NZDT March 20)
  • Session 2 Begin: 7:00 (PDT), 9:00 (CDT), 10:00 (EDT), 14:00 (UTC/GMT), 15:00 (CET), 22:00 (CST), 23:00 (JST), 1:00 (AEDT March 20), 3:00 (NZDT March 20)
  • Session 2 End: 11:00 (PDT), 13:00 (CDT), 14:00 (EDT), 18:00 (UTC/GMT), 19:00 (CET), 2:00 (CST March 20), 3:00 (JST March 20), 5:00 (AEDT March 20), 7:00 (NZDT March 20)

To register, please visit the CHIIR registration page. The CHIIR registration includes all workshops, including BIRDS. Please allow 24 hours for your registration to be processed and for access data to be sent.

Please note there might be minor changes to the schedule.
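
For attendees who want to cross-check the session times in their own time zone, the conversion from the UTC times listed above (7:00, 12:00, 14:00 and 18:00 on March 19, 2021) can be sketched as follows. This is only an illustrative sketch: the function name `local_times` is hypothetical, the fixed UTC offsets are assumptions about the zones in force on that date, and "CST" is assumed to mean China Standard Time (UTC+8).

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed UTC offsets (in hours) on March 19, 2021.
# "CST" is read here as China Standard Time, not US Central Standard Time.
OFFSETS = {
    "PDT": -7, "CDT": -5, "EDT": -4, "UTC/GMT": 0, "CET": 1,
    "CST": 8, "JST": 9, "AEDT": 11, "NZDT": 13,
}


def local_times(utc_hour, day=19):
    """Map each zone label to the local time of `utc_hour`:00 UTC on March `day`, 2021.

    The date is appended only when the local time falls on a different day.
    """
    moment = datetime(2021, 3, day, utc_hour, 0, tzinfo=timezone.utc)
    result = {}
    for zone, hours in OFFSETS.items():
        local = moment.astimezone(timezone(timedelta(hours=hours)))
        label = f"{local.hour}:{local.minute:02d}"
        if local.day != day:
            label += f" (March {local.day})"
        result[zone] = label
    return result


if __name__ == "__main__":
    sessions = [
        ("Session 1 Begin", 7), ("Session 1 End", 12),
        ("Session 2 Begin", 14), ("Session 2 End", 18),
    ]
    for name, hour in sessions:
        print(name, local_times(hour))
```

Fixed offsets are used instead of a time zone database so that the sketch has no external dependencies; for other dates, daylight saving rules would need to be taken into account.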

Workshop Organizers

Programme Chairs (Workshop Organisers)

Programme Committee

  • Guillaume Cabanac, IRIT - Université Paul Sabatier Toulouse 3, France
  • Emanuele Di Buccio, University of Padua, Italy
  • Catherine Dumas, Simmons University, USA
  • Edward Fox, Virginia Tech, USA
  • Paul Mulholland, The Open University, UK
  • Norbert Fuhr, University of Duisburg-Essen, Germany
  • Lorraine Goeuriot, Univ. Grenoble Alpes, CNRS, France
  • Udo Kruschwitz, University of Regensburg, Germany
  • Birger Larsen, Aalborg University, Denmark
  • Hyowon Lee, Dublin City University, Ireland
  • Philipp Mayr, GESIS, Cologne, Germany
  • Thomas Roelleke, Queen Mary University of London, UK
  • Tony Russell-Rose, 2Dsearch and Goldsmiths, University of London, UK
  • Sandro Sozzo, School of Business and Research Centre IQSCS
  • Ingo Schmitt, Technical University Cottbus, Germany
  • Martin White, Intranet Focus Ltd and University of Sheffield, UK
  • Hong Qing Yu, University of Derby, UK


If you have any further questions, you can reach out via email to