AUEB Natural Language Processing Group

Home

AUEB's Natural Language Processing Group develops algorithms, models, and systems that allow computers to understand and generate natural language text and speech. We are also investigating multimodal information processing, e.g., combining speech, text, and images.

The group's current research interests include:
  • machine learning for text, speech, and multimodal information, especially deep learning models,
  • spoken language understanding and dialog systems,
  • question answering, retrieval-augmented generation, and multi-step reasoning for document collections,
  • image-to-text generation, especially generating diagnostic tags and captions from medical images,
  • improving online discussions, including detecting and handling toxic posts and disinformation, and using large language models as mediators,
  • sentiment analysis and emotion recognition for text and speech,
  • natural language processing in the digital humanities,
  • natural language processing for biomedical, legal, and financial data,
  • text and speech processing tools for Greek.

The group is part of the Information Processing Laboratory of the Department of Informatics of the Athens University of Economics and Business.

Members of the group co-authored the paper "Restoring and attributing ancient texts using deep neural networks", which was published in Nature (March 2022).

The group co-organizes SemEval 2025 Task 10 on "Multilingual Characterization and Extraction of Narratives from Online News". It has also co-organized:
  • the 2nd Athens Natural Language Processing Summer School (AthNLP 2024),
  • the Machine Learning for Ancient Languages workshop at ACL 2024,
  • the 3rd Workshop on Natural Legal Language Processing (NLLP 2021) at EMNLP 2021,
  • the SemEval Toxic Spans Detection task (2021),
  • the 11th EETN (Greek) Conference on Artificial Intelligence (SETN 2020),
  • the 2nd Workshop on Natural Legal Language Processing (NLLP 2020) at KDD 2020,
  • the 1st Athens Natural Language Processing Summer School (AthNLP 2019),
  • the EACL 2009 conference in Athens,
  • the Large Scale Hierarchical Text Classification challenges (LSHTC3 was the ECML/PKDD 2012 Discovery Challenge),
  • the BioASQ challenges,
  • the SemEval Aspect-Based Sentiment Analysis task (2014, 2015, 2016).

The group ranked 2nd in concept detection and 4th in caption prediction in ImageCLEFmed Caption 2024, and 1st in concept detection and 3rd in caption prediction in ImageCLEFmed Caption 2023 (see also the related AUEB announcement, in Greek). It also ranked 1st in concept detection and 2nd in caption prediction in both ImageCLEFmed Caption 2021 and ImageCLEFmed Caption 2022. Our systems ranked at positions 1, 2, 3, and 5 among approximately 60 systems in ImageCLEFmed Caption 2019, and at positions 1, 2, and 6 among 49 systems in ImageCLEFmed Caption 2020 (see also the related AUEB announcement).

The group received a BioASQ award in 2018 for ranking 1st in 3 out of 5 document retrieval test batches and in all 5 snippet retrieval test batches. We received another BioASQ award in 2019 for ranking 1st in the four document and snippet retrieval batches we participated in. We also received a BioASQ award in 2020 for ranking in the top 2 positions in 4 out of 5 document retrieval test batches, and for ranking 1st in 4 out of 5 snippet retrieval test batches.