Student Project List

Research Projects

Below is a list of projects I am currently working on or am interested in. The list is not exhaustive, and other proposals are welcome and will be considered.

Benchmark of VLMs

  • The project involves benchmarking Vision-Language Models (VLMs) for Optical Character Recognition (OCR) on 17th- and 18th-century printed books. Each VLM will be applied to a small sample of editions in several languages (Italian, French, Dutch, English, German). The results will be systematically compared with those of existing machine-learning-based OCR solutions. The purpose is to assess how VLMs perform on complex textual structures and non-English text. Linked with ACDIC.
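
The project description does not fix an evaluation metric. As one possible way to score the comparison, the minimal Python sketch below assumes character error rate (CER), a common OCR measure: the edit distance between an engine's output and a ground-truth transcription, divided by the length of that transcription. The engine names, page IDs and toy strings are hypothetical.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (no cost if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of an OCR/VLM output against a ground-truth transcription."""
    # NFC normalisation so precomposed and decomposed accents count as the same character
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference transcription is empty")
    return levenshtein(reference, hypothesis) / len(reference)


def compare_engines(ground_truth: dict[str, str],
                    outputs: dict[str, dict[str, str]]) -> dict[str, float]:
    """Mean CER per engine; a page an engine did not return counts as empty output."""
    return {
        engine: sum(cer(gt, pages.get(page_id, ""))
                    for page_id, gt in ground_truth.items()) / len(ground_truth)
        for engine, pages in outputs.items()
    }


if __name__ == "__main__":
    ground_truth = {"p1": "Della lingua toscana"}  # hypothetical toy page
    outputs = {
        "vlm_a": {"p1": "Della lingua tofcana"},  # s misread as f (typical long-s error)
        "ocr_b": {"p1": "Dela lingua tofcana"},
    }
    print(compare_engines(ground_truth, outputs))  # {'vlm_a': 0.05, 'ocr_b': 0.1}
```

Averaging per page weights short and long pages equally; summing edit distances over all pages before dividing would weight by page length instead.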

Spatiotemporal Data Analysis and Visualizations

  • The project focuses on the spatio-temporal analysis and visualization of a metadata corpus describing 10,000 French plays produced between 1550 and 1920. The research will include: (i) validation of LLM-generated historical periodizations, (ii) alignment of temporal and spatial data with historical maps, and (iii) design and implementation of interactive visualizations (maps and charts) to represent possible spatial and temporal convergences in literature. Linked with TextEnt.
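
Part of validating an LLM-generated periodization can be mechanical. The sketch below is a hypothetical helper, not part of the project description: it only checks that the proposed periods are well-formed and cover the corpus range (1550 to 1920) without gaps or overlaps. Whether the periods are historically sound still requires expert review. The `Period` class and the labels are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    label: str
    start: int  # first year of the period, inclusive
    end: int    # last year of the period, inclusive


def validate_periods(periods, corpus_start=1550, corpus_end=1920):
    """Return a list of problems; an empty list means the periods tile the range exactly."""
    problems = []
    valid = []
    for p in periods:
        if p.start > p.end:
            problems.append(f"{p.label}: start {p.start} is after end {p.end}")
        else:
            valid.append(p)
    if not valid:
        return problems + ["no usable periods"]

    valid.sort(key=lambda p: (p.start, p.end))
    if valid[0].start > corpus_start:
        problems.append(f"{corpus_start}-{valid[0].start - 1} not covered")

    covered_until = valid[0].end  # latest year covered by any period seen so far
    for nxt in valid[1:]:
        if nxt.start > covered_until + 1:
            problems.append(f"gap {covered_until + 1}-{nxt.start - 1} before {nxt.label}")
        elif nxt.start <= covered_until:
            problems.append(
                f"{nxt.label} overlaps earlier period(s) in "
                f"{nxt.start}-{min(nxt.end, covered_until)}"
            )
        covered_until = max(covered_until, nxt.end)

    if covered_until < corpus_end:
        problems.append(f"{covered_until + 1}-{corpus_end} not covered")
    return problems


if __name__ == "__main__":
    proposal = [Period("P1", 1550, 1630), Period("P2", 1630, 1715), Period("P3", 1720, 1850)]
    for problem in validate_periods(proposal):
        print(problem)
    # P2 overlaps earlier period(s) in 1630-1630
    # gap 1716-1719 before P3
    # 1851-1920 not covered
```

Tracking the latest covered year, rather than only comparing neighbours, avoids reporting false gaps when one long period contains shorter ones.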

RAG and Data Modeling

  • The project aims to test and compare Retrieval-Augmented Generation (RAG) pipelines for querying complex cultural datasets encoded in RDF and stored in graph databases. Various configurations (e.g., LLM-only, LLM + GNN) and language models (e.g., LLaMA, GPT, RoBERTa) will be evaluated on the same dataset, encoded with different ontologies (e.g., CIDOC-CRM, Dublin Core, schema.org, DOLCE). The objective is to assess whether different conceptual and formal representations affect retrieval quality.
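
To compare configurations and encodings on equal terms, every run could be scored against the same gold answers. The sketch below assumes (hypothetically) that each retrieved graph node is mapped back to a shared record identifier, so that results obtained over the CIDOC-CRM, Dublin Core or other encodings remain comparable. It uses recall@k and mean reciprocal rank (MRR) as retrieval metrics; these are illustrative choices, not ones made in the project description.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant records that appear among the top-k retrieved ones."""
    if not relevant:
        raise ValueError("each question needs at least one relevant record")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant record, or 0.0 if none was retrieved."""
    for rank, record in enumerate(retrieved, start=1):
        if record in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: dict[str, dict[str, list[str]]],
             gold: dict[str, set[str]], k: int = 5) -> dict[str, dict[str, float]]:
    """Mean recall@k and MRR per configuration.

    runs: {configuration: {question id: ranked record ids}}, e.g. "LLM-only / CIDOC-CRM".
    gold: {question id: relevant record ids}, shared by all configurations.
    Questions a configuration did not answer count as empty rankings.
    """
    report = {}
    for config, answers in runs.items():
        rankings = [(answers.get(q, []), relevant) for q, relevant in gold.items()]
        report[config] = {
            f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in rankings) / len(rankings),
            "mrr": sum(reciprocal_rank(r, rel) for r, rel in rankings) / len(rankings),
        }
    return report


if __name__ == "__main__":
    gold = {"q1": {"r1"}, "q2": {"r2", "r3"}}  # hypothetical gold answers
    runs = {
        "LLM-only / CIDOC-CRM": {"q1": ["r1", "r9"], "q2": ["r9", "r2"]},
        "LLM-only / Dublin Core": {"q1": ["r9", "r8"], "q2": ["r3", "r2"]},
    }
    for config, scores in evaluate(runs, gold, k=2).items():
        print(config, scores)
```

Keeping the gold set fixed across encodings is what isolates the effect of the ontology from the effect of the questions.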

IIIF and Text

  • The project involves the creation of a IIIF-compliant digital library based on TEI/XML source files. It includes: (i) development or adaptation of a TEI/XML-to-IIIF transformation pipeline capable of processing annotations and serving images via IIIF-compliant image servers (e.g., Cantaloupe), (ii) implementation of a web interface incorporating IIIF image viewers (e.g., Universal Viewer) for edition browsing, and (iii) integration of edition metadata within the interface. Linked with ACDIC.
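
A first step of a TEI/XML-to-IIIF pipeline could turn each TEI page break into a canvas of a IIIF Presentation API 3.0 manifest. The minimal sketch below uses only the Python standard library and assumes that each `<pb>` carries an `@facs` attribute holding the image identifier known to the image server (e.g., Cantaloupe). The base URIs, the fixed canvas dimensions and the `level1` profile are placeholders; a real pipeline would read width and height from each image's `info.json`, and would also handle annotations and edition metadata.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_manifest(tei_xml: str, base_uri: str, image_service: str,
                    width: int = 1000, height: int = 1500) -> dict:
    """Build a minimal IIIF Presentation 3.0 manifest with one canvas per TEI <pb facs="...">."""
    root = ET.fromstring(tei_xml)
    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    title = title_el.text.strip() if title_el is not None and title_el.text else "Untitled"

    canvases = []
    for n, pb in enumerate(root.iterfind(".//tei:pb[@facs]", TEI_NS), start=1):
        canvas_id = f"{base_uri}/canvas/{n}"
        service_id = f"{image_service}/{pb.get('facs')}"  # assumed: @facs is the image identifier
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "label": {"none": [pb.get("n", str(n))]},
            "width": width,    # placeholder: should come from the image's info.json
            "height": height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/page/image",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {
                        "id": f"{service_id}/full/max/0/default.jpg",
                        "type": "Image",
                        "format": "image/jpeg",
                        # adjust the profile to what the server's info.json reports
                        "service": [{"id": service_id, "type": "ImageService3",
                                     "profile": "level1"}],
                    },
                    "target": canvas_id,
                }],
            }],
        })

    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base_uri}/manifest",
        "type": "Manifest",
        "label": {"en": [title]},
        "items": canvases,
    }


SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample edition</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <pb n="1r" facs="ed01_p001.jpg"/><p>First page.</p>
    <pb n="1v" facs="ed01_p002.jpg"/><p>Second page.</p>
  </body></text>
</TEI>"""

if __name__ == "__main__":
    manifest = tei_to_manifest(SAMPLE_TEI, "https://example.org/iiif/ed01",
                               "https://images.example.org/iiif/3")
    print(json.dumps(manifest, indent=2))
```

The resulting JSON is the kind of document a viewer such as Universal Viewer loads by URL; here it is only printed.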

Provenance of LLM-Based Information

  • Development of a data modelling framework for annotating information extracted with LLMs. The work includes (i) analyzing case studies where LLM-based outputs enrich cultural datasets, (ii) assessing and extending existing ontologies (PROV-O, CRMdig) to capture LLM-based pipelines, and (iii) creating a dataset that annotates LLM-generated statements using RDF 1.2, RDF-star, named graphs, or related reification methods.
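
Of the annotation options listed, named graphs are the simplest to sketch: each LLM-generated triple is placed in its own graph, and that graph is described with PROV-O as an entity generated by a model run. The Python sketch below serializes this pattern as TriG. The `ex:` namespace, the `LLMStatement` class, the example terms and the selection of PROV-O properties are illustrative assumptions; terms are expected as already-serialized Turtle terms (prefixed names). RDF 1.2 triple terms or RDF-star would instead attach the provenance to the triple itself.

```python
from dataclasses import dataclass

PREFIXES = (
    "@prefix ex: <https://example.org/> .\n"
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


@dataclass(frozen=True)
class LLMStatement:
    """One LLM-extracted triple plus the context it was produced in (hypothetical fields)."""
    subject: str       # Turtle terms, e.g. "ex:play42"
    predicate: str
    obj: str
    model: str         # e.g. "ex:model_A", described as a prov:SoftwareAgent
    prompt: str        # e.g. "ex:prompt_genre_v1", the prompt the run used
    generated_at: str  # ISO 8601 timestamp, e.g. "2024-05-01T10:00:00Z"


def to_trig(statements: list[LLMStatement]) -> str:
    """Serialize each statement as its own named graph, described with PROV-O."""
    chunks = [PREFIXES]
    for i, s in enumerate(statements, start=1):
        graph, run = f"ex:claim{i}", f"ex:run{i}"
        chunks.append(
            f"{graph} {{ {s.subject} {s.predicate} {s.obj} . }}\n"  # the claim itself
            f"{graph} a prov:Entity ;\n"                              # the graph as provenance target
            f"    prov:wasGeneratedBy {run} ;\n"
            f"    prov:generatedAtTime \"{s.generated_at}\"^^xsd:dateTime .\n"
            f"{run} a prov:Activity ;\n"
            f"    prov:used {s.prompt} ;\n"
            f"    prov:wasAssociatedWith {s.model} .\n"
            f"{s.model} a prov:SoftwareAgent .\n"
        )
    return "\n".join(chunks)


if __name__ == "__main__":
    print(to_trig([LLMStatement("ex:play42", "ex:genre", "ex:Comedy",
                                "ex:model_A", "ex:prompt_genre_v1",
                                "2024-05-01T10:00:00Z")]))
```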

Modeling of Intangible Heritage

  • Analysis and modelling of multilingual data sources documenting intangible heritage. The task involves collecting the descriptors used by different agencies to document various categories of intangible heritage, assessing them against existing classification systems, vocabularies and schemas, developing an ontological framework for their documentation, and mapping and integrating the retrieved information using RDF.
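
One step of the mapping could be a first-pass crosswalk that matches agency descriptors against the multilingual labels of a reference vocabulary and records candidate links with SKOS. The sketch below is a hypothetical helper: it matches only on case- and accent-insensitive labels, emits `skos:closeMatch` rather than `skos:exactMatch` because a label match does not prove equivalence, and returns unmatched descriptors for manual review. All IRIs are placeholders.

```python
import unicodedata

SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"


def normalise(label: str) -> str:
    """Case-fold and strip diacritics so that 'Fêtes' and 'fetes' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label.casefold().strip())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def crosswalk(descriptors: dict[str, str], vocabulary: dict[str, list[str]]):
    """Match agency descriptors to vocabulary concepts by normalised label.

    descriptors: {descriptor IRI: label used by an agency}
    vocabulary:  {concept IRI: labels in any language}
    Returns (N-Triples lines with candidate skos:closeMatch links, unmatched descriptor IRIs).
    """
    index: dict[str, set[str]] = {}
    for concept, labels in vocabulary.items():
        for label in labels:
            index.setdefault(normalise(label), set()).add(concept)

    triples, unmatched = [], []
    for descriptor, label in descriptors.items():
        concepts = index.get(normalise(label))
        if not concepts:
            unmatched.append(descriptor)
            continue
        for concept in sorted(concepts):
            triples.append(f"<{descriptor}> <{SKOS_CLOSE_MATCH}> <{concept}> .")
    return triples, unmatched


if __name__ == "__main__":
    vocabulary = {"https://example.org/vocab/festive-events": ["Festive events", "Fêtes", "Feste"]}
    descriptors = {"https://example.org/agencyA/d1": "fetes",
                   "https://example.org/agencyA/d2": "Puppetry"}
    triples, unmatched = crosswalk(descriptors, vocabulary)
    print("\n".join(triples))
    print("to review:", unmatched)
```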

Ontology for Textual Corpora

  • Ontology-based modelling of relationships between textual and visual elements. Using TEI editions of emblem books and iconographical treatises as case studies, the task focuses on documenting, through ontologies, the connections between textual elements, textual content, visual elements within the page, and visual references outside the text. The goal is to develop a small ontology to support retrieval across multimodal corpora. Linked with TextEnt.