
LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models

Sameer Sadruddin, Jennifer D’Souza*, Eleni Poupaki, Alex Watkins, Hamed Babaei Giglou, Anisa Rula, Bora Karasulu, Sören Auer, Adrie Mackus, Erwin Kessels

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding; Conference contribution; Research; peer-reviewed

Abstract

Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction. Through an iterative workflow, it organizes properties from text, incorporates expert input, and integrates domain-specific ontologies for semantic depth. Applied to materials science—specifically atomic layer deposition—schema-miner demonstrates that expert-guided LLMs generate semantically rich schemas suitable for diverse real-world applications.

Original language: English
Title of host publication: The Semantic Web
Subtitle of host publication: 22nd European Semantic Web Conference, ESWC 2025, Proceedings
Editors: Edward Curry, Maribel Acosta, Maria Poveda-Villalón, Marieke van Erp, Adegboyega Ojo, Katja Hose, Cogan Shimizu, Pasquale Lisena
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 244-261
Number of pages: 18
ISBN (Electronic): 978-3-031-94578-6
ISBN (Print): 978-3-031-94577-9
Publication status: Published - 31 May 2025
Event: 22nd European Semantic Web Conference, ESWC 2025 - Portoroz, Slovenia
Duration: 1 Jun 2025 to 5 Jun 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15719 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd European Semantic Web Conference, ESWC 2025
Abbreviated title: ESWC 2025
Country/Territory: Slovenia
City: Portoroz
Period: 1 Jun 2025 to 5 Jun 2025

Keywords

  • Human-in-the-loop Workflow
  • Large Language Models
  • Schema Discovery
  • Schema Mining
  • Scientific Schemas

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
