Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/23932

| Title: | COMPARATIVE ANALYSIS OF THE EFFECTIVENESS OF LARGE LANGUAGE MODELS FOR METAPHOR IDENTIFICATION: ZERO-SHOT AND FINE-TUNING METHODS |
| Other Titles: | ПОРІВНЯЛЬНИЙ АНАЛІЗ ЕФЕКТИВНОСТІ ВЕЛИКИХ МОВНИХ МОДЕЛЕЙ ДЛЯ ІДЕНТИФІКАЦІЇ МЕТАФОР: МЕТОДИ НУЛЬОВОГО ЗАПИТУ ТА ДОНАЛАШТУВАННЯ |
| Authors: | Бистров, Яків Володимирович; Большаков, Нестор Віталійович |
| Keywords: | large language models (LLM), metaphor, metaphor identification, fine-tuning, zero-shot, computational linguistics, natural language processing (NLP) |
| Issue Date: | 2025 |
| Publisher: | Видавничий дім "Гельветика" |
| Citation: | Bystrov, Y., Bolshakov, N. Comparative analysis of the effectiveness of large language models for metaphor identification: zero-shot and fine-tuning methods. Folium. Одеса : Видавничий дім «Гельветика», 2025. № 7. pp. 61–68. DOI: 10.32782/folium/2025.7.9 |
| Series/Report no.: | № 7 |
| Abstract: | The article addresses automatic metaphor identification, one of the most complex tasks in natural language processing (NLP). Drawing on the principles of cognitive linguistics, which define metaphor as a fundamental mechanism of thought (Lakoff & Johnson, 1980), it explores the role of metaphor as a powerful framing tool in political and media discourse. Although the ability to analyse metaphorical patterns at scale is crucial for identifying manipulative techniques, recognising metaphors automatically is complicated by contextual dependence, creativity, and the need for encyclopaedic knowledge. A central issue addressed in the article is assessing the potential of modern large language models (LLMs) for automatic metaphor identification. The paper compares two key approaches: relying on the so-called ‘innate’ knowledge of models without additional tuning (the ‘zero-shot’ approach) and their specialised adaptation through fine-tuning. The latest models (as of July 2025) from leading developers were investigated: OpenAI (GPT-4o), Google (Gemini 2.5 Pro, Gemini 2.5 Flash), and Anthropic (Claude Sonnet 4). Special attention was paid to the experimental methodology: the analysis was based on the NAACL 2020 Shared Task on Metaphor Detection corpus, and standard binary classification metrics were used to evaluate the performance of the models: precision, recall, and the F1-score. The article describes the fine-tuning procedure and identifies practical limitations associated with the varying availability of tuning tools across leading artificial intelligence ecosystems. The results of the study show that the baseline models demonstrate low and unbalanced performance, while the fine-tuning procedure significantly improves their output (the F1-score increases by 24-29%). A comparative analysis of the fine-tuned models revealed that GPT-4o achieves a better balance between recall and precision (F1-score of 64.20%), while Gemini 2.5 Flash retains a slight advantage in precision. The article makes an important contribution to the study of the capabilities of LLMs for analysing figurative language, demonstrating that fine-tuning is a critically important method for adapting them to complex linguistic tasks. |
| URI: | http://hdl.handle.net/123456789/23932 |
| Appears in Collections: | Статті та тези (ФІМ) [Articles and theses (FIM)] |
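
The abstract above evaluates each model with standard binary classification metrics: precision, recall, and the F1-score. Below is a minimal sketch of how such scoring works, assuming per-item binary labels (1 = metaphorical, 0 = literal); the label lists and the `precision_recall_f1` helper are illustrative only and are not taken from the article or its corpus.

```python
# Minimal sketch of binary-classification scoring for metaphor identification.
# Labels: 1 = metaphorical, 0 = literal. The example data below is invented
# for illustration and does not come from the article's corpus.

def precision_recall_f1(gold, pred):
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical gold annotations vs. a model's zero-shot predictions.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0, 1, 0]
p, r, f1 = precision_recall_f1(gold, pred)
print(f"precision={p:.2%} recall={r:.2%} F1={f1:.2%}")
```

An unbalanced baseline, as the abstract describes, would show a large gap between precision and recall; fine-tuning narrows that gap, which is what the reported F1-score gains reflect.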
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| Bystrov_Bolshakov article.pdf | | 1.27 MB | Adobe PDF |