Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/23932
Назва: COMPARATIVE ANALYSIS OF THE EFFECTIVENESS OF LARGE LANGUAGE MODELS FOR METAPHOR IDENTIFICATION: ZERO-SHOT AND FINE-TUNING METHODS
Other titles: ПОРІВНЯЛЬНИЙ АНАЛІЗ ЕФЕКТИВНОСТІ ВЕЛИКИХ МОВНИХ МОДЕЛЕЙ ДЛЯ ІДЕНТИФІКАЦІЇ МЕТАФОР: МЕТОДИ НУЛЬОВОГО ЗАПИТУ ТА ДОНАЛАШТУВАННЯ
Authors: Бистров, Яків Володимирович (Bystrov, Yakiv)
Большаков, Нестор Віталійович (Bolshakov, Nestor)
Keywords: large language models (LLM), metaphor, metaphor identification, fine-tuning, zero-shot, computational linguistics, natural language processing (NLP)
Issue date: 2025
Publisher: Видавничий дім «Гельветика»
Citation: Bystrov, Y., Bolshakov, N. Comparative analysis of the effectiveness of large language models for metaphor identification: zero-shot and fine-tuning methods. Folium. Odesa: Видавничий дім «Гельветика», 2025. No. 7. P. 61-68. DOI: 10.32782/folium/2025.7.9
Series/Number: 7
Abstract: The article addresses automatic metaphor identification, one of the most complex tasks in natural language processing (NLP). Drawing on cognitive linguistics, which defines metaphor as a fundamental mechanism of thinking (Lakoff & Johnson, 1980), it explores the role of metaphor as a powerful framing tool in political and media discourse. Although the ability to analyse metaphorical patterns at scale is crucial for identifying manipulative technologies, recognising metaphors is complicated by contextual dependence, creativity, and the need for encyclopaedic knowledge. The central issue addressed in this article is assessing the potential of modern large language models (LLMs) for automatic metaphor identification. The paper compares two key approaches: using the so-called 'innate' knowledge of models without additional tuning (the zero-shot approach) and their specialised adaptation through fine-tuning. The latest models (as of July 2025) from leading developers were investigated: OpenAI (GPT-4o), Google (Gemini 2.5 Pro, Gemini 2.5 Flash), and Anthropic (Claude Sonnet 4). Special attention was paid to the experimental methodology. The analysis was based on the NAACL 2020 Shared Task on Metaphor Detection corpus, and standard binary classification metrics were used to evaluate the models: precision, recall, and the F1-score. The article describes the fine-tuning procedure and identifies practical limitations associated with varying levels of tool availability across the leading artificial-intelligence ecosystems. The results show that the baseline models demonstrate low and unbalanced performance, while fine-tuning significantly improves it (the F1-score increases by 24-29%). A comparative analysis of the fine-tuned models revealed that GPT-4o achieves the best balance between recall and precision (F1-score 64.20%), while Gemini 2.5 Flash retains a slight advantage in precision. The article contributes to the study of LLM capabilities for analysing figurative language, demonstrating that fine-tuning is an essential method for adapting them to complex linguistic tasks.
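To make the two approaches compared in the abstract concrete, the following Python sketch illustrates a zero-shot setup: the model is asked, with no task-specific training, whether a target word is used metaphorically, and the predictions are scored with the same binary classification metrics the study reports (precision, recall, F1-score). The prompt wording, the toy sentences, and the 0/1 answer convention are illustrative assumptions, not the authors' exact protocol; the study itself evaluates on the NAACL 2020 Shared Task on Metaphor Detection corpus.

```python
# Illustrative zero-shot metaphor classification and evaluation.
# Prompt wording and toy data are assumptions for demonstration only.
from openai import OpenAI
from sklearn.metrics import precision_recall_fscore_support

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Toy items standing in for corpus entries: (sentence, target word, gold label).
samples = [
    ("He attacked every weak point in my argument.", "attacked", 1),
    ("The soldiers attacked the fortress at dawn.", "attacked", 0),
]

def classify_zero_shot(sentence: str, word: str) -> int:
    """Ask the model, without fine-tuning, whether `word` is used
    metaphorically in `sentence`; expect a bare 0/1 answer."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                f'In the sentence: "{sentence}"\n'
                f'Is the word "{word}" used metaphorically? '
                "Answer with a single digit: 1 for yes, 0 for no."
            ),
        }],
    )
    answer = response.choices[0].message.content.strip()
    return 1 if answer.startswith("1") else 0

gold = [label for _, _, label in samples]
pred = [classify_zero_shot(s, w) for s, w, _ in samples]

# The binary classification metrics reported in the paper.
precision, recall, f1, _ = precision_recall_fscore_support(
    gold, pred, average="binary", zero_division=0
)
print(f"P={precision:.2%}  R={recall:.2%}  F1={f1:.2%}")
```

A corresponding sketch of the fine-tuning path, assuming the OpenAI fine-tuning API with chat-format JSONL training data; the file name and the snapshot model ID are placeholders, and the other providers discussed in the paper expose this step differently:

```python
# Hypothetical fine-tuning step: supervised examples are serialized
# as chat-format JSONL and submitted as a fine-tuning job.
import json

with open("metaphor_train.jsonl", "w", encoding="utf-8") as f:
    for sentence, word, label in samples:
        f.write(json.dumps({
            "messages": [
                {"role": "user",
                 "content": f'Is "{word}" metaphorical in: "{sentence}"? Answer 0 or 1.'},
                {"role": "assistant", "content": str(label)},
            ]
        }) + "\n")

upload = client.files.create(file=open("metaphor_train.jsonl", "rb"),
                             purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=upload.id,
                                     model="gpt-4o-2024-08-06")
print(job.id)  # the tuned model is then evaluated with the same metrics
```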
URI: http://hdl.handle.net/123456789/23932
Appears in collections: Articles and Theses (ФІМ)

Files in this item:
File                            Description    Size       Format
Bystrov_Bolshakov article.pdf                  1.27 MB    Adobe PDF


All items in the electronic archive are protected by copyright; all rights reserved.