Large Language Models in Financial Decision-Making: A Methodological Framework for Evaluating AI Trading Strategies
By Theo Nicolas Sitjar
Large Language Models (LLMs) offer new possibilities for financial decision-making, but evaluating their effectiveness in trading requires systematic approaches. This paper describes a practical framework for assessing LLM performance in stock market scenarios. The method follows a five-step process: data preparation, prompt engineering, LLM inference, backtesting, and statistical analysis. It incorporates memory mechanisms and standard risk metrics so that the resulting trading strategies are evaluated on risk as well as return. By benchmarking against fifteen traditional quantitative baseline strategies, we examine both the potential benefits and the current limitations of LLMs in finance. The framework helps diagnose overfitting, poor confidence calibration, and behavioral inconsistency, and it indicates where LLMs may add value in pattern recognition. The case-study results in this paper are presented as demonstrations of the framework's diagnostics, not as evidence of a generalizable trading edge. The framework is intended as a practical starting point for systematic LLM evaluation in finance, giving researchers and practitioners a common set of methodological tools.
Source: SSRN
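The backtesting and statistical-analysis steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say which risk metrics the framework uses, so the Sharpe ratio and maximum drawdown below are assumed as common examples. The long/flat/short signal encoding (+1, 0, -1), the one-period execution lag, and the 252-trading-day annualization are likewise assumptions, and all function names are hypothetical rather than taken from the paper.

```python
import math
import statistics
from collections.abc import Sequence


def backtest_returns(signals: Sequence[int], asset_returns: Sequence[float]) -> list[float]:
    """Per-period strategy returns from hypothetical LLM position signals.

    Assumption: signals[t] in {-1, 0, +1} is decided with information up to
    period t and applied to asset_returns[t + 1], which avoids look-ahead bias.
    """
    if len(signals) != len(asset_returns):
        raise ValueError("signals and asset_returns must have the same length")
    return [s * r for s, r in zip(signals[:-1], asset_returns[1:])]


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252,
                 risk_free_annual: float = 0.0) -> float:
    """Annualized Sharpe ratio using the sample standard deviation of excess returns."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    # Convert the annual risk-free rate to a simple per-period rate.
    excess = [r - risk_free_annual / periods_per_year for r in returns]
    std = statistics.stdev(excess)
    if std == 0:
        raise ValueError("zero volatility: Sharpe ratio is undefined")
    return statistics.mean(excess) / std * math.sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough decline of the compounded equity curve, as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not results from the paper.
    sigs = [1, 1, -1, 0, 1]
    rets = [0.0, 0.01, -0.02, 0.015, 0.005]
    strat = backtest_returns(sigs, rets)
    print(strat)
    print(sharpe_ratio(strat), max_drawdown(strat))
```

Computing these metrics identically for the LLM strategy and for each quantitative baseline keeps the comparison like-for-like; in practice one would also add significance tests for the statistical-analysis step.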
