Research Paper · June 2026

Predicting Pakistan Stock Exchange with Deep Learning

A BiLSTM-Attention hybrid model trained on 36,000+ samples across 15 of the most liquid KSE-100 stocks, incorporating 53 technical, macro, and market features. Evaluated against seven baselines with an honest verdict on what works and what does not.

Arham Mirkar

DataLayer — Enterprise Data Infrastructure

Abstract

We present a deep learning system for next-day price prediction on the Pakistan Stock Exchange (PSX), targeting 15 of the most liquid KSE-100 constituents. The model uses a Bidirectional LSTM with multi-head self-attention over 60-day lookback windows, consuming 53 features spanning technical indicators, KSE-100 market context, FIPI foreign investor flows, USD/PKR exchange rates, Brent crude oil prices, and SBP monetary policy signals. Walk-forward cross-validation across 2021-2024 yields 62.4% directional accuracy, matching the simple MA-5 baseline (62.8%) while the naive close-to-close baseline achieves only 51.2%. However, on MAPE the naive baseline decisively wins (1.64% vs 21.89%) due to corporate action artifacts in unadjusted data. We document five confirmed stock splits and bonus issues, propose V2 adjustments, and argue that directional accuracy — not MAPE — is the correct evaluation metric for trading systems.

Key Findings

The BiLSTM-Attention model achieves 62.4% directional accuracy across 15 stocks, beating the naive baseline (51.2%) and roughly matching the MA-5 rule-based strategy (62.8%). It generates +131.5% simulated PnL with a Sharpe ratio of 3.53 on the 2025 test set. The model's price-level MAPE is inflated to 21.89% by 5 undetected corporate actions (stock splits and bonus issues) in SYS, LUCK, UBL, MARI, and ENGRO. After adjusting for these events, the true MAPE on the 11 clean stocks is ~2.8%, comparable to the naive baseline.

Training Samples: 36,035
KSE-100 Stocks: 15
Input Features: 53
Historical Data: 10 years

Section 1

Data Pipeline

Multi-source data aggregation from PSX historical prices, KSE-100 index, FIPI flows, and three macro indicators — all with proper temporal alignment.

Target Universe (15 KSE-100 Stocks)

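| Stock | Sector |
|-------|--------|
| ENGRO | Chemicals |
| HBL | Banking |
| MCB | Banking |
| UBL | Banking |
| BAHL | Banking |
| MEBL | Banking |
| LUCK | Cement |
| PSO | Oil & Gas |
| OGDC | Oil & Gas |
| PPL | Oil & Gas |
| HUBC | Power |
| FFC | Fertilizer |
| SYS | Technology |
| TRG | Technology |
| MARI | Oil & Gas |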

53 Input Features

Technical Indicators (25): SMA, EMA, MACD, RSI, Bollinger Bands, ATR, OBV, VWAP, volume ratios, momentum, candlestick patterns

Market Context (7): KSE-100 returns, volatility, volume ratio, distance from 52-week high, FIPI daily flows, cumulative flows, flow trend

Macro Features (14): USD/PKR rate and returns, Brent crude oil prices, SBP policy rate, rate change direction, sector-wise FIPI allocations

Relative Features (7): 60-day rolling beta, relative strength vs KSE-100, returns (1d/5d/20d), log return, volatility (5d/20d)
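
As an illustration of one of the relative features, the 60-day rolling beta can be sketched as the covariance of stock and index returns divided by the variance of index returns over a trailing window. The function name and the choice to emit `None` until a full window exists are assumptions made for this sketch, not details taken from the paper's pipeline.

```python
from statistics import fmean


def rolling_beta(stock_returns, market_returns, window=60):
    """Trailing beta: cov(stock, market) / var(market) over `window` days."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must have equal length")
    betas = [None] * len(stock_returns)  # None until a full window exists
    for end in range(window, len(stock_returns) + 1):
        s = stock_returns[end - window:end]
        m = market_returns[end - window:end]
        s_mean, m_mean = fmean(s), fmean(m)
        cov = sum((a - s_mean) * (b - m_mean) for a, b in zip(s, m))
        var = sum((b - m_mean) ** 2 for b in m)
        # A flat index window has zero variance, so beta is undefined there.
        betas[end - 1] = cov / var if var > 0 else None
    return betas
```

A stock whose daily return is always 1.5 times the KSE-100 return would get a beta of 1.5 in every full window, which is a convenient sanity check.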

Macro Feature Lagging

All macro features (USD/PKR, Brent crude, SBP policy rate) are lagged by 1 trading day to prevent look-ahead bias. Sector FIPI allocations are lagged by 1 month. This ensures the model only uses information that would have been available at prediction time in a live trading scenario.
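
The one-day lag described above amounts to shifting each macro series forward by one row, so that the row for day t carries the value observed on day t-1. A minimal sketch follows; `lag_series` is a hypothetical helper and the USD/PKR values are made up for illustration.

```python
def lag_series(values, periods=1):
    """Shift a series so that row t holds the value from row t - periods."""
    if periods < 0:
        raise ValueError("periods must be non-negative")
    if periods == 0:
        return list(values)
    # The first `periods` rows have no earlier observation to draw on.
    return [None] * min(periods, len(values)) + list(values[:-periods])


usd_pkr = [278.1, 278.4, 279.0]  # hypothetical daily closes
print(lag_series(usd_pkr))       # [None, 278.1, 278.4]
```

Rows that end up with `None` would have to be dropped or filled before windowing. With pandas, `Series.shift(1)` on a frame indexed by trading day performs the same operation.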

Section 2

Model Architecture

Bidirectional LSTM with multi-head self-attention and positional encoding, trained with mixed-precision on an NVIDIA Quadro P2000.

Input (batch, 60, 53)
  |
  v
Input Projection: Linear(53 -> 128) + GELU
  |
  v
Positional Encoding (sinusoidal, 60 positions)
  |
  v
BiLSTM (2 layers, hidden=128, bidirectional -> 256)
  |
  v
Multi-Head Self-Attention (4 heads, dim=256)
  + Residual Connection + LayerNorm
  |
  v
Feed-Forward Network (256 -> 512 -> 256)
  + Residual Connection + LayerNorm
  |
  v
Last Timestep: (batch, 256)
  |
  v
FC Head: 256 -> 128 -> 64 -> 1 (next-day close)
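
The positional encoding step in the diagram can be sketched with the standard sinusoidal scheme from Vaswani et al. (2017). The sketch assumes the encoding is added to the 128-dimensional projected input and uses the usual base of 10000; the paper does not state either detail, and the function name is hypothetical.

```python
import math


def sinusoidal_positional_encoding(seq_len=60, d_model=128):
    """Build a seq_len x d_model table: sine on even dims, cosine on odd dims."""
    table = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # Each sin/cos pair shares one wavelength that grows with i.
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(angle)
    return table
```

Row `pos` of this table would be added element-wise to the projected feature vector of day `pos` in the lookback window before the sequence enters the BiLSTM.
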

Parameters: 1.31M. All trainable. Xavier initialization with gradient clipping at 1.0.

Training: CUDA 12.4 on an NVIDIA Quadro P2000 GPU. AdamW optimizer with cosine annealing schedule.

Walk-Forward CV: 4-fold, with 2021, 2022, 2023, and 2024 validation windows. No future data leakage.
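
A minimal sketch of how the four walk-forward folds could be built is shown below. It assumes an expanding training window, where each fold trains on every sample dated before its validation year; the paper does not say whether the window expands or rolls, and `walk_forward_folds` is a hypothetical name.

```python
def walk_forward_folds(years, val_years=(2021, 2022, 2023, 2024)):
    """Return (val_year, train_idx, val_idx) tuples with train strictly earlier."""
    folds = []
    for val_year in val_years:
        train_idx = [i for i, y in enumerate(years) if y < val_year]
        val_idx = [i for i, y in enumerate(years) if y == val_year]
        if train_idx and val_idx:  # skip folds with no data on either side
            folds.append((val_year, train_idx, val_idx))
    return folds
```

Here `years` holds the calendar year of each sample's target date. Because training indices are selected with a strict `<` comparison, no sample from the validation year or later can enter training, and 2025 samples are never used by any fold.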

Section 3

What the Model Learned

Attention heatmaps reveal the model focuses on the most recent 5-10 days, with decaying attention on older data — consistent with financial time series behavior.

[Figure: BiLSTM attention heatmap showing temporal attention weights across the 60-day lookback window for PSX stock prediction]

Attention weight distribution across the 60-day lookback window. Brighter regions indicate higher attention — the model naturally learned recency bias.

Section 4

Prediction Results

Per-stock predictions on the 2025 out-of-sample test set. The model tracks price movements closely on clean stocks but fails catastrophically on unadjusted corporate actions.

[Figure: Predicted vs actual stock prices for 15 PSX KSE-100 stocks, showing LSTM prediction accuracy]

[Figure: Scatter plot of predicted vs actual closing prices, showing linear correlation]

[Figure: Training metrics dashboard showing loss curves and directional accuracy over epochs]

Section 5

Baseline Comparison

Seven baselines from trivial (naive close) to ML (XGBoost, RandomForest, LogReg). The honest verdict: the LSTM did not beat naive on MAPE.

| Model | MAE | MAPE | Dir Acc | PnL | Sharpe |
|-------|-----|------|---------|-----|--------|
| Naive Close | 4.72 | 1.64% | 51.2% | +40.1% | 1.12 |
| MA-5 | 5.22 | 1.80% | 62.8% | +145.6% | 3.71 |
| MA-20 | 12.03 | 4.10% | 56.2% | +93.7% | 2.42 |
| Linear Regression | 180.85 | 53.82% | 51.9% | +41.8% | 1.12 |
| XGBoost | 181.03 | 54.86% | 50.0% | +15.2% | 0.41 |
| LogReg (Direction) | N/A | N/A | 50.5% | N/A | N/A |
| Random Forest | N/A | N/A | 50.9% | N/A | N/A |
| BiLSTM+Attention (Ours) | 42.21 | 21.89% | 62.4% | +131.5% | 3.53 |

[Figure: Baseline vs LSTM model comparison across MAE, MAPE, directional accuracy, and PnL metrics]

[Figure: Naive Close vs LSTM MAPE comparison by individual stock, showing corporate action failures]

Honest Verdict

MAPE: Naive wins (1.64% vs 21.89%). The LSTM learned "tomorrow is close to today" but was contaminated by corporate actions.

Direction: The LSTM shows a real edge over the naive baseline at 62.4% (vs 51.2%), essentially matching MA-5 at 62.8%. The model learned directional signals even though it predicted the wrong price levels.
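
The two metrics can diverge in exactly this way, as the toy example below illustrates. It assumes directional accuracy compares the sign of the predicted day-over-day change with the sign of the actual day-over-day change; the paper does not spell out its exact definition, and the prices are invented for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def directional_accuracy(actual, predicted):
    """Share of days (in percent) where predicted and actual moves agree in sign."""
    hits = 0
    total = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        predicted_move = predicted[t] - predicted[t - 1]
        total += 1
        if actual_move * predicted_move > 0:  # flat moves count as misses
            hits += 1
    return 100.0 * hits / total


actual = [100.0, 102.0, 101.0, 104.0, 103.0]
predicted = [a + 50.0 for a in actual]  # right moves, wrong price level
```

On this series the predictions follow every up and down move, so directional accuracy is 100%, while the constant 50-point level error pushes MAPE to nearly 50%. A level shift from an unadjusted split has the same kind of effect on MAPE.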

Section 6

Corporate Action Audit

Five confirmed corporate actions were distorting model predictions. We built an automated audit and back-adjustment pipeline to fix the data for V2.

| Stock | Event | Date | Impact | Status |
|-------|-------|------|--------|--------|
| SYS | 5:1 Stock Split | Mar 2025 | Price dropped from ~500 to ~100 in raw data | Back-adjusted |
| LUCK | 1:1 Bonus Issue | Jul 2024 | MAPE inflated to 91.25% on unadjusted data | Back-adjusted |
| MARI | Right Issue | 2023 | Minor discontinuity in price series | Back-adjusted |
| UBL | Bonus Issue | 2024 | MAPE inflated to 36.54% on unadjusted data | Back-adjusted |
| ENGRO | 1:1 Bonus | 2021 | Historical price halved at split date | Back-adjusted |

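
For splits and bonus issues, back-adjustment can be sketched as dividing every pre-event price by the share multiplication ratio and multiplying pre-event volume by the same ratio. The helper below is a simplified illustration under that assumption, not the paper's pipeline. A 5:1 split uses a ratio of 5 and a 1:1 bonus uses a ratio of 2. Right issues need a different factor that depends on the subscription price and are not covered here.

```python
def back_adjust(closes, volumes, event_index, ratio):
    """Rescale rows before `event_index` so the series is continuous across the event."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    adj_closes = [c / ratio if i < event_index else c
                  for i, c in enumerate(closes)]
    adj_volumes = [v * ratio if i < event_index else v
                   for i, v in enumerate(volumes)]
    return adj_closes, adj_volumes


# Hypothetical prices around a 5:1 split that takes effect at index 2.
closes, volumes = back_adjust([500.0, 505.0, 101.0, 102.0],
                              [1000, 1200, 6000, 5500],
                              event_index=2, ratio=5.0)
print(closes)  # [100.0, 101.0, 101.0, 102.0]
```

The same factor would be applied to open, high, and low so that each bar stays internally consistent.
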
Section 7

Per-Stock Breakdown

Directional accuracy is above 60% on 10 of the 14 stocks listed. MAPE is under 4% on 10 of the 11 stocks without corporate action contamination; the exception is TRG at 6.79%.

| Stock | Direction | MAPE |
|-------|-----------|------|
| BAHL | 66.0% | 2.61% |
| FFC | 63.0% | 2.69% |
| HBL | 63.0% | 3.25% |
| HUBC | 60.0% | 2.23% |
| LUCK | 62.0% | 91.25% |
| MARI | 65.0% | 2.74% |
| MCB | 58.7% | 1.99% |
| MEBL | 63.3% | 2.02% |
| OGDC | 64.5% | 2.86% |
| PPL | 65.0% | 2.67% |
| PSO | 64.2% | 3.70% |
| SYS | 58.2% | 146.79% |
| TRG | 53.6% | 6.79% |
| UBL | 64.2% | 36.54% |

LUCK, SYS, and UBL (highlighted with red borders in the original layout) have MAPE inflated by corporate actions (stock splits, bonus issues) in unadjusted training data.

V2 Architecture (In Progress)

Return-Based Targets

Shift from absolute price prediction to next-day log return prediction. This makes the model scale-invariant and eliminates corporate action artifacts entirely.
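
The scale invariance of log returns can be checked directly: multiplying every price by a constant leaves each return unchanged. The sketch below uses made-up prices and a hypothetical `log_returns` helper.

```python
import math


def log_returns(closes):
    """Next-day log returns: ln(close[t] / close[t-1])."""
    return [math.log(closes[t] / closes[t - 1]) for t in range(1, len(closes))]


prices = [100.0, 101.0, 99.5, 102.0]
rescaled = [p * 5.0 for p in prices]  # same stock quoted at 5x the price level

same = all(abs(a - b) < 1e-12
           for a, b in zip(log_returns(prices), log_returns(rescaled)))
print(same)  # True
```

One caveat of this sketch: on raw unadjusted data, the single day on which a 5:1 split takes effect still shows a return of about ln(1/5) ≈ -1.61, so return-based targets would be paired with the back-adjustment step described under Corporate Action Adjustment.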

Multi-Head Architecture

Four output heads: return regression (Huber loss), direction classification (BCE), volatility prediction (Gaussian NLL), and confidence calibration. Composite loss weighted 35/35/15/15.
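
A per-sample sketch of such a composite loss is given below. Several details are assumptions: the weights are applied in the order the heads are listed, the Huber `delta` of 1.0 is a placeholder, the Gaussian NLL omits its constant term, and the confidence head is trained with BCE against whether the direction call was correct. The paper does not specify the calibration loss, so that last choice in particular is only one plausible option.

```python
import math


def huber(error, delta=1.0):
    """Quadratic near zero, linear beyond `delta`."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one probability and a 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def gaussian_nll(target, mean, var, eps=1e-6):
    """Gaussian negative log-likelihood without the constant 0.5*ln(2*pi)."""
    v = max(var, eps)
    return 0.5 * (math.log(v) + (target - mean) ** 2 / v)


def composite_loss(pred_return, p_up, pred_var, confidence, true_return,
                   weights=(0.35, 0.35, 0.15, 0.15)):
    """Weighted sum of the four head losses for a single sample."""
    up = 1.0 if true_return > 0 else 0.0
    # Assumed calibration target: 1 if the direction head called it right.
    direction_correct = 1.0 if (p_up > 0.5) == (up == 1.0) else 0.0
    parts = (
        huber(pred_return - true_return),
        bce(p_up, up),
        gaussian_nll(true_return, pred_return, pred_var),
        bce(confidence, direction_correct),
    )
    return sum(w * part for w, part in zip(weights, parts))
```

In training, the same terms would be averaged over a batch, and the relative scales of the four losses would likely need checking, since the Gaussian NLL can be negative for small variances.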

Corporate Action Adjustment

Automated back-adjustment pipeline that detects stock splits, bonus issues, and right issues from PSX announcements and adjusts all historical OHLCV data accordingly.

High-Confidence Trading Filter

The confidence head produces a 0-1 score. Only trade when confidence exceeds the 80th percentile — backtested with 0.5% round-trip transaction costs to simulate real PSX trading conditions.
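
The filter can be sketched as a percentile threshold on confidence followed by a cost deduction on each trade taken. The helper names are hypothetical, directions are encoded as +1 (long) or -1 (short) for illustration, and the percentile uses linear interpolation.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    rank = (len(ordered) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def filtered_trade_returns(confidences, directions, realized_returns,
                           q=80.0, round_trip_cost=0.005):
    """Keep trades above the q-th confidence percentile and net out costs."""
    threshold = percentile(confidences, q)
    trades = [d * r - round_trip_cost
              for c, d, r in zip(confidences, directions, realized_returns)
              if c > threshold]
    return threshold, trades
```

Computing the threshold on the same period being traded would leak future information, so in a backtest it would more safely be fixed on validation-period confidences and then applied unchanged to the test period.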

Methodology & Disclaimers

Not Investment Advice

This is a research experiment. Past simulated performance does not guarantee future returns. The model has not been tested with real capital, slippage, or market impact.

Simulated PnL Assumptions

PnL figures assume: equal-weight allocation across stocks, instant execution at close price, no slippage, no market impact, and positions rebalanced daily. Real-world performance would be lower after transaction costs (PSX: ~0.5% round-trip including brokerage, CDC, and taxes).
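
To make these assumptions concrete, the sketch below computes an equal-weight, daily-rebalanced PnL and a Sharpe ratio. It assumes daily compounding, a zero risk-free rate, and 252 trading days per year for annualization; none of these are stated in the paper, and the function name and optional cost parameter are hypothetical.

```python
import math
from statistics import fmean, stdev


def simulate_portfolio(daily_position_returns, cost_per_day=0.0):
    """Each inner list holds one day's per-stock position returns (equal weight)."""
    portfolio = [fmean(day) - cost_per_day for day in daily_position_returns]
    equity = 1.0
    for r in portfolio:
        equity *= 1.0 + r  # compound the daily portfolio return
    sharpe = None
    if len(portfolio) > 1 and stdev(portfolio) > 0:
        sharpe = fmean(portfolio) / stdev(portfolio) * math.sqrt(252)
    return equity - 1.0, sharpe


# Two hypothetical days across two stocks, with no costs.
pnl, sharpe = simulate_portfolio([[0.01, 0.03], [0.0, -0.02]])
```

Passing a non-zero `cost_per_day` shows how quickly a daily-rebalanced strategy gives back returns: with the inputs above, a 0.5% daily cost turns the small gain into a small loss.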

Data Quality

V1 was trained on unadjusted OHLCV data from PSX. Corporate actions (splits, bonuses) were not accounted for in V1 training, leading to inflated error metrics on affected stocks. V2 addresses this with back-adjusted data.

Data Sources & References

  1. Pakistan Stock Exchange (PSX) — historical OHLCV data, 2016-2026
  2. KSE-100 Index — daily index values and trading volumes
  3. FIPI (Foreign Investors Portfolio Investment) — daily net flows by market type
  4. State Bank of Pakistan (SBP) — monetary policy rate history, 2015-2026
  5. Yahoo Finance — USD/PKR exchange rate and Brent crude oil prices
  6. PSX Announcements Portal — corporate action filings (splits, bonus, rights)
  7. Hochreiter & Schmidhuber (1997) — Long Short-Term Memory, Neural Computation
  8. Vaswani et al. (2017) — Attention Is All You Need, NeurIPS

© 2026 Arham Mirkar, DataLayer. All data sourced from publicly available APIs and exchanges. This paper is for research and educational purposes only and does not constitute investment advice. Code available on GitHub.