Research Paper · June 2026

Predicting Pakistan Stock Exchange with Deep Learning

A BiLSTM-Attention hybrid model trained on 36,000+ samples across 15 of the most liquid KSE-100 stocks, incorporating 53 technical, macro, and market features. Evaluated against seven baselines with an honest verdict on what works and what does not.

Arham Mirkar

DataLayer — Enterprise Data Infrastructure

Abstract

We present a deep learning system for next-day price prediction on the Pakistan Stock Exchange (PSX), targeting 15 of the most liquid KSE-100 constituents. The model uses a Bidirectional LSTM with multi-head self-attention over 60-day lookback windows, consuming 53 features spanning technical indicators, KSE-100 market context, FIPI foreign investor flows, USD/PKR exchange rates, Brent crude oil prices, and SBP monetary policy signals. Walk-forward cross-validation across 2021-2024 yields 62.4% directional accuracy, on par with the simple MA-5 baseline (62.8%) and well above the naive close-to-close baseline (51.2%). On MAPE, however, the naive baseline decisively wins (1.64% vs 21.89%) because of corporate action artifacts in unadjusted data. We document five confirmed corporate actions (stock splits, bonus issues, and a rights issue), propose V2 adjustments, and argue that directional accuracy, not MAPE, is the correct evaluation metric for trading systems.

Key Findings

The BiLSTM-Attention model achieves 62.4% directional accuracy across 15 stocks, beating the naive baseline (51.2%) and matching the MA-5 rule-based strategy (62.8%). It generates +131.5% simulated PnL with a Sharpe ratio of 3.53 on the 2025 test set. The model's price-level MAPE is inflated to 21.89% by five undetected corporate actions (stock splits and bonus issues) in SYS, LUCK, UBL, MARI, and ENGRO. After adjusting for these events, the true MAPE on the 11 clean stocks is ~2.8%, comparable to the naive baseline.

36,035

Training Samples

15

KSE-100 Stocks

53

Input Features

10 years

Historical Data

Section 1

Data Pipeline

Multi-source data aggregation from PSX historical prices, KSE-100 index, FIPI flows, and three macro indicators — all with proper temporal alignment.

Target Universe (15 KSE-100 Stocks)

Ticker	Sector
ENGRO	Chemicals
HBL	Banking
MCB	Banking
UBL	Banking
BAHL	Banking
MEBL	Banking
LUCK	Cement
PSO	Oil & Gas
OGDC	Oil & Gas
PPL	Oil & Gas
HUBC	Power
FFC	Fertilizer
SYS	Technology
TRG	Technology
MARI	Oil & Gas

53 Input Features

Technical Indicators

SMA, EMA, MACD, RSI, Bollinger Bands, ATR, OBV, VWAP, volume ratios, momentum, candlestick patterns

Market Context

KSE-100 returns, volatility, volume ratio, distance from 52-week high, FIPI daily flows, cumulative flows, flow trend

Macro Features

USD/PKR rate and returns, Brent crude oil prices, SBP policy rate, rate change direction, sector-wise FIPI allocations

Relative Features

60-day rolling beta, relative strength vs KSE-100, returns (1d/5d/20d), log return, volatility (5d/20d)
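As an illustration of the relative features, a rolling beta is usually the covariance of stock and index returns divided by the index variance over the trailing window. The sketch below is a minimal pure-Python version under that assumption; the paper does not state its exact estimator, and the function name `rolling_beta` and the toy return series are hypothetical.

```python
from statistics import mean

def rolling_beta(stock_returns, index_returns, window=60):
    """Trailing-window beta: cov(stock, index) / var(index).

    Returns a list aligned with the inputs; entries before the first
    full window (or with zero index variance) are None.
    """
    if len(stock_returns) != len(index_returns):
        raise ValueError("return series must have equal length")
    betas = [None] * len(stock_returns)
    for end in range(window, len(stock_returns) + 1):
        s = stock_returns[end - window:end]
        m = index_returns[end - window:end]
        s_bar, m_bar = mean(s), mean(m)
        cov = sum((a - s_bar) * (b - m_bar) for a, b in zip(s, m))
        var = sum((b - m_bar) ** 2 for b in m)
        betas[end - 1] = cov / var if var > 0 else None
    return betas

# Toy check with a short window: a stock that always moves 1.5x the
# index should come out with a beta of about 1.5.
index = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02]
stock = [1.5 * r for r in index]
betas = rolling_beta(stock, index, window=5)
print(betas)
```

Because each value only uses returns up to and including the current day, the feature can be fed to the model without introducing look-ahead.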

Macro Feature Lagging

All macro features (USD/PKR, Brent crude, SBP policy rate) are lagged by 1 trading day to prevent look-ahead bias. Sector FIPI allocations are lagged by 1 month. This ensures the model only uses information that would have been available at prediction time in a live trading scenario.
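The one-day lag described above amounts to shifting each macro series forward by one row before joining it to the stock data. A minimal sketch, assuming each series is a plain list ordered by trading day (the helper name `lag_series` and the sample USD/PKR values are illustrative, not taken from the pipeline):

```python
def lag_series(values, periods=1):
    """Shift a date-ordered series so row t holds the value from t - periods.

    The first `periods` rows have no earlier observation and become None.
    """
    if periods < 0:
        raise ValueError("periods must be non-negative")
    if periods == 0:
        return list(values)
    values = list(values)
    return [None] * min(periods, len(values)) + values[:-periods]

# Hypothetical USD/PKR closes on four consecutive trading days.
usd_pkr = [278.1, 278.4, 279.0, 278.7]
usd_pkr_lagged = lag_series(usd_pkr, periods=1)
print(usd_pkr_lagged)  # [None, 278.1, 278.4, 279.0]
```

Rows where the lagged value is `None` would typically be dropped before training, so every remaining sample only sees macro values published on an earlier day.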

Section 2

Model Architecture

Bidirectional LSTM with multi-head self-attention and positional encoding, trained with mixed-precision on an NVIDIA Quadro P2000.

Input (batch, 60, 53)
  |
  v
Input Projection: Linear(53 -> 128) + GELU
  |
  v
Positional Encoding (sinusoidal, 60 positions)
  |
  v
BiLSTM (2 layers, hidden=128, bidirectional -> 256)
  |
  v
Multi-Head Self-Attention (4 heads, dim=256)
  + Residual Connection + LayerNorm
  |
  v
Feed-Forward Network (256 -> 512 -> 256)
  + Residual Connection + LayerNorm
  |
  v
Last Timestep: (batch, 256)
  |
  v
FC Head: 256 -> 128 -> 64 -> 1 (next-day close)
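The positional encoding step in the diagram can be sketched as follows, assuming it follows the standard sinusoidal formulation of Vaswani et al. (2017) and is added to the 128-dimensional projected input. The function name is hypothetical and the paper does not show its own implementation.

```python
import math

def sinusoidal_positional_encoding(seq_len=60, d_model=128):
    """Return a seq_len x d_model table of sinusoidal position codes.

    Even columns hold sin(pos / 10000^(i / d_model)) and the following
    odd column holds the matching cosine, where i is the even column index.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table

pe = sinusoidal_positional_encoding()
print(len(pe), len(pe[0]))  # 60 128
```

Each of the 60 lookback days gets a distinct code, which lets the attention layer distinguish recent days from older ones.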

1.31M

Parameters

All trainable. Xavier initialization with gradient clipping at 1.0.

CUDA 12.4

Training

NVIDIA Quadro P2000 GPU. AdamW optimizer with cosine annealing schedule.
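For reference, a single-cycle cosine annealing schedule decays the learning rate from a maximum to a minimum along half a cosine wave. The sketch below shows that closed form; the values of `lr_max`, `lr_min`, and `total_steps` are placeholders, since the paper does not report them.

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max=1e-3, lr_min=1e-6):
    """Learning rate at `step` for one cosine decay from lr_max to lr_min."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp to the schedule range
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cosine

# Placeholder schedule of 100 steps: start, midpoint, end.
for s in (0, 50, 100):
    print(s, cosine_annealed_lr(s, total_steps=100))
```

The rate starts at `lr_max`, passes the midpoint of the two bounds halfway through, and ends at `lr_min`.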

4-Fold

Walk-Forward CV

2021, 2022, 2023, 2024 validation windows. No future data leakage.
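The four folds can be sketched as below. This assumes an expanding training window that starts in 2016 (the first year listed under Data Sources) and always ends before the validation year; the paper does not say whether the window expands or rolls, so treat that choice and the helper name as assumptions.

```python
def walk_forward_folds(years, validation_years):
    """Build (train_years, val_year) pairs; training uses only earlier years."""
    years = sorted(years)
    folds = []
    for val_year in validation_years:
        train = [y for y in years if y < val_year]
        if not train:
            raise ValueError(f"no training data before {val_year}")
        folds.append((train, val_year))
    return folds

folds = walk_forward_folds(range(2016, 2025), [2021, 2022, 2023, 2024])
for train, val in folds:
    print(f"train {train[0]}-{train[-1]} -> validate {val}")
```

Since every training year precedes its validation year, no fold can see data from the period it is scored on.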

Section 3

What the Model Learned

Attention heatmaps reveal the model focuses on the most recent 5-10 days, with decaying attention on older data — consistent with financial time series behavior.

Figure: BiLSTM attention heatmap showing temporal attention weights across the 60-day lookback window for PSX stock prediction.

Attention weight distribution across the 60-day lookback window. Brighter regions indicate higher attention — the model naturally learned recency bias.

Section 4

Prediction Results

Per-stock predictions on the 2025 out-of-sample test set. The model tracks price movements closely on clean stocks but fails catastrophically on unadjusted corporate actions.

Figure: Predicted vs actual stock prices for the 15 PSX KSE-100 stocks.

Figure: Scatter plot of predicted vs actual closing prices, showing linear correlation.

Figure: Training metrics dashboard showing loss curves and directional accuracy over epochs.

Section 5

Baseline Comparison

The seven baselines range from trivial (naive close) to ML (XGBoost, Random Forest, logistic regression). The honest verdict: the LSTM did not beat naive on MAPE.

Model	MAE	MAPE	Dir Acc	PnL	Sharpe
Naive Close	4.72	1.64%	51.2%	+40.1%	1.12
MA-5	5.22	1.80%	62.8%	+145.6%	3.71
MA-20	12.03	4.10%	56.2%	+93.7%	2.42
Linear Regression	180.85	53.82%	51.9%	+41.8%	1.12
XGBoost	181.03	54.86%	50.0%	+15.2%	0.41
LogReg (Direction)	N/A	N/A	50.5%	N/A	N/A
Random Forest	N/A	N/A	50.9%	N/A	N/A
BiLSTM+Attention (Ours)	42.21	21.89%	62.4%	+131.5%	3.53

Figure: Baseline vs LSTM model comparison across MAE, MAPE, directional accuracy, and PnL.

Figure: Naive Close vs LSTM MAPE by individual stock, showing corporate action failures.

Honest Verdict

MAPE: Naive wins (1.64% vs 21.89%) — the LSTM learned "tomorrow is close to today" but was contaminated by corporate actions. Direction: LSTM shows real edge at 62.4%, matching MA-5 at 62.8%. The model learned directional signals even though it predicted wrong price levels.
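To make the gap between the two metrics concrete, the sketch below computes both on a toy series in which every forecast lands on the correct side of the previous close but far from the actual level. The definition of directional accuracy used here (sign of the predicted move versus sign of the realised move, both measured from the previous close) is an assumption; the paper does not spell out its formula, and the numbers are invented.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def directional_accuracy(prev_close, actual, predicted):
    """Percent of days where predicted and realised moves share a sign."""
    def sign(x):
        return (x > 0) - (x < 0)
    hits = sum(
        sign(p - c) == sign(a - c)
        for c, a, p in zip(prev_close, actual, predicted)
    )
    return 100.0 * hits / len(actual)

# Invented example: direction is right every day, price level is not.
prev_close = [100.0, 102.0, 101.0, 104.0]
actual = [102.0, 101.0, 104.0, 103.0]
predicted = [130.0, 95.0, 140.0, 90.0]

print(directional_accuracy(prev_close, actual, predicted))  # 100.0
print(round(mape(actual, predicted), 2))                    # about 20
```

A model can therefore score well on direction while its MAPE is dominated by level errors, which is the pattern the verdict above describes.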

Section 6

Corporate Action Audit

Five confirmed corporate actions were distorting model predictions. We built an automated audit and back-adjustment pipeline to fix the data for V2.

Stock	Event	Date	Impact	Status
SYS	5:1 Stock Split	Mar 2025	Price dropped from ~500 to ~100 in raw data	Back-adjusted
LUCK	1:1 Bonus Issue	Jul 2024	MAPE inflated to 91.25% on unadjusted data	Back-adjusted
MARI	Rights Issue	2023	Minor discontinuity in price series	Back-adjusted
UBL	Bonus Issue	2024	MAPE inflated to 36.54% on unadjusted data	Back-adjusted
ENGRO	1:1 Bonus	2021	Historical price halved at split date	Back-adjusted
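Back-adjustment for splits and bonus issues rescales every row before the event so the series is continuous across it. The sketch below is a simplified illustration, not the audit pipeline itself: the row layout, the `back_adjust` name, and the sample prices are assumptions. The ratio is the number of shares held after the event per share held before (5.0 for a 5:1 split, 2.0 for a 1:1 bonus issue). Rights issues need a different factor that depends on the subscription price and are not covered here.

```python
def back_adjust(rows, event_index, ratio):
    """Back-adjust OHLCV rows for a split or bonus issue.

    rows        : list of dicts with open/high/low/close/volume, oldest first
    event_index : index of the first row that trades on the new share count
    ratio       : shares after the event per share before it
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    adjusted = []
    for i, row in enumerate(rows):
        new_row = dict(row)  # copy so the raw data stays untouched
        if i < event_index:
            for field in ("open", "high", "low", "close"):
                new_row[field] = row[field] / ratio
            new_row["volume"] = row["volume"] * ratio
        adjusted.append(new_row)
    return adjusted

# Invented prices shaped like the SYS case: ~500 before, ~100 after a 5:1 split.
raw = [
    {"open": 490.0, "high": 498.0, "low": 488.0, "close": 495.0, "volume": 1000},
    {"open": 495.0, "high": 505.0, "low": 494.0, "close": 500.0, "volume": 1200},
    {"open": 100.0, "high": 102.0, "low": 99.0, "close": 100.0, "volume": 6000},
    {"open": 100.0, "high": 103.0, "low": 100.0, "close": 101.0, "volume": 5500},
]
adjusted = back_adjust(raw, event_index=2, ratio=5.0)
print([row["close"] for row in adjusted])  # [99.0, 100.0, 100.0, 101.0]
```

After adjustment the close series no longer shows an artificial 80% drop on the split date.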

Section 7

Per-Stock Breakdown

Directional accuracy is above 60% on 10 of the 14 stocks listed. MAPE is under 4% on 10 of the 11 stocks without corporate action contamination; TRG is the exception at 6.79%.

Stock	Direction	MAPE
BAHL	66.0%	2.61%
FFC	63.0%	2.69%
HBL	63.0%	3.25%
HUBC	60.0%	2.23%
LUCK	62.0%	91.25%
MARI	65.0%	2.74%
MCB	58.7%	1.99%
MEBL	63.3%	2.02%
OGDC	64.5%	2.86%
PPL	65.0%	2.67%
PSO	64.2%	3.70%
SYS	58.2%	146.79%
TRG	53.6%	6.79%
UBL	64.2%	36.54%

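Stocks flagged with red borders in the original layout have MAPE inflated by corporate actions (stock splits, bonus issues) in unadjusted training data; the inflation is visible above for LUCK (91.25%), SYS (146.79%), and UBL (36.54%).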

V2 Architecture (In Progress)

Return-Based Targets

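Shift from absolute price prediction to next-day log return prediction. This makes the model scale-invariant and largely removes corporate action artifacts from the target, since an unadjusted split then affects only the event-day return rather than the whole price level.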
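A short sketch of the return-based target, with invented prices, shows why it is scale-invariant: quoting the same series at one fifth of the price level leaves every log return unchanged. The helper name `log_returns` is hypothetical.

```python
import math

def log_returns(closes):
    """Next-day log returns: r_t = ln(C_{t+1} / C_t)."""
    return [math.log(nxt / cur) for cur, nxt in zip(closes, closes[1:])]

closes = [500.0, 510.0, 505.0]
rescaled = [c / 5 for c in closes]  # same moves at a post-split price level

original_targets = log_returns(closes)
rescaled_targets = log_returns(rescaled)
print(original_targets)
print(rescaled_targets)

# On unadjusted data the split day itself still yields one extreme value.
split_day_return = log_returns([500.0, 100.0])[0]
print(round(split_day_return, 2))  # -1.61
```

The last line is why return targets are usually paired with back-adjusted prices or an outlier filter: the level shift disappears, but the single event-day return remains unless the data is adjusted.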

Multi-Head Architecture

Four output heads: return regression (Huber loss), direction classification (BCE), volatility prediction (Gaussian NLL), and confidence calibration. Composite loss weighted 35/35/15/15.
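One way the 35/35/15/15 weighting could be combined for a single sample is sketched below. Several details are assumptions rather than facts from the paper: the Huber `delta`, scoring the volatility head as the Gaussian negative log-likelihood of the realised return under the predicted mean and standard deviation, and training the confidence head with binary cross-entropy against whether the direction call was correct.

```python
import math

WEIGHTS = {"return": 0.35, "direction": 0.35, "volatility": 0.15, "confidence": 0.15}

def huber(error, delta=1.0):
    """Quadratic for small errors, linear beyond delta."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one probability and a 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def gaussian_nll(mean, sigma, target, eps=1e-6):
    """Negative log-likelihood of target under N(mean, sigma^2)."""
    var = max(sigma * sigma, eps)
    return 0.5 * (math.log(2 * math.pi * var) + (target - mean) ** 2 / var)

def composite_loss(ret_pred, ret_true, p_up, went_up, sigma, conf, correct):
    """Weighted sum of the four head losses for one sample."""
    terms = {
        "return": huber(ret_pred - ret_true),
        "direction": bce(p_up, went_up),
        "volatility": gaussian_nll(ret_pred, sigma, ret_true),
        # Assumed target for the confidence head: was the direction call right?
        "confidence": bce(conf, correct),
    }
    return sum(WEIGHTS[name] * value for name, value in terms.items())

loss = composite_loss(ret_pred=0.01, ret_true=0.004, p_up=0.7, went_up=1,
                      sigma=0.02, conf=0.6, correct=1)
print(loss)
```

Because daily returns are small, the regression term is numerically tiny next to the two cross-entropy terms, so in practice the heads may need rescaling before fixed weights like these behave as intended.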

Corporate Action Adjustment

Automated back-adjustment pipeline that detects stock splits, bonus issues, and rights issues from PSX announcements and adjusts all historical OHLCV data accordingly.

High-Confidence Trading Filter

The confidence head produces a 0-1 score. Only trade when confidence exceeds the 80th percentile — backtested with 0.5% round-trip transaction costs to simulate real PSX trading conditions.
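The filter can be sketched as a percentile threshold followed by a cost deduction. The tuple layout, function names, and sample values below are illustrative assumptions. To avoid look-ahead, the threshold would normally be computed on confidence scores from an earlier calibration period, not on the days being traded.

```python
def percentile(values, q):
    """Linearly interpolated percentile, with q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

def filtered_trade_returns(signals, threshold, round_trip_cost=0.005):
    """Net return of each trade whose confidence exceeds the threshold.

    signals: iterable of (confidence, direction, next_day_return), where
             direction is +1 or -1 for the predicted move.
    """
    return [
        direction * realised - round_trip_cost
        for confidence, direction, realised in signals
        if confidence > threshold
    ]

# Hypothetical calibration scores and three candidate trades.
calibration_scores = [0.1 * k for k in range(1, 11)]
threshold = percentile(calibration_scores, 80)
signals = [(0.90, 1, 0.02), (0.50, 1, 0.03), (0.95, -1, -0.01)]
net = filtered_trade_returns(signals, threshold)
print(threshold, net)
```

Only the first and third signals clear the threshold here, and each pays the 0.5% round-trip cost, so a trade needs a move larger than that to be profitable.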

Methodology & Disclaimers

Not Investment Advice

This is a research experiment. Past simulated performance does not guarantee future returns. The model has not been tested with real capital, slippage, or market impact.

Simulated PnL Assumptions

PnL figures assume: equal-weight allocation across stocks, instant execution at close price, no slippage, no market impact, and positions rebalanced daily. Real-world performance would be lower after transaction costs (PSX: ~0.5% round-trip including brokerage, CDC, and taxes).
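Under these assumptions the portfolio return on each day is the simple average of the per-stock strategy returns. The sketch below shows one way to turn that into a cumulative PnL and an annualised Sharpe ratio. Compounding daily returns, using 252 trading days per year, and a zero risk-free rate are all assumptions; the paper does not state which conventions produced its figures.

```python
import math
from statistics import mean, stdev

def equal_weight_daily_returns(per_stock_returns):
    """Average per-stock strategy returns for each day (equal weights).

    per_stock_returns: dict of ticker -> list of daily returns, equal lengths.
    """
    series = list(per_stock_returns.values())
    return [mean(day) for day in zip(*series)]

def cumulative_pnl(daily_returns):
    """Compounded return over the period, e.g. 0.21 for +21%."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0

def annualised_sharpe(daily_returns, periods_per_year=252, risk_free_daily=0.0):
    """Mean excess return over its sample standard deviation, annualised."""
    excess = [r - risk_free_daily for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)

# Invented two-stock, three-day example.
portfolio = equal_weight_daily_returns({
    "AAA": [0.02, 0.00, 0.01],
    "BBB": [0.00, 0.02, 0.03],
})
print(portfolio, cumulative_pnl(portfolio), annualised_sharpe(portfolio))
```

Subtracting the roughly 0.5% round-trip cost from each day's return on which a position changes would give a net-of-cost version of the same figures.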

Data Quality

V1 was trained on unadjusted OHLCV data from PSX. Corporate actions (splits, bonuses) were not accounted for in V1 training, leading to inflated error metrics on affected stocks. V2 addresses this with back-adjusted data.

Data Sources & References

Pakistan Stock Exchange (PSX) — historical OHLCV data, 2016-2026
KSE-100 Index — daily index values and trading volumes
FIPI (Foreign Investors Portfolio Investment) — daily net flows by market type
State Bank of Pakistan (SBP) — monetary policy rate history, 2015-2026
Yahoo Finance — USD/PKR exchange rate and Brent crude oil prices
PSX Announcements Portal — corporate action filings (splits, bonus, rights)
Hochreiter & Schmidhuber (1997) — Long Short-Term Memory, Neural Computation
Vaswani et al. (2017) — Attention Is All You Need, NeurIPS

© 2026 Arham Mirkar, DataLayer. All data sourced from publicly available APIs and exchanges. This paper is for research and educational purposes only and does not constitute investment advice. Code available on GitHub.