Methods for Aligning Text and Numerical Data in Algorithmic Trading
The Core Problem
In modern quantitative trading, two fundamentally different data streams must work together:
- Numerical data (tick data, K-lines, order book snapshots) arrives at microsecond-to-minute frequency on a continuous, synchronous clock.
- Text data (news, tweets, filings, government documents) arrives irregularly as discrete events on a sparse, asynchronous clock.
When an LLM is used as a text feature extractor — converting raw text into structured signals like {"sentiment": 0.8, "relevance": 0.9, "urgency": 0.5} — the resulting features still live on a fundamentally different temporal grid than price. Naïvely forcing alignment (via forward-filling, zero-filling, or fixed-window aggregation) introduces biases: staleness from carry-forward, sparsity from zero-fill, and information loss from coarse bucketing. The misalignment is not merely a data-engineering inconvenience — it is a compound problem spanning temporal asynchrony, semantic heterogeneity, and feature-space incompatibility.
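The forward-fill staleness described above is easy to demonstrate with a toy as-of join (illustrative timestamps and values; pandas assumed available):

```python
import pandas as pd

# Minute bars plus one sparse news event (toy data).
bars = pd.DataFrame({
    "ts": pd.date_range("2024-01-02 09:30", periods=6, freq="1min"),
    "price": [100.0, 100.2, 99.8, 99.9, 100.5, 100.4],
})
news = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-02 09:31:30"]),
    "sentiment": [0.8],
})

# Naive as-of join: the 09:31:30 sentiment is carried forward indefinitely.
stale = pd.merge_asof(bars, news, on="ts")

# Bounded join: the same sentiment expires after a 2-minute tolerance,
# leaving NaN (honest missingness) instead of a stale carry-forward.
bounded = pd.merge_asof(bars, news, on="ts", tolerance=pd.Timedelta("2min"))

print(stale["sentiment"].tolist())    # stale value persists to the last bar
print(bounded["sentiment"].tolist())  # value expires after the tolerance
```

Neither variant is "correct" — the tolerance merely trades staleness for sparsity, which is exactly the bias trade-off the methods below try to escape.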
This report catalogues the principal methods that have been developed to address this misalignment, organized from foundational heuristics through advanced neural architectures.
1 · Event-Driven Resampling
Core idea. Invert the standard paradigm: instead of forcing text onto the tick clock, force ticks onto the event clock. The unit of analysis becomes a dynamic window around each text event rather than a fixed time bar. The system only evaluates trades when a text signal fires, grabbing the surrounding numerical context (price, volume, spread, order imbalance) from that event horizon.
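A minimal sketch of the event clock (the function and window lengths are illustrative, not taken from a specific paper):

```python
import numpy as np

def event_windows(tick_ts, tick_px, event_ts, pre=60.0, post=300.0):
    """For each text event, grab the surrounding numerical context: all ticks in
    [t - pre, t + post] seconds, summarized here as pre-event price, post-event
    return, and tick count (a crude activity proxy)."""
    rows = []
    for t in event_ts:
        mask = (tick_ts >= t - pre) & (tick_ts <= t + post)
        px = tick_px[mask]
        if len(px) < 2:
            continue  # event fell outside market hours or an illiquid window
        rows.append({"event_ts": t,
                     "pre_price": px[0],
                     "ret": px[-1] / px[0] - 1.0,
                     "n_ticks": int(mask.sum())})
    return rows

# Toy one-tick-per-second stream with a single text event at t = 100 s.
ts = np.arange(0.0, 500.0, 1.0)
px = 100.0 + 0.01 * ts
print(event_windows(ts, px, [100.0]))
```

Each returned row is one training example: a rich text event plus its localized market context, with no fixed time bars anywhere.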
Strengths. Every row in the training set is guaranteed to contain a rich textual event plus its localized market context. This eliminates matrix sparsity and forward-fill bias. The approach aligns naturally with Information-Driven Bars (López de Prado), where the market is sampled only when a threshold of information has arrived.
Extensions in the literature.
- Neural Hawkes Processes model the self-exciting and mutually-exciting clustering of news arrivals and trade events, capturing how a negative earnings report triggers algorithmic sell cascades that trigger further news alerts. The conditional intensity function learns cross-excitation kernels between text-type and price-type events, replacing hand-tuned windows with data-driven interaction structure.
- Complex Event Processing (CEP) systems (e.g., solutions to the DEBS 2022 Grand Challenge built on Apache Flink) operationalize event-driven execution at production scale, processing high-volume tick streams with real-time pattern detection.
- Adaptive Event-Driven Labeling (AEDL) integrates multi-scale temporal analysis to capture hierarchical causal patterns at different time granularities around events.
Limitations. Markets exhibit massive price volatility in the complete absence of news. An event-only clock is blind to momentum shifts, liquidity cascades, and purely technical microstructure breakdowns. Events with ambiguous timestamps (e.g., filings marked only by date) introduce residual alignment error.
2 · Decay Functions on Text Features
Core idea. Convert each discrete text event into a continuous signal via exponential decay: sentiment(t) = s₀ × e^{-λ(t - t₀)}. The text feature now lives in the same continuous time domain as price. The challenge is choosing λ — it differs for an earnings miss versus a CEO tweet versus a central bank statement.
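A minimal sketch of both the basic decay and the volatility-adaptive variant discussed under Extensions (parameter values are illustrative, not calibrated):

```python
import math

def decayed_sentiment(s0, t0, t, lam):
    """sentiment(t) = s0 * exp(-lam * (t - t0)); zero before the event fires."""
    return 0.0 if t < t0 else s0 * math.exp(-lam * (t - t0))

def adaptive_lambda(vol, lam0=0.1, lam1=0.05, mu=0.2, sigma=0.1):
    """Volatility-adaptive rate lam(t) = lam0 + lam1 * tanh((vol - mu) / sigma):
    faster decay in turbulent markets, slower in calm ones."""
    return lam0 + lam1 * math.tanh((vol - mu) / sigma)

# An event of strength 0.8 at t0 = 0, read one time unit later under high vol.
print(decayed_sentiment(0.8, 0.0, 1.0, adaptive_lambda(vol=0.5)))
```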
Strengths. Conceptually simple, computationally cheap, and directly interpretable. Preserves information value over time rather than hard-truncating at a window boundary.
Extensions in the literature.
- Volatility-adaptive decay replaces static λ with a dynamic λ(t) = λ₀ + λ₁ · tanh((vol(t) − μ) / σ), accelerating decay in high-volatility regimes and slowing it in calm markets. Empirical tests show this lifts short-term prediction accuracy from ~60% to ~65%.
- Asymmetric decay uses different rates for positive vs. negative sentiment, reflecting behavioral finance evidence that bad news persists ~2× longer than good news.
- MIDAS regressions (Ghysels et al.) generalize hand-picked decay into a parametric distributed lag polynomial estimated from data, with decades of econometric theory behind weight design. This treats text features as high-frequency regressors and learns optimal weight schedules per event class.
Limitations. A single monotone decay cannot capture multi-peaked influence patterns from sustained events (e.g., multi-round sanctions). The approach also cannot model interaction effects — where the combination of two simultaneous text events produces a non-additive market response.
3 · Dual-Frequency Architecture
Core idea. Don't merge the modalities. Run two parallel systems: a fast numerical system (tick-level) and a slow text system (event-level). The text system adjusts the regime, parameters, or risk limits of the numerical system rather than feeding into the same model. Text sets the macro context; numbers handle micro execution.
Strengths. Isolates modality interference — high-frequency numerical noise cannot drown out low-frequency text signals, and vice versa. Each branch can use an architecture optimized for its data characteristics (e.g., CNN+LSTM for ticks, Transformer for text).
Extensions in the literature.
- Adaptive Frequency Fusion (AFF / VoT framework) operates in the frequency domain: both branches' predictions are FFT-decomposed, then learnable frequency-specific weights route text-derived signals to low-frequency bands and numerical signals to high-frequency bands. Inverse FFT reconstructs the unified prediction.
- Wavelet-enhanced fusion uses Continuous Wavelet Transforms for superior time-frequency localization on non-stationary financial data, combined with gated mechanisms that let text severity dynamically modulate temporal graph edge weights.
- Gated fusion (TFT / TIC-FusionNet) uses Variable Selection Networks or CBAM-based gates that evaluate contextual relevance in real time, dynamically suppressing stale or irrelevant text features when the market is in a purely technical regime, and amplifying text weight during regime shifts.
Limitations. The two branches may fail to learn fine-grained cross-modal correlations (e.g., which specific sentence in a filing maps to which specific order-flow pattern). Computational cost is roughly double that of single-branch models. Interpretability of the fusion weights is often weak.
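The frequency-band routing behind AFF-style fusion can be sketched with a fixed cutoff instead of learnable per-bin weights (synthetic branch outputs; function name is ours):

```python
import numpy as np

def frequency_route(num_pred, txt_pred, cutoff):
    """AFF-style fusion sketch: take low-frequency FFT bins from the text
    branch and high-frequency bins from the numerical branch, then
    inverse-FFT back to a single prediction. A real AFF layer would learn
    per-bin weights instead of this hard cutoff."""
    Fn, Ft = np.fft.rfft(num_pred), np.fft.rfft(txt_pred)
    fused = np.where(np.arange(len(Fn)) < cutoff, Ft, Fn)
    return np.fft.irfft(fused, n=len(num_pred))

t = np.linspace(0, 1, 64, endpoint=False)
slow = np.sin(2 * np.pi * 1 * t)         # text branch: slow regime trend
fast = 0.1 * np.sin(2 * np.pi * 20 * t)  # numerical branch: fast oscillation
fused = frequency_route(fast, slow, cutoff=5)
print(np.allclose(fused, slow + fast, atol=1e-8))  # each band survives routing
```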
4 · Cross-Modal Attention
Core idea. Replace hard temporal alignment with learned soft attention across modalities. A target modality (price) attends to a source modality (text) across different time steps via differentiable attention weights — no pre-alignment required.
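A minimal single-head version (numpy; random hidden states stand in for encoder outputs) shows why no pre-alignment is needed — every price step attends over every text event:

```python
import numpy as np

def cross_modal_attention(price_h, text_h):
    """Price hidden states (queries) attend over text hidden states
    (keys/values); price_h is (T_p, d), text_h is (T_t, d), and the two
    sequence lengths never need to match or share a clock."""
    d = price_h.shape[1]
    scores = price_h @ text_h.T / np.sqrt(d)     # (T_p, T_t) similarities
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)            # softmax over text positions
    return w @ text_h, w                         # text context per price step

rng = np.random.default_rng(0)
price_h, text_h = rng.normal(size=(5, 8)), rng.normal(size=(3, 8))
ctx, w = cross_modal_attention(price_h, text_h)
print(ctx.shape, w.shape)  # (5, 8) context, (5, 3) attention weights
```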
Key works.
- MulT (Multimodal Transformer) — Tsai et al., ACL 2019, ~2,500 citations. Directional pairwise cross-modal attention reinforcing one modality with features from another regardless of temporal alignment. Open-source at ~700 GitHub stars.
- FAST — Sawhney et al., EACL 2021. Time-aware LSTM encoding inverse time intervals between consecutive text releases, handling intra-day temporal irregularity directly.
- SALMON / MSE-ITT — arXiv 2025. Modality-specific Mixture-of-Experts on top of Llama3-8B, processing interleaved text and time-series tokens ordered by timestamp. Key finding: cross-modal attention in early transformer layers is detrimental; modality-specific processing should occur first, with fusion only in later layers.
- MSGCA — trimodal encoding (price, documents, relational graphs) where the primary modality (price) guides fusion through gated cross-attention that suppresses noisy or temporally misaligned text signals.
- MM-iTransformer — inverts the standard paradigm by embedding entire variable histories as single variate tokens rather than embedding individual time steps, enabling cross-attention across variables instead of across time.
Strengths. Eliminates explicit alignment entirely. Scales well and can discover complex, non-linear temporal correspondences.
Limitations. Requires substantial paired training data. Provides limited theoretical guarantees about what temporal relationship the attention learned.
5 · Continuous-Time Neural Dynamics
Core idea. Model the latent market state as a continuous dynamical system, updated asynchronously whenever any observation — tick, news article, filing — arrives. This eliminates discretization entirely.
Key model families.
| Model | Core Mechanism | Relevance to Text-Price Fusion |
|---|---|---|
| Neural ODE (Chen et al., NeurIPS 2018, ~10K citations) | dh/dt = f_θ(h(t), t) with adaptive ODE solvers | Processes data at arbitrary, non-uniform time points |
| Latent ODE / ODE-RNN (Rubanova et al., NeurIPS 2019, ~2.2K citations) | Continuous dynamics between observations, discrete updates at arrivals | Jointly models when observations occur via Poisson processes |
| Neural CDE (Kidger et al., NeurIPS 2020 Spotlight, ~800 citations) | dz = f_θ(z) dX where X is a continuous control path | Naturally handles channels observed at different rates; universal approximation proven |
| Online Neural CDE (Morrill & Kidger, 2021) | Rectilinear interpolation replacing non-causal cubic splines | Deployable for live trading without look-ahead bias |
| Neural Jump ODE (NJ-ODE) (ICLR 2021) | ODE evolves continuously between events, "jumps" at observations | Most natural framework for text+price: ODE drift between ticks, jump at news arrival |
| Neural SDE (Kidger et al., ICML 2021) | Adds diffusion term for stochastic paths | Better captures financial randomness |
| Stable Neural SDE (Oh et al., ICLR 2024 Spotlight) | Solves training instability | Removes practical deployment blocker |
| Neural Merton Jump Diffusion (arXiv 2025) | Itô diffusion + compound Poisson jumps | Continuous price evolution + discrete news shocks in one SDE |
| mTAN (Shukla & Marlin, ICLR 2021, ~400 citations) | Learned continuous-time embeddings with attention | 85× faster than ODE-based methods; extended to cross-modal fusion |
| GRU-ODE-Bayes (de Brouwer et al., NeurIPS 2019, ~400 citations) | Bayesian updates for sporadically observed data | Handles channels updating at wildly different rates |
| ContiFormer (NeurIPS 2024) | Continuous-time Transformer with ODE trajectories per observation | Combines Transformer parallelism with Neural ODE dynamics |
| MambaStock (arXiv 2024, ~300 GitHub stars) | Selective state space mechanism for stock prediction | Outperforms Transformers and LSTMs on stock tasks |
Critical gap identified across all surveyed literature: No published paper simultaneously combines rich NLP encoders (FinBERT/LLM), neural temporal point processes, and continuous-time latent dynamics into a single end-to-end architecture for text+price fusion. The tools exist; the combination remains an open field.
Key open-source tooling: torchdiffeq (~5,500 stars), torchcde (~434 stars), torchsde, latent_ode (~1,200 stars).
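The jump-ODE mechanics can be sketched without any learned components — hand-written drift and jump functions stand in for the neural networks f_θ, and fixed-step Euler replaces an adaptive solver:

```python
import numpy as np

def nj_ode_path(obs, h0, drift, jump, dt=0.01):
    """Neural-Jump-ODE-style sketch: the latent state h drifts continuously
    between observations and jumps discretely whenever an observation
    (tick or news mark) arrives. obs is a list of (time, mark), sorted."""
    t, h, path = 0.0, np.array(h0, dtype=float), []
    for t_obs, mark in obs:
        while t < t_obs:            # Euler-integrate the drift up to the event
            h = h + drift(h) * dt
            t += dt
        h = jump(h, mark)           # discrete update at the observation
        path.append((t_obs, h.copy()))
    return path

# Illustrative dynamics: mean-reverting drift, additive news jump.
drift = lambda h: -0.5 * h
jump = lambda h, m: h + m
path = nj_ode_path([(0.5, np.array([1.0])), (1.0, np.array([-2.0]))],
                   [0.0], drift, jump)
print(path[0][1], path[-1][1])
```

In NJ-ODE proper, drift and jump are neural networks trained end-to-end, and the news marks would be LLM-derived feature vectors rather than scalars.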
6 · Temporal Point Processes
Core idea. Model the arrival times and types of events as stochastic processes with self- and cross-excitation. A news article's arrival increases the probability of subsequent price moves, and vice versa. Both modalities become events in a single mathematical framework.
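For intuition, the classical univariate exponential-kernel conditional intensity (parameter values illustrative):

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process with exponential
    kernel: lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).
    Each past event excites future arrivals; beta sets how fast that
    excitation decays -- the quantity estimated per event type below."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

# Baseline rate 0.1; two clustered news arrivals spike intensity just after them.
events = [1.0, 1.2]
print(hawkes_intensity(1.3, events, mu=0.1, alpha=0.8, beta=2.0))  # elevated
print(hawkes_intensity(5.0, events, mu=0.1, alpha=0.8, beta=2.0))  # back near mu
```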
Classical foundations.
- Hawkes processes in high-frequency finance (Bacry et al., 2015, ~1,000 citations) established that financial markets operate near the critical branching ratio — past events drive most future activity.
- Yang et al. (Quantitative Finance, 2018) modeled a 4-variate Hawkes process over {positive returns, negative returns, positive sentiment, negative sentiment} using intraday S&P 500 data. Key finding: return-event decay is ~2× faster than sentiment-event decay, precisely quantifying the temporal misalignment.
Neural extensions.
| Model | Innovation |
|---|---|
| Neural Hawkes Process (Mei & Eisner, NeurIPS 2017, ~800 citations) | Continuous-time LSTM replaces parametric kernels; allows non-linear, non-additive interaction effects |
| RMTPP (Du et al., KDD 2016, ~1,000 citations) | RNN + marked temporal point process for encoding full event histories |
| Transformer Hawkes Process (Zuo et al., ICML 2020) | Self-attention with continuous-time positional encodings for long-range event dependencies |
| TPP-LLM (Liu & Quan, arXiv 2024 / ICLR 2025 Workshop) | First model processing actual text content within the point process framework via LLM fine-tuning (LoRA), rather than reducing text to scalar sentiment |
How it solves misalignment. Both modalities are represented as timestamped marked events in continuous time. The LLM's structured output (sentiment, relevance, urgency, entity) becomes marks that modulate excitation strength or kernel parameters. No resampling or decay tuning is needed — the model learns event-type-specific and context-specific temporal influence patterns directly.
Key open-source tooling: tick library (~500 stars) for classical Hawkes; EasyTPP (Ant Research, ~400 stars) for neural TPP benchmarking.
7 · Mixed-Frequency Econometrics & State-Space Models
Core idea. Treat different sampling rates as a feature, not a bug. Use formal frameworks from econometrics that were explicitly built to handle mixed-frequency observations and irregular arrivals.
MIDAS regressions (Ghysels et al., JFE 2005/JoE 2006, ~1,000+ citations across variants) regress a low-frequency target on high-frequency regressors using parsimonious polynomial weighting schemes. Unlike hand-picked decay, the weight schedule is estimated from data. Reverse-MIDAS variants forecast high-frequency variables from low-frequency ones. Recent intraday MIDAS work predicts 3-minute returns from half-hourly sentiment, achieving 19% MAE reduction. A counterintuitive finding: sentiment during non-trading hours is more informative than during trading hours.
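The heart of MIDAS is the parsimonious weight schedule; a sketch of the standard exponential Almon parameterization (function names are ours):

```python
import math

def exp_almon_weights(K, theta1, theta2):
    """Exponential Almon lag polynomial used in MIDAS: w_k proportional to
    exp(theta1 * k + theta2 * k^2), normalized to sum to 1 over K lags.
    Two parameters shape the entire weight profile, however many lags."""
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, K + 1)]
    s = sum(raw)
    return [r / s for r in raw]

def midas_term(x_hf, theta1, theta2):
    """One MIDAS regressor: weighted sum of the K most recent high-frequency
    observations (e.g., intraday sentiment), most recent first, entering a
    low-frequency regression."""
    w = exp_almon_weights(len(x_hf), theta1, theta2)
    return sum(wi * xi for wi, xi in zip(w, x_hf))

w = exp_almon_weights(10, theta1=0.1, theta2=-0.05)
print(round(sum(w), 6), w[0] > w[-1])  # weights sum to 1 and decay with lag
```

In estimation, theta1 and theta2 are fit jointly with the regression coefficients, so a single parameter pair is learned from data rather than hand-picked per event class.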
Mixed-frequency VAR models characterize both frequency mismatch and the timing of information releases, with real-time updating as new higher-frequency observations arrive — formalizing event-driven updates in a multivariate dynamic system.
State-space / Kalman filtering maintains a latent market state updated by both frequent market observations and sporadic text events, without ever requiring a shared clock. The Kalman filter provides optimal sequential updating under linear-Gaussian assumptions; nonlinear extensions (EKF, UKF, particle filters) handle more complex dynamics.
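A scalar random-walk sketch shows why no shared clock is needed — each stream simply applies an update whenever it has an observation, with its own noise level (all values illustrative):

```python
def kalman_step(x, P, z, R, Q=0.01):
    """One predict+update of a scalar random-walk Kalman filter.
    x, P: state mean/variance; z: observation (None if nothing arrived);
    R: observation noise variance; Q: process noise."""
    x_pred, P_pred = x, P + Q          # random-walk predict
    if z is None:
        return x_pred, P_pred          # no observation: uncertainty grows
    K = P_pred / (P_pred + R)          # Kalman gain
    return x_pred + K * (z - x_pred), (1 - K) * P_pred

# Frequent noisy price observations plus one sporadic, precise text signal.
x, P = 0.0, 1.0
stream = [(0.9, 0.5), (1.1, 0.5), (None, None), (2.0, 0.05), (1.0, 0.5)]
for z, R in stream:
    x, P = kalman_step(x, P, z, R)
print(round(x, 3), round(P, 4))
```

The low-noise text observation (R = 0.05) pulls the latent state far harder than the routine price ticks — the filter weighs information content, not arrival frequency.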
Strengths. Theoretically grounded, interpretable, and proven in production macro-finance. Provides estimable weighting structures and explicit information arrival times.
Limitations. Classical MIDAS assumes relatively simple parametric relationships. Struggles with the non-linear, context-dependent effects that deep learning captures.
8 · Information-Theoretic & Causal Approaches
Core idea. Measure and exploit the directional information flow between text and price to build alignment that respects causality.
Key methods.
- Transfer entropy (Souza & Aste, 2016/2019) revealed that nonlinear transfer entropy detects an order of magnitude more causality between social media and stock returns than linear Granger causality — the relationship is purely nonlinear and invisible to standard VAR models. Open-source: PyCausality library.
- CausalImpact (Brodersen et al., Google, Annals of Applied Statistics, 2015, ~1,500 citations) provides a Bayesian structural time-series framework for constructing counterfactuals around discrete interventions — directly applicable to measuring "what would the price have done absent this news event."
- Granger-causality learning for Hawkes processes (Xu et al., ICML 2016) recovers sparse interaction structure via group sparsity, providing tools to test whether "text → price" excitation is genuine versus "price → text commentary" (endogeneity).
- Dynamic Transfer Entropy + TFT (Díaz Berenguer et al., 2024) bridges information theory with deep learning by feeding causality features into Temporal Fusion Transformers.
Value for alignment. These tools diagnose whether alignment is capturing real causal structure or spurious correlation, and quantify the actual information content and temporal lag of text signals.
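A plug-in estimator for discrete transfer entropy is short enough to sketch (a histogram estimator on binary series; the nonlinear estimators in the cited work are considerably more sophisticated):

```python
import random
from collections import Counter
from math import log2

def transfer_entropy(x, y):
    """Plug-in transfer entropy TE(X -> Y), history order 1:
    TE = sum p(y_next, y, x) * log2[ p(y_next | y, x) / p(y_next | y) ].
    Upward-biased on short samples; adequate for a sketch."""
    n = len(y) - 1
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))
    pairs_yx = Counter(zip(y[:-1], x[:-1]))
    pairs_ny = Counter(zip(y[1:], y[:-1]))
    singles_y = Counter(y[:-1])
    te = 0.0
    for (yn, yp, xp), c in triples.items():
        p_cond_full = c / pairs_yx[(yp, xp)]
        p_cond_marg = pairs_ny[(yn, yp)] / singles_y[yp]
        te += (c / n) * log2(p_cond_full / p_cond_marg)
    return te

# Synthetic check: y copies x with a one-step lag, so x "causes" y, not vice versa.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]
print(transfer_entropy(x, y), transfer_entropy(y, x))  # ~1 bit vs ~0 bits
```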
9 · Temporal Knowledge Graphs & Graph Neural Networks
Core idea. Encode not just text content but the relational structure between entities, events, and prices. Propagate text-derived signals across connected assets through learned graph dynamics.
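The propagation idea in miniature — a one-asset news shock diffusing over a row-normalized relationship graph with a fixed attenuation factor (a learned GNN would make both the graph and the attenuation data-driven):

```python
import numpy as np

def propagate(shock, adj, hops=2, attenuation=0.5):
    """Diffuse a per-asset news shock over a relationship graph: each hop,
    the signal spills to neighbors via the row-normalized adjacency,
    attenuated by a fixed factor, and accumulates into the output."""
    A = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-12)
    s, out = shock.astype(float), shock.astype(float)
    for _ in range(hops):
        s = attenuation * (A @ s)
        out = out + s
    return out

# Three assets: 0 and 1 are linked (e.g., supplier relation), 2 is isolated.
adj = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
out = propagate(np.array([1.0, 0.0, 0.0]), adj)
print(out)  # shock reaches asset 1, echoes back to 0, never touches 2
```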
Key works.
- FinDKG (Li & Sanna Passino, ACM ICAIF 2024) — fine-tuned 7B LLM extracts event quadruples (subject, relation, object, timestamp) from news; KGTransformer learns over rolling temporal KG snapshots. >10% uplift on link prediction versus SOTA temporal KG baselines.
- TRACE (arXiv 2025) — symbolic path mining on temporal KGs with temporal validity constraints ensuring only causally valid (past→present) reasoning chains, validated by LLM-guided graph exploration.
- THGNN (CIKM 2022) — learns dynamic inter-stock relations without manual graph construction.
- Semantic Company Relationship Graphs (SCRG) — LLM-extracted company co-occurrence and sentiment spillover metrics construct real-time relational graphs; combined with Spatial-Temporal GNNs, text-derived signals propagate across structurally linked equities with learned latency and attenuation.
- DGRCL — combines dynamic graph relations with contrastive learning to capture both temporal evolution and relationship constraints from text.
Value for alignment. Solves cross-sectional misalignment: a news event about one company propagates to related companies with different delays, which cannot be captured by single-asset alignment methods.
10 · Cross-Modal Reprogramming
Core idea. Instead of extracting features from text and feeding them to a numerical model, transform numerical data into the text domain and let the LLM process both natively.
Key work: Time-LLM (ICLR 2024, open-source). The backbone LLM is kept entirely frozen. The numerical time series is segmented into patches, and a trainable reprogramming layer maps these patches into text prototype embeddings in the LLM's vocabulary space. Actual textual data (news, sentiment, macro context) is fed alongside as Prompt-as-Prefix. The frozen LLM then jointly reasons over both modalities via its native self-attention — no explicit alignment needed because both modalities live in the same token space.
Strengths. Leverages the LLM's pre-trained sequence reasoning without fine-tuning. Achieves zero-shot and few-shot multimodal forecasting. Can be combined with RL agents (SAC) for end-to-end trading optimization.
Limitations. Computationally expensive at inference. The reprogramming quality depends on whether numerical patches genuinely map to meaningful text prototypes.
11 · Universal Tokenization & Foundation Models
Core idea. Build massive foundation models that treat the entire heterogeneous event stream — trades, quotes, news — as a single structured language with its own grammar.
Key work: TradeFM (J.P. Morgan, 524M parameters). Trained on billions of raw tick-level trade events across 9,000+ equities. Uses scale-invariant feature construction based on Universal Price Formation theory. A Universal Tokenization Scheme maps multi-feature trade tuples (price, volume, direction, inter-arrival time) into a single discrete sequence. Integrating text involves mapping textual events into the same vocabulary. The Transformer's autoregressive self-attention handles alignment implicitly.
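A toy illustration of the tokenization idea — bucketize each feature of a trade tuple and pack the buckets into a single discrete id (this bucketing is our illustration; TradeFM's actual scheme is proprietary):

```python
import math

def trade_token(d_price, volume, direction, dt, n_bins=8):
    """Map a trade tuple (price change, volume, direction, inter-arrival time)
    to one token id by per-feature bucketing. Clipping bounds and log scaling
    are illustrative choices, not TradeFM's."""
    def bucket(v, lo, hi):
        v = min(max(v, lo), hi)
        return int((v - lo) / (hi - lo) * (n_bins - 1))
    b = [bucket(d_price, -0.01, 0.01),
         bucket(math.log1p(volume), 0.0, 10.0),
         0 if direction < 0 else 1,
         bucket(math.log1p(dt), 0.0, 5.0)]
    # Pack: price bin, volume bin, direction bit, inter-arrival bin.
    return ((b[0] * n_bins + b[1]) * 2 + b[2]) * n_bins + b[3]

print(trade_token(0.002, 500, +1, 0.05))
```

With n_bins = 8 the vocabulary holds 8 × 8 × 2 × 8 = 1024 tokens; a stream of such ids is what the autoregressive Transformer consumes, and text events would occupy additional ids in the same vocabulary.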
Strengths. Eliminates alignment as a design problem entirely — all modalities are tokens in a unified sequence.
Limitations. Requires enormous compute for training. Proprietary; not yet open-source. Integrating full text content (beyond event tags) into the tokenization scheme remains an open challenge.
12 · Multi-Agent Cognitive Debate
Core idea. Discard mathematical fusion entirely. Decompose analysis into specialized, independent LLM personas that resolve text-price discrepancies through structured debate.
Key work: TradingAgents (UCLA/MIT, LangGraph-based). Specialized agents — Sentiment Analyst (text-only), Technical Analyst (numbers-only), Fundamental Analyst — produce independent analyses. A Researcher Team (bullish and bearish personas) debates the conflicting signals across multiple rounds. A Trader Agent formulates execution strategy; a Risk Management Agent evaluates liquidity and volatility constraints; a Portfolio Manager approves final execution.
How it resolves misalignment. When positive text sentiment conflicts with bearish technical indicators, the system doesn't average them to a neutral "hold." Instead, agents reason through the conflict in natural language: "While numerical indicators show overbought conditions, the regulatory approval in the news overrides historical resistance levels." Alignment happens through contextual reasoning rather than vector addition.
Strengths. Preserves nuance of conflicting signals. Interpretable. Handles novel situations gracefully.
Limitations. Latency — multi-round LLM debate is too slow for high-frequency trading. Non-deterministic outputs. Difficult to backtest rigorously.
13 · Contrastive Learning Alignment
Core idea. Use self-supervised contrastive learning to force the latent representations of matched text events and price movements into the same geometric region of embedding space, ensuring deep semantic alignment.
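The InfoNCE objective in miniature (numpy; random vectors stand in for headline and price-window encoder outputs):

```python
import numpy as np

def info_nce(text_emb, price_emb, tau=0.07):
    """InfoNCE sketch: row i of text_emb and row i of price_emb are a matched
    (same-day) pair; every other row in the batch is a negative. Embeddings
    are L2-normalized and similarities scaled by temperature tau."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    p = price_emb / np.linalg.norm(price_emb, axis=1, keepdims=True)
    logits = t @ p.T / tau                       # (B, B) scaled cosine sims
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matched pairs on the diagonal

rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
aligned_loss = info_nce(text, text + 0.01 * rng.normal(size=(8, 16)))
random_loss = info_nce(text, rng.normal(size=(8, 16)))
print(aligned_loss < random_loss)  # aligned pairs score a lower loss
```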
Key works.
- ContraSim — self-supervised contrastive similarity learning maps daily headlines and market movements into a shared latent space via InfoNCE loss. Weighted Self-Supervised Contrastive Learning (WSSCL) generates augmented headlines to ensure negated or sarcastic text is pushed far from baseline positive embeddings.
- SuCroMoCo — supervised contrastive learning with cross-momentum contrast aligns financial text representations with prototypical sentiment categories.
Emergent properties. The contrastive similarity space naturally clusters historical trading days with homogeneous market movement directions. This enables k-nearest-neighbor retrieval in latent space to find historical analogs to the current session, enabling preemptive positioning.
14 · Regime Switching, MoE, and Surprise-Based Frameworks
Core idea. Go beyond alignment to ask: given the current market regime, how much new information does this text actually carry?
Key works.
- StockMem (arXiv 2025) — introduces ΔInfo (incremental information): price movements depend on deviation from market expectations, not absolute sentiment polarity. Uses analogical memory retrieval to match current event sequences to historically similar scenarios. Reframes alignment from "when does news arrive" to "how much surprise does this news contain."
- DAFF-Net (Scientific Reports, 2025) — combines all three base solutions into one architecture: learnable temporal prototypes (event-driven resampling), exponential decay (decay functions), AND an event-aware MoE router (dual-frequency).
- Adaptive Regime-Aware architecture (arXiv 2025) — autoencoder detects anomalous regimes via reconstruction error; specialized Dual Node Transformers handle stable vs. event-driven conditions; a Soft Actor-Critic RL controller adaptively tunes regime detection thresholds.
- LLMoE (arXiv 2025) — pre-trained LLM as the router itself, reading both text and numeric features to decide which expert handles each market condition.
15 · Reinforcement Learning on Stock Prices (RLSP)
Core idea. Use actual market price reactions as the reward signal to align the LLM's text processing with financial outcomes, bypassing the alignment problem by optimizing end-to-end from text to trade profit.
Key work: FinGPT (AI4Finance, ~18,900 GitHub stars). Introduces RLSP as a financial evolution of RLHF. The training environment is the historical market; the reward function is tied to the asset's actual post-news price reaction. This coerces the LLM to align its sentiment extraction with tangible financial outcomes, bridging the semantic gap through the RL objective rather than through explicit temporal alignment.
16 · Temporal Grounding from Text
Core idea. Instead of aligning on publication time alone, extract the actual event time, time range, and temporal relations from the text itself.
LLMs can serve as temporal grounders, outputting effective_time, time_range, and confidence — reducing misalignment at its root: a filing published today may describe an event that occurred weeks earlier. Standards like TimeML and ISO-TimeML provide formal specifications for anchoring events to time expressions and representing temporal ordering relations, and recent transformer-based temporal information extraction work shows promising results. This is a genuinely out-of-the-box move: the LLM is not only a sentiment extractor but a temporal structure extractor.
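A sketch of the consuming side (the JSON fields follow the names above; parse_grounding and its fallback policy are our illustration, and handling of legitimately future-dated scheduled events is omitted):

```python
import json
from datetime import datetime, timezone

def parse_grounding(llm_json, publication_ts):
    """Validate a hypothetical LLM temporal-grounding output of the form
    {"effective_time": ISO-8601, "time_range": [...], "confidence": float},
    falling back to publication time when the extraction is missing,
    malformed, low-confidence, or future-dated (time_range handling omitted)."""
    try:
        out = json.loads(llm_json)
        eff = datetime.fromisoformat(out["effective_time"])
        conf = float(out.get("confidence", 0.0))
        if conf < 0.5 or eff > publication_ts:
            return publication_ts, "fallback"
        return eff, "grounded"
    except (json.JSONDecodeError, KeyError, ValueError, TypeError):
        return publication_ts, "fallback"

pub = datetime(2024, 3, 1, 14, 0, tzinfo=timezone.utc)
# A filing published March 1 describing an event that occurred February 15:
raw = '{"effective_time": "2024-02-15T00:00:00+00:00", "confidence": 0.9}'
print(parse_grounding(raw, pub))
```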
17 · Interleaved Multimodal Tokenization
Core idea. Treat temporal misalignment as a sequence modeling problem with heterogeneous tokens. No alignment is needed because the model learns relationships across an interleaved timeline of text tokens and numerical tokens ordered by timestamp.
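Building the interleaved sequence itself is a plain k-way merge by timestamp (modality tags and payloads here are illustrative):

```python
import heapq

def interleave(*streams):
    """Merge timestamped token streams from different modalities into one
    chronologically ordered sequence -- the input layout consumed by
    interleaved-tokenization models. Each stream is a time-sorted list of
    (timestamp, modality, payload) tuples."""
    return list(heapq.merge(*streams, key=lambda tok: tok[0]))

ticks = [(9.5001, "px", 100.1), (9.5003, "px", 100.2), (9.5010, "px", 99.7)]
news = [(9.5007, "txt", "FDA approval announced")]
seq = interleave(ticks, news)
print([m for _, m, _ in seq])  # ['px', 'px', 'txt', 'px']
```

Downstream, each tuple is embedded by a modality-specific tokenizer; the model sees arbitrary temporal gaps simply as adjacent tokens with different timestamps.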
Key works.
- MSE-ITT — builds modality-specific MoE layers within Llama3-8B, processing interleaved sequences of text and time-series tokens. Pointwise embedding tokenization accommodates variable temporal gaps without fixed windows.
- Time-IMM benchmark (NeurIPS 2025) — first evaluation framework explicitly designed around cause-driven irregularities in multimodal time series. Categorizes irregularity into trigger-based, constraint-based, and artifact-based types. Achieves up to 38% MSE reduction when text is highly informative.
Summary Table
| # | Method | How It Handles Misalignment | Strengths | Limitations | Key References |
|---|---|---|---|---|---|
| 1 | Event-Driven Resampling | Forces ticks onto the event clock; dynamic windows around text events | Eliminates sparsity and forward-fill bias; high alignment precision | Blind to non-event market dynamics; ambiguous timestamps | López de Prado; DEBS 2022; AEDL |
| 2 | Decay Functions | Converts discrete text events into continuous signals via exponential decay | Simple, cheap, interpretable | Static λ cannot handle heterogeneous event types; no interaction effects | TASEM; volatility-adaptive decay |
| 3 | Dual-Frequency Architecture | Parallel fast-numerical and slow-text branches; text modulates numerical system | Isolates modality interference | May miss fine-grained cross-modal correlations; 2× compute cost | VoT/AFF; TFT; TIC-FusionNet |
| 4 | Cross-Modal Attention | Learned soft attention across modalities without pre-alignment | No explicit alignment needed; discovers non-linear correspondences | Requires large paired datasets; limited theoretical guarantees | MulT; FAST; SALMON/MSE-ITT; MSGCA; MM-iTransformer |
| 5 | Continuous-Time Neural Dynamics | Latent state evolves continuously via ODE/CDE/SDE; updated asynchronously at any observation | Most principled solution; keeps original timestamps | Computationally expensive; unexplored for joint text+price fusion | Neural ODE; Neural CDE; NJ-ODE; mTAN; Neural MJD; MambaStock |
| 6 | Temporal Point Processes | Both modalities as marked events in continuous time with learned excitation kernels | Naturally models arrival dynamics, cross-excitation, and event-specific decay | Struggles with rich text content (mostly reduces to scalar marks) | Neural Hawkes; RMTPP; Transformer Hawkes; TPP-LLM; EasyTPP |
| 7 | Mixed-Frequency Econometrics | MIDAS distributed lags; mixed-frequency VAR; Kalman filter with irregular updates | Theoretically grounded; decades of proven practice; interpretable | Assumes relatively simple parametric relationships | Ghysels et al. MIDAS; MF-VAR; Durbin & Koopman |
| 8 | Information-Theoretic / Causal | Measures directional information flow; constructs counterfactuals; diagnoses endogeneity | Validates whether alignment captures real causality vs. spurious correlation | Diagnostic rather than predictive; requires careful implementation | Transfer entropy; CausalImpact; Granger-Hawkes |
| 9 | Temporal Knowledge Graphs / GNN | Encodes entity relationships + temporal validity; propagates signals across connected assets | Solves cross-sectional misalignment; captures supply-chain / sector contagion | Graph construction quality is critical; computationally heavy | FinDKG; TRACE; THGNN; SCRG; DGRCL |
| 10 | Cross-Modal Reprogramming | Maps numerical patches into LLM vocabulary space; both modalities processed as tokens | Leverages frozen LLM reasoning; zero/few-shot capable | Expensive at inference; reprogramming quality uncertain | Time-LLM (ICLR 2024) |
| 11 | Universal Tokenization | All event types (trades, quotes, news) encoded into a single discrete token vocabulary | Eliminates alignment as a design problem entirely | Massive compute; integrating full text content is open challenge | TradeFM (J.P. Morgan) |
| 12 | Multi-Agent Cognitive Debate | Specialized LLM personas independently analyze then debate conflicting signals | Preserves nuance; interpretable reasoning; handles novel situations | Too slow for HFT; non-deterministic; hard to backtest | TradingAgents; MiroFish |
| 13 | Contrastive Learning Alignment | InfoNCE loss forces matched text-price pairs into shared embedding geometry | Deep semantic alignment; enables historical-analog retrieval | Requires careful negative sampling; batch-size sensitive | ContraSim; SuCroMoCo |
| 14 | Regime / MoE / Surprise Frameworks | Routes processing by market regime; measures information surprise relative to expectations | Adapts to non-stationarity; combines multiple base solutions | Complex to tune; regime detection can lag | StockMem (ΔInfo); DAFF-Net; Adaptive Regime-Aware; LLMoE |
| 15 | RLSP (RL on Stock Prices) | Uses actual price reactions as reward to align LLM text processing end-to-end | Optimizes for trading profit directly, not intermediate accuracy | Reward signal is noisy; sample-inefficient; risk of overfitting to market microstructure | FinGPT |
| 16 | Temporal Grounding from Text | Extracts actual event time, time range, and temporal relations from text itself | Reduces misalignment at the root; leverages LLM as temporal structure extractor | Extraction accuracy is imperfect; not all text has clear temporal anchors | TimeML; ISO-TimeML |
| 17 | Interleaved Multimodal Tokenization | Text and numerical tokens interleaved by timestamp; model learns cross-modal relationships in sequence | No alignment needed; modality-specific processing before fusion | Requires large-scale interleaved training data; early-stage research | MSE-ITT; Time-IMM (NeurIPS 2025) |
Practical Recommendations
For sub-second latency (HFT): Continuous-time neural dynamics (Neural CDE/ODE) with online interpolation, or event-driven resampling with Hawkes-informed windows.
For daily-frequency strategies: Cross-modal attention (MulT/TFT-style) with variable selection offers the best accuracy-to-complexity ratio. MIDAS regression with FinBERT features is a strong, interpretable baseline.
For event-driven strategies: Marked Hawkes processes for quantifying text→price excitation kernels, combined with "Trade the Event" style structured event extraction.
For maximum flexibility: Interleaved multimodal tokenization (MSE-ITT architecture) for medium-frequency applications, or multi-agent debate for strategies where interpretability matters most.
For production infrastructure: NautilusTrader (~21K stars, Rust/Python, nanosecond resolution) supports custom data types for mixed-frequency text+price streams. Microsoft Qlib (~16K stars) provides modular quant pipelines. FinGPT (~18.9K stars) provides LLM-native data curation with RLSP.