Methods for Aligning Text and Numerical Data in Algorithmic Trading

The Core Problem

In modern quantitative trading, two fundamentally different data streams must work together:

  • Numerical data (tick data, K-lines, order book snapshots) arrives at microsecond-to-minute frequency on a continuous, synchronous clock.
  • Text data (news, tweets, filings, government documents) arrives irregularly as discrete events on a sparse, asynchronous clock.

When an LLM is used as a text feature extractor — converting raw text into structured signals like {"sentiment": 0.8, "relevance": 0.9, "urgency": 0.5} — the resulting features still live on a fundamentally different temporal grid than price. Naïvely forcing alignment (via forward-filling, zero-filling, or fixed-window aggregation) introduces biases: staleness from carry-forward, sparsity from zero-fill, and information loss from coarse bucketing. The misalignment is not merely a data-engineering inconvenience — it is a compound problem spanning temporal asynchrony, semantic heterogeneity, and feature-space incompatibility.

This report catalogues the principal methods that have been developed to address this misalignment, organized from foundational heuristics through advanced neural architectures.


1 · Event-Driven Resampling

Core idea. Invert the standard paradigm: instead of forcing text onto the tick clock, force ticks onto the event clock. The unit of analysis becomes a dynamic window around each text event rather than a fixed time bar. The system evaluates trades only when a text signal fires, pulling the surrounding numerical context (price, volume, spread, order imbalance) from that event window.
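A minimal sketch of this event-anchored windowing, assuming sorted tick arrays; the `pre`/`post` window lengths are illustrative, not prescribed by any of the surveyed methods:

```python
import bisect

def event_windows(tick_times, tick_prices, event_times, pre=60.0, post=60.0):
    """For each text event, collect the ticks inside a [t - pre, t + post]
    window. tick_times must be sorted ascending. Returns one
    (event_time, window_prices) pair per event, so every training row is
    guaranteed a localized market context around a textual event."""
    samples = []
    for t in event_times:
        lo = bisect.bisect_left(tick_times, t - pre)
        hi = bisect.bisect_right(tick_times, t + post)
        samples.append((t, tick_prices[lo:hi]))
    return samples
```

In a full pipeline the window bounds would themselves be dynamic (e.g. Hawkes-informed, as discussed below in this section) rather than fixed constants.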

Strengths. Every row in the training set is guaranteed to contain a rich textual event plus its localized market context. This eliminates matrix sparsity and forward-fill bias. The approach aligns naturally with Information-Driven Bars (López de Prado), where the market is sampled only when a threshold of information has arrived.

Extensions in the literature.

  • Neural Hawkes Processes model the self-exciting and mutually-exciting clustering of news arrivals and trade events, capturing how a negative earnings report triggers algorithmic sell cascades that trigger further news alerts. The conditional intensity function learns cross-excitation kernels between text-type and price-type events, replacing hand-tuned windows with data-driven interaction structure.
  • Complex Event Processing (CEP) systems (e.g., solutions to the DEBS 2022 Grand Challenge built on Apache Flink) operationalize event-driven execution at production scale, processing high-volume tick streams with real-time pattern detection.
  • Adaptive Event-Driven Labeling (AEDL) integrates multi-scale temporal analysis to capture hierarchical causal patterns at different time granularities around events.

Limitations. Markets exhibit massive price volatility in the complete absence of news. An event-only clock is blind to momentum shifts, liquidity cascades, and purely technical microstructure breakdowns. Events with ambiguous timestamps (e.g., filings marked only by date) introduce residual alignment error.


2 · Decay Functions on Text Features

Core idea. Convert each discrete text event into a continuous signal via exponential decay: sentiment(t) = s₀ × e^{-λ(t - t₀)}. The text feature now lives in the same continuous time domain as price. The challenge is choosing λ — it differs for an earnings miss versus a CEO tweet versus a central bank statement.
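A minimal sketch of the core idea; the per-event-type decay rates below are illustrative assumptions standing in for calibrated values:

```python
from math import exp

# Hypothetical per-event-type decay rates (1/seconds). The text notes that
# lambda should differ across event classes; these numbers are made up.
LAMBDA = {"earnings": 1 / 3600.0, "ceo_tweet": 1 / 600.0, "central_bank": 1 / 86400.0}

def decayed_sentiment(events, t):
    """Continuous-time sentiment signal: the sum of s0 * exp(-lambda * (t - t0))
    over all past events. `events` is a list of (t0, s0, event_type) tuples,
    so each discrete text event becomes a curve on the price clock."""
    total = 0.0
    for t0, s0, kind in events:
        if t0 <= t:  # future events contribute nothing
            total += s0 * exp(-LAMBDA[kind] * (t - t0))
    return total
```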

Strengths. Conceptually simple, computationally cheap, and directly interpretable. Preserves information value over time rather than hard-truncating at a window boundary.

Extensions in the literature.

  • Volatility-adaptive decay replaces static λ with a dynamic λ(t) = λ₀ + λ₁ · tanh((vol(t) − μ) / σ), accelerating decay in high-volatility regimes and slowing it in calm markets. Empirical tests show this lifts short-term prediction accuracy from ~60% to ~65%.
  • Asymmetric decay uses different rates for positive vs. negative sentiment, reflecting behavioral finance evidence that bad news persists ~2× longer than good news.
  • MIDAS regressions (Ghysels et al.) generalize hand-picked decay into a parametric distributed lag polynomial estimated from data, with decades of econometric theory behind weight design. This treats text features as high-frequency regressors and learns optimal weight schedules per event class.

Limitations. A single monotone decay cannot capture multi-peaked influence patterns from sustained events (e.g., multi-round sanctions). The approach also cannot model interaction effects — where the combination of two simultaneous text events produces a non-additive market response.


3 · Dual-Frequency Architecture

Core idea. Don't merge the modalities. Run two parallel systems: a fast numerical system (tick-level) and a slow text system (event-level). The text system adjusts the regime, parameters, or risk limits of the numerical system rather than feeding into the same model. Text sets the macro context; numbers handle micro execution.

Strengths. Isolates modality interference — high-frequency numerical noise cannot drown out low-frequency text signals, and vice versa. Each branch can use an architecture optimized for its data characteristics (e.g., CNN+LSTM for ticks, Transformer for text).
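A toy illustration of the modulation pattern, in which the slow text branch only rescales the fast branch's orders and never overrides their direction; the thresholds and bounds are made-up assumptions:

```python
def text_risk_multiplier(sentiment, relevance, urgency):
    """Slow text branch: map LLM-extracted features to a risk multiplier
    in [0.25, 2.0] that throttles the fast numerical strategy.
    Thresholds are illustrative, not calibrated."""
    score = sentiment * relevance        # signed conviction
    if urgency > 0.8 and sentiment < 0:  # breaking bad news -> de-risk hard
        return 0.25
    return max(0.25, min(2.0, 1.0 + score))

def sized_order(base_qty, fast_signal, multiplier):
    """Fast numerical branch emits fast_signal in {-1, 0, +1}; the text
    branch rescales position size via the multiplier."""
    return fast_signal * base_qty * multiplier
```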

Extensions in the literature.

  • Adaptive Frequency Fusion (AFF / VoT framework) operates in the frequency domain: both branches' predictions are FFT-decomposed, then learnable frequency-specific weights route text-derived signals to low-frequency bands and numerical signals to high-frequency bands. Inverse FFT reconstructs the unified prediction.
  • Wavelet-enhanced fusion uses Continuous Wavelet Transforms for superior time-frequency localization on non-stationary financial data, combined with gated mechanisms that let text severity dynamically modulate temporal graph edge weights.
  • Gated fusion (TFT / TIC-FusionNet) uses Variable Selection Networks or CBAM-based gates that evaluate contextual relevance in real time, dynamically suppressing stale or irrelevant text features when the market is in a purely technical regime, and amplifying text weight during regime shifts.

Limitations. The two branches may fail to learn fine-grained cross-modal correlations (e.g., which specific sentence in a filing maps to which specific order-flow pattern). Computational cost is roughly double that of single-branch models. Interpretability of the fusion weights is often weak.


4 · Cross-Modal Attention

Core idea. Replace hard temporal alignment with learned soft attention across modalities. A target modality (price) attends to a source modality (text) across different time steps via differentiable attention weights — no pre-alignment required.
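A bare-bones version of the mechanism in plain Python: price-time queries attend over text-event keys at arbitrary timestamps, with no shared grid. Real systems use learned projections and multiple heads; this sketch is single-head and unprojected:

```python
from math import exp, sqrt

def cross_modal_attention(price_queries, text_keys, text_values):
    """Soft alignment: each price-time query attends over all text events
    via scaled dot-product attention. Inputs are lists of equal-length
    vectors (plain lists of floats); no timestamp pre-alignment is needed."""
    d = len(price_queries[0])
    out = []
    for q in price_queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / sqrt(d) for k in text_keys]
        m = max(scores)                       # stabilize the softmax
        w = [exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        out.append([sum(wi * v[j] for wi, v in zip(w, text_values))
                    for j in range(len(text_values[0]))])
    return out
```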

Key works.

  • MulT (Multimodal Transformer) — Tsai et al., ACL 2019, ~2,500 citations. Directional pairwise cross-modal attention reinforcing one modality with features from another regardless of temporal alignment. Open-source at ~700 GitHub stars.
  • FAST — Sawhney et al., EACL 2021. Time-aware LSTM encoding inverse time intervals between consecutive text releases, handling intra-day temporal irregularity directly.
  • SALMON / MSE-ITT — arXiv 2025. Modality-specific Mixture-of-Experts on top of Llama3-8B, processing interleaved text and time-series tokens ordered by timestamp. Key finding: cross-modal attention in early transformer layers is detrimental; modality-specific processing should occur first, with fusion only in later layers.
  • MSGCA — trimodal encoding (price, documents, relational graphs) where the primary modality (price) guides fusion through gated cross-attention that suppresses noisy or temporally misaligned text signals.
  • MM-iTransformer — inverts the standard paradigm by embedding entire variable histories as single variate tokens rather than embedding individual time steps, enabling cross-attention across variables instead of across time.

Strengths. Eliminates explicit alignment entirely. Scales well and can discover complex, non-linear temporal correspondences.

Limitations. Requires substantial paired training data. Provides limited theoretical guarantees about what temporal relationship the attention learned.


5 · Continuous-Time Neural Dynamics

Core idea. Model the latent market state as a continuous dynamical system, updated asynchronously whenever any observation — tick, news article, filing — arrives. This eliminates discretization entirely.

Key model families.

| Model | Core Mechanism | Relevance to Text–Price Fusion |
| --- | --- | --- |
| Neural ODE (Chen et al., NeurIPS 2018, ~10K citations) | dh/dt = f_θ(h(t), t) with adaptive ODE solvers | Processes data at arbitrary, non-uniform time points |
| Latent ODE / ODE-RNN (Rubanova et al., NeurIPS 2019, ~2.2K citations) | Continuous dynamics between observations, discrete updates at arrivals | Jointly models when observations occur via Poisson processes |
| Neural CDE (Kidger et al., NeurIPS 2020 Spotlight, ~800 citations) | dz = f_θ(z) dX, where X is a continuous control path | Naturally handles channels observed at different rates; universal approximation proven |
| Online Neural CDE (Morrill & Kidger, 2021) | Rectilinear interpolation replacing non-causal cubic splines | Deployable for live trading without look-ahead bias |
| Neural Jump ODE (NJ-ODE) (ICLR 2021) | ODE evolves continuously between events, "jumps" at observations | Most natural framework for text+price: ODE drift between ticks, jump at news arrival |
| Neural SDE (Kidger et al., ICML 2021) | Adds a diffusion term for stochastic paths | Better captures financial randomness |
| Stable Neural SDE (Oh et al., ICLR 2024 Spotlight) | Solves training instability | Removes a practical deployment blocker |
| Neural Merton Jump Diffusion (arXiv 2025) | Itô diffusion + compound Poisson jumps | Continuous price evolution + discrete news shocks in one SDE |
| mTAN (Shukla & Marlin, ICLR 2021, ~400 citations) | Learned continuous-time embeddings with attention | ~85× faster than ODE-based methods; extended to cross-modal fusion |
| GRU-ODE-Bayes (de Brouwer et al., NeurIPS 2019, ~400 citations) | Bayesian updates for sporadically observed data | Handles channels updating at wildly different rates |
| ContiFormer (NeurIPS 2024) | Continuous-time Transformer with ODE trajectories per observation | Combines Transformer parallelism with Neural ODE dynamics |
| MambaStock (arXiv 2024, ~300 GitHub stars) | Selective state-space mechanism for stock prediction | Reported to outperform Transformers and LSTMs on stock tasks |
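To make the Neural Jump ODE pattern concrete, here is a toy scalar version with hand-written dynamics standing in for the learned networks: a mean-reversion drift plays the role of f_θ, and a convex-combination jump stands in for the learned jump network. Both choices are illustrative assumptions, not the published architecture:

```python
def njode_trajectory(obs, t_end, dt=0.01, decay=0.5):
    """Toy Neural-Jump-ODE pattern: a latent state h drifts continuously
    between events (here: mean-reversion toward 0, standing in for a
    learned f_theta) and jumps to incorporate each observation at its
    arrival time. `obs` is a list of (time, value) pairs sorted by time;
    returns h at t_end. Uses a plain Euler stepper for clarity."""
    h, t = 0.0, 0.0
    pending = list(obs)
    while t < t_end:
        if pending and pending[0][0] <= t:  # jump update at an arrival
            _, v = pending.pop(0)
            h = 0.5 * h + 0.5 * v           # stand-in for a learned jump net
        h += -decay * h * dt                # continuous drift between events
        t += dt
    return h
```

In the text+price setting, the "observations" would be both ticks and news arrivals, each with its own jump mapping; the libraries listed below (torchdiffeq, torchcde) supply the trainable, adaptive-solver versions of this loop.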

Critical gap identified across all surveyed literature: No published paper simultaneously combines rich NLP encoders (FinBERT/LLM), neural temporal point processes, and continuous-time latent dynamics into a single end-to-end architecture for text+price fusion. The tools exist; the combination remains an open field.

Key open-source tooling: torchdiffeq (~5,500 stars), torchcde (~434 stars), torchsde, latent_ode (~1,200 stars).


6 · Temporal Point Processes

Core idea. Model the arrival times and types of events as stochastic processes with self- and cross-excitation. A news article's arrival increases the probability of subsequent price moves, and vice versa. Both modalities become events in a single mathematical framework.

Classical foundations.

  • Hawkes processes in high-frequency finance (Bacry et al., 2015, ~1,000 citations) established that financial markets operate near the critical branching ratio — past events drive most future activity.
  • Yang et al. (Quantitative Finance, 2018) modeled a 4-variate Hawkes process over {positive returns, negative returns, positive sentiment, negative sentiment} using intraday S&P 500 data. Key finding: return-event decay is ~2× faster than sentiment-event decay, precisely quantifying the temporal misalignment.

Neural extensions.

| Model | Innovation |
| --- | --- |
| Neural Hawkes Process (Mei & Eisner, NeurIPS 2017, ~800 citations) | Continuous-time LSTM replaces parametric kernels; allows non-linear, non-additive interaction effects |
| RMTPP (Du et al., KDD 2016, ~1,000 citations) | RNN + marked temporal point process for encoding full event histories |
| Transformer Hawkes Process (Zuo et al., ICML 2020) | Self-attention with continuous-time positional encodings for long-range event dependencies |
| TPP-LLM (Liu & Quan, arXiv 2024 / ICLR 2025 Workshop) | First model to process actual text content within the point-process framework via LLM fine-tuning (LoRA), rather than reducing text to scalar sentiment |

How it solves misalignment. Both modalities are represented as timestamped marked events in continuous time. The LLM's structured output (sentiment, relevance, urgency, entity) becomes marks that modulate excitation strength or kernel parameters. No resampling or decay tuning is needed — the model learns event-type-specific and context-specific temporal influence patterns directly.
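The mark-modulated intensity can be sketched directly. Scaling the excitation amplitude by an LLM-derived mark is one of several ways marks can enter the kernel; the parameter values here are illustrative:

```python
from math import exp

def hawkes_intensity(t, events, mu=0.1, alpha=0.8, beta=2.0):
    """Conditional intensity of a marked Hawkes process with an
    exponential kernel:
        lambda(t) = mu + sum_i alpha * m_i * exp(-beta * (t - t_i))
    where each past event carries a mark m_i -- here an LLM-derived
    urgency/relevance score that scales its excitation strength.
    `events` is a list of (t_i, m_i) pairs with t_i < t."""
    return mu + sum(alpha * m * exp(-beta * (t - ti))
                    for ti, m in events if ti < t)
```

A neural TPP replaces the fixed (alpha, beta) kernel with a learned, history-dependent function, which is exactly what the models in the table above provide.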

Key open-source tooling: tick library (~500 stars) for classical Hawkes; EasyTPP (Ant Research, ~400 stars) for neural TPP benchmarking.


7 · Mixed-Frequency Econometrics & State-Space Models

Core idea. Treat different sampling rates as a feature, not a bug. Use formal frameworks from econometrics that were explicitly built to handle mixed-frequency observations and irregular arrivals.

MIDAS regressions (Ghysels et al., JFE 2005/JoE 2006, ~1,000+ citations across variants) regress a low-frequency target on high-frequency regressors using parsimonious polynomial weighting schemes. Unlike hand-picked decay, the weight schedule is estimated from data. Reverse-MIDAS variants forecast high-frequency variables from low-frequency ones. Recent intraday MIDAS work predicts 3-minute returns from half-hourly sentiment, achieving 19% MAE reduction. A counterintuitive finding: sentiment during non-trading hours is more informative than during trading hours.
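The exponential Almon lag polynomial at the heart of MIDAS can be sketched in a few lines; two parameters pin down an entire high-frequency weight schedule (the theta values in the test are illustrative):

```python
from math import exp

def exp_almon_weights(n_lags, theta1, theta2):
    """Exponential Almon lag weights used in MIDAS regressions:
        w_j proportional to exp(theta1 * j + theta2 * j^2),  j = 1..n_lags,
    normalized to sum to one. With theta2 < 0 the schedule decays
    smoothly, replacing a hand-picked decay rate with two estimable
    parameters."""
    raw = [exp(theta1 * j + theta2 * j * j) for j in range(1, n_lags + 1)]
    s = sum(raw)
    return [r / s for r in raw]

def midas_aggregate(hf_values, weights):
    """Collapse n_lags high-frequency text features (most recent first)
    into a single low-frequency regressor."""
    return sum(w * x for w, x in zip(weights, hf_values))
```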

Mixed-frequency VAR models characterize both frequency mismatch and the timing of information releases, with real-time updating as new higher-frequency observations arrive — formalizing event-driven updates in a multivariate dynamic system.

State-space / Kalman filtering maintains a latent market state updated by both frequent market observations and sporadic text events, without ever requiring a shared clock. The Kalman filter provides optimal sequential updating under linear-Gaussian assumptions; nonlinear extensions (EKF, UKF, particle filters) handle more complex dynamics.
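A scalar random-walk Kalman filter illustrates the no-shared-clock property: process noise grows with the elapsed gap, and the measurement update runs only when something actually arrives. The noise variances q and r are illustrative assumptions:

```python
def kalman_step(x, P, dt, z=None, q=0.01, r=0.1):
    """One predict(-update) cycle of a scalar random-walk Kalman filter
    with irregular time steps. Predict: uncertainty P grows with the gap
    dt since the last step, so ticks and sporadic text events can update
    the same latent state without a shared clock. Update: applied only
    when an observation z arrives (z=None means predict-only)."""
    P = P + q * dt                 # predict: random-walk state, gap-scaled noise
    if z is not None:
        K = P / (P + r)            # Kalman gain
        x = x + K * (z - x)        # measurement update
        P = (1 - K) * P
    return x, P
```

In practice the price channel and the text channel would carry different measurement variances r, letting the filter weight a noisy sentiment score less than a firm trade print.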

Strengths. Theoretically grounded, interpretable, and proven in production macro-finance. Provides estimable weighting structures and explicit information arrival times.

Limitations. Classical MIDAS assumes relatively simple parametric relationships. Struggles with the non-linear, context-dependent effects that deep learning captures.


8 · Information-Theoretic & Causal Approaches

Core idea. Measure and exploit the directional information flow between text and price to build alignment that respects causality.

Key methods.

  • Transfer entropy (Souza & Aste, 2016/2019) revealed that nonlinear transfer entropy detects an order of magnitude more causality between social media and stock returns than linear Granger causality — the relationship is purely nonlinear and invisible to standard VAR models. Open-source: PyCausality library.
  • CausalImpact (Brodersen et al., Google, Annals of Applied Statistics, 2015, ~1,500 citations) provides a Bayesian structural time-series framework for constructing counterfactuals around discrete interventions — directly applicable to measuring "what would the price have done absent this news event."
  • Granger-causality learning for Hawkes processes (Xu et al., ICML 2016) recovers sparse interaction structure via group sparsity, providing tools to test whether "text → price" excitation is genuine versus "price → text commentary" (endogeneity).
  • Dynamic Transfer Entropy + TFT (Díaz Berenguer et al., 2024) bridges information theory with deep learning by feeding causality features into Temporal Fusion Transformers.
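A minimal plug-in estimator of lag-1 transfer entropy for binarized series gives the flavor of the directional measure (real studies use longer histories and bias corrections; this is a sketch):

```python
from math import log2
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in estimate of transfer entropy T(source -> target) for binary
    sequences with lag 1: how much the source's past reduces uncertainty
    about the target's next value beyond the target's own past.
        TE = sum p(t1, y, s) * log2[ p(t1 | y, s) / p(t1 | y) ]
    with t1 = target[i], y = target[i-1], s = source[i-1]."""
    triples = Counter()
    for i in range(1, len(target)):
        triples[(target[i], target[i - 1], source[i - 1])] += 1
    n = sum(triples.values())
    pairs_ty, pairs_ys, singles_y = Counter(), Counter(), Counter()
    for (t1, y, s), c in triples.items():
        pairs_ty[(t1, y)] += c
        pairs_ys[(y, s)] += c
        singles_y[y] += c
    te = 0.0
    for (t1, y, s), c in triples.items():
        te += (c / n) * log2((c / pairs_ys[(y, s)]) / (pairs_ty[(t1, y)] / singles_y[y]))
    return te
```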

Value for alignment. These tools diagnose whether alignment is capturing real causal structure or spurious correlation, and quantify the actual information content and temporal lag of text signals.


9 · Temporal Knowledge Graphs & Graph Neural Networks

Core idea. Encode not just text content but the relational structure between entities, events, and prices. Propagate text-derived signals across connected assets through learned graph dynamics.

Key works.

  • FinDKG (Li & Sanna Passino, ACM ICAIF 2024) — fine-tuned 7B LLM extracts event quadruples (subject, relation, object, timestamp) from news; KGTransformer learns over rolling temporal KG snapshots. >10% uplift on link prediction versus SOTA temporal KG baselines.
  • TRACE (arXiv 2025) — symbolic path mining on temporal KGs with temporal validity constraints ensuring only causally valid (past→present) reasoning chains, validated by LLM-guided graph exploration.
  • THGNN (CIKM 2022) — learns dynamic inter-stock relations without manual graph construction.
  • Semantic Company Relationship Graphs (SCRG) — LLM-extracted company co-occurrence and sentiment spillover metrics construct real-time relational graphs; combined with Spatial-Temporal GNNs, text-derived signals propagate across structurally linked equities with learned latency and attenuation.
  • DGRCL — combines dynamic graph relations with contrastive learning to capture both temporal evolution and relationship constraints from text.

Value for alignment. Solves cross-sectional misalignment: a news event about one company propagates to related companies with different delays, which cannot be captured by single-asset alignment methods.


10 · Cross-Modal Reprogramming

Core idea. Instead of extracting features from text and feeding them to a numerical model, transform numerical data into the text domain and let the LLM process both natively.

Key work: Time-LLM (ICLR 2024, open-source). The backbone LLM is kept entirely frozen. The numerical time series is segmented into patches; a trainable reprogramming layer then maps these patches into text-prototype embeddings in the LLM's vocabulary space. Actual textual data (news, sentiment, macro context) is fed alongside as a Prompt-as-Prefix. The frozen LLM then jointly reasons over both modalities via its native self-attention — no explicit alignment is needed because both modalities live in the same token space.
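The patch-then-map pipeline can be sketched as follows. Note the stand-in: Time-LLM's actual reprogramming layer is a trained cross-attention over vocabulary prototypes, whereas this illustration uses nearest-prototype lookup purely to show the shape of the mapping:

```python
def patch_series(series, patch_len):
    """Segment a numerical series into non-overlapping patches."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, patch_len)]

def reprogram(patches, prototypes):
    """Stand-in for Time-LLM's trainable reprogramming layer: assign each
    numerical patch to its nearest 'text prototype' (by squared distance).
    The real layer learns soft cross-attention weights instead of this
    hard nearest-neighbor choice."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(prototypes)), key=lambda k: dist(p, prototypes[k]))
            for p in patches]
```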

Strengths. Leverages the LLM's pre-trained sequence reasoning without fine-tuning. Achieves zero-shot and few-shot multimodal forecasting. Can be combined with RL agents (SAC) for end-to-end trading optimization.

Limitations. Computationally expensive at inference. The reprogramming quality depends on whether numerical patches genuinely map to meaningful text prototypes.


11 · Universal Tokenization & Foundation Models

Core idea. Build massive foundation models that treat the entire heterogeneous event stream — trades, quotes, news — as a single structured language with its own grammar.

Key work: TradeFM (J.P. Morgan, 524M parameters). Trained on billions of raw tick-level trade events across 9,000+ equities. Uses scale-invariant feature construction based on Universal Price Formation theory. A Universal Tokenization Scheme maps multi-feature trade tuples (price, volume, direction, inter-arrival time) into a single discrete sequence. Integrating text involves mapping textual events into the same vocabulary. The Transformer's autoregressive self-attention handles alignment implicitly.

Strengths. Eliminates alignment as a design problem entirely — all modalities are tokens in a unified sequence.

Limitations. Requires enormous compute for training. Proprietary; not yet open-source. Integrating full text content (beyond event tags) into the tokenization scheme remains an open challenge.


12 · Multi-Agent Cognitive Debate

Core idea. Discard mathematical fusion entirely. Decompose analysis into specialized, independent LLM personas that resolve text-price discrepancies through structured debate.

Key work: TradingAgents (UCLA/MIT, LangGraph-based). Specialized agents — Sentiment Analyst (text-only), Technical Analyst (numbers-only), Fundamental Analyst — produce independent analyses. A Researcher Team (bullish and bearish personas) debates the conflicting signals across multiple rounds. A Trader Agent formulates execution strategy; a Risk Management Agent evaluates liquidity and volatility constraints; a Portfolio Manager approves final execution.

How it resolves misalignment. When positive text sentiment conflicts with bearish technical indicators, the system doesn't average them to a neutral "hold." Instead, agents reason through the conflict in natural language: "While numerical indicators show overbought conditions, the regulatory approval in the news overrides historical resistance levels." Alignment happens through contextual reasoning rather than vector addition.

Strengths. Preserves nuance of conflicting signals. Interpretable. Handles novel situations gracefully.

Limitations. Latency — multi-round LLM debate is too slow for high-frequency trading. Non-deterministic outputs. Difficult to backtest rigorously.


13 · Contrastive Learning Alignment

Core idea. Use self-supervised contrastive learning to force the latent representations of matched text events and price movements into the same geometric region of embedding space, ensuring deep semantic alignment.
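The InfoNCE objective behind these methods can be written out directly. This sketch uses cosine similarity over plain lists; the temperature and embeddings in the test are illustrative:

```python
from math import exp, log

def info_nce(text_emb, price_emb, temperature=0.1):
    """InfoNCE loss over a batch of matched (text, price) embedding pairs:
    the i-th text embedding should score highest against the i-th price
    embedding and low against every other one in the batch. Lower loss
    means the two modalities occupy the same geometric region."""
    def cos(a, b):
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return sum(x * y for x, y in zip(a, b)) / (na * nb)
    loss, n = 0.0, len(text_emb)
    for i in range(n):
        logits = [cos(text_emb[i], price_emb[j]) / temperature for j in range(n)]
        m = max(logits)  # stabilize the log-sum-exp
        loss += -(logits[i] - m) + log(sum(exp(l - m) for l in logits))
    return loss / n
```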

Key works.

  • ContraSim — self-supervised contrastive similarity learning maps daily headlines and market movements into a shared latent space via InfoNCE loss. Weighted Self-Supervised Contrastive Learning (WSSCL) generates augmented headlines to ensure negated or sarcastic text is pushed far from baseline positive embeddings.
  • SuCroMoCo — supervised contrastive learning with cross-momentum contrast aligns financial text representations with prototypical sentiment categories.

Emergent properties. The contrastive similarity space naturally clusters historical trading days with homogeneous market movement directions. This enables k-nearest-neighbor retrieval in latent space to find historical analogs to the current session, enabling preemptive positioning.


14 · Regime Switching, MoE, and Surprise-Based Frameworks

Core idea. Go beyond alignment to ask: given the current market regime, how much new information does this text actually carry?

Key works.

  • StockMem (arXiv 2025) — introduces ΔInfo (incremental information): price movements depend on deviation from market expectations, not absolute sentiment polarity. Uses analogical memory retrieval to match current event sequences to historically similar scenarios. Reframes alignment from "when does news arrive" to "how much surprise does this news contain."
  • DAFF-Net (Scientific Reports, 2025) — combines all three base solutions into one architecture: learnable temporal prototypes (event-driven resampling), exponential decay (decay functions), AND an event-aware MoE router (dual-frequency).
  • Adaptive Regime-Aware architecture (arXiv 2025) — autoencoder detects anomalous regimes via reconstruction error; specialized Dual Node Transformers handle stable vs. event-driven conditions; a Soft Actor-Critic RL controller adaptively tunes regime detection thresholds.
  • LLMoE (arXiv 2025) — pre-trained LLM as the router itself, reading both text and numeric features to decide which expert handles each market condition.

15 · Reinforcement Learning on Stock Prices (RLSP)

Core idea. Use actual market price reactions as the reward signal to align the LLM's text processing with financial outcomes, bypassing the alignment problem by optimizing end-to-end from text to trade profit.

Key work: FinGPT (AI4Finance, ~18,900 GitHub stars). Introduces RLSP as a financial evolution of RLHF. The training environment is the historical market; the reward function is tied to the asset's actual post-news price reaction. This coerces the LLM to align its sentiment extraction with tangible financial outcomes, bridging the semantic gap through the RL objective rather than through explicit temporal alignment.


16 · Temporal Grounding from Text

Core idea. Instead of aligning on publication time alone, extract the actual event time, time range, and temporal relations from the text itself.

LLMs can serve as temporal grounders outputting effective_time, time_range, and confidence — reducing misalignment at its root. Standards like TimeML and ISO-TimeML provide formal specifications for anchoring events to time expressions and representing temporal ordering relations. Recent transformer-based temporal information extraction work shows promising results. This is a genuinely out-of-the-box move: the LLM is not only a sentiment extractor but a temporal structure extractor.
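A sketch of consuming such grounding output, assuming the LLM emits JSON with the fields named in this section (the exact field names and fallback rule are assumptions, not a standard):

```python
import json
from datetime import datetime, timezone

def ground_event(llm_json):
    """Parse a hypothetical temporal-grounding output from an LLM.
    Expected fields -- effective_time, confidence -- follow the schema
    sketched in the text above; when no grounding is present, fall back
    to the publication timestamp, reintroducing the usual alignment
    error only for ungroundable items."""
    d = json.loads(llm_json)
    eff = d.get("effective_time") or d["published_at"]
    return {
        "t": datetime.fromisoformat(eff).replace(tzinfo=timezone.utc),
        "confidence": float(d.get("confidence", 0.0)),
    }
```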


17 · Interleaved Multimodal Tokenization

Core idea. Treat temporal misalignment as a sequence modeling problem with heterogeneous tokens. No alignment is needed because the model learns relationships across an interleaved timeline of text tokens and numerical tokens ordered by timestamp.
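The interleaving itself is just a timestamp-ordered merge of heterogeneous token streams, which can be sketched as:

```python
import heapq

def interleave(text_tokens, price_tokens):
    """Merge two timestamped token streams into one sequence ordered by
    arrival time. Each element becomes (timestamp, modality, payload);
    the downstream sequence model sees raw arrival order and needs no
    resampling or fixed windows."""
    return list(heapq.merge(
        ((t, "text", p) for t, p in text_tokens),
        ((t, "price", p) for t, p in price_tokens),
        key=lambda x: x[0],
    ))
```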

Key works.

  • MSE-ITT — builds modality-specific MoE layers within Llama3-8B, processing interleaved sequences of text and time-series tokens. Pointwise embedding tokenization accommodates variable temporal gaps without fixed windows.
  • Time-IMM benchmark (NeurIPS 2025) — first evaluation framework explicitly designed around cause-driven irregularities in multimodal time series. Categorizes irregularity into trigger-based, constraint-based, and artifact-based types. Achieves up to 38% MSE reduction when text is highly informative.

Summary Table

| # | Method | How It Handles Misalignment | Strengths | Limitations | Key References |
| --- | --- | --- | --- | --- | --- |
| 1 | Event-Driven Resampling | Forces ticks onto the event clock; dynamic windows around text events | Eliminates sparsity and forward-fill bias; high alignment precision | Blind to non-event market dynamics; ambiguous timestamps | López de Prado; DEBS 2022; AEDL |
| 2 | Decay Functions | Converts discrete text events into continuous signals via exponential decay | Simple, cheap, interpretable | Static λ cannot handle heterogeneous event types; no interaction effects | TASEM; volatility-adaptive decay |
| 3 | Dual-Frequency Architecture | Parallel fast-numerical and slow-text branches; text modulates numerical system | Isolates modality interference | May miss fine-grained cross-modal correlations; 2× compute cost | VoT/AFF; TFT; TIC-FusionNet |
| 4 | Cross-Modal Attention | Learned soft attention across modalities without pre-alignment | No explicit alignment needed; discovers non-linear correspondences | Requires large paired datasets; limited theoretical guarantees | MulT; FAST; SALMON/MSE-ITT; MSGCA; MM-iTransformer |
| 5 | Continuous-Time Neural Dynamics | Latent state evolves continuously via ODE/CDE/SDE; updated asynchronously at any observation | Most principled solution; keeps original timestamps | Computationally expensive; unexplored for joint text+price fusion | Neural ODE; Neural CDE; NJ-ODE; mTAN; Neural MJD; MambaStock |
| 6 | Temporal Point Processes | Both modalities as marked events in continuous time with learned excitation kernels | Naturally models arrival dynamics, cross-excitation, and event-specific decay | Struggles with rich text content (mostly reduces to scalar marks) | Neural Hawkes; RMTPP; Transformer Hawkes; TPP-LLM; EasyTPP |
| 7 | Mixed-Frequency Econometrics | MIDAS distributed lags; mixed-frequency VAR; Kalman filter with irregular updates | Theoretically grounded; decades of proven practice; interpretable | Assumes relatively simple parametric relationships | Ghysels et al. MIDAS; MF-VAR; Durbin & Koopman |
| 8 | Information-Theoretic / Causal | Measures directional information flow; constructs counterfactuals; diagnoses endogeneity | Validates whether alignment captures real causality vs. spurious correlation | Diagnostic rather than predictive; requires careful implementation | Transfer entropy; CausalImpact; Granger-Hawkes |
| 9 | Temporal Knowledge Graphs / GNN | Encodes entity relationships + temporal validity; propagates signals across connected assets | Solves cross-sectional misalignment; captures supply-chain / sector contagion | Graph construction quality is critical; computationally heavy | FinDKG; TRACE; THGNN; SCRG; DGRCL |
| 10 | Cross-Modal Reprogramming | Maps numerical patches into LLM vocabulary space; both modalities processed as tokens | Leverages frozen LLM reasoning; zero/few-shot capable | Expensive at inference; reprogramming quality uncertain | Time-LLM (ICLR 2024) |
| 11 | Universal Tokenization | All event types (trades, quotes, news) encoded into a single discrete token vocabulary | Eliminates alignment as a design problem entirely | Massive compute; integrating full text content is an open challenge | TradeFM (J.P. Morgan) |
| 12 | Multi-Agent Cognitive Debate | Specialized LLM personas independently analyze, then debate conflicting signals | Preserves nuance; interpretable reasoning; handles novel situations | Too slow for HFT; non-deterministic; hard to backtest | TradingAgents; MiroFish |
| 13 | Contrastive Learning Alignment | InfoNCE loss forces matched text-price pairs into shared embedding geometry | Deep semantic alignment; enables historical-analog retrieval | Requires careful negative sampling; batch-size sensitive | ContraSim; SuCroMoCo |
| 14 | Regime / MoE / Surprise Frameworks | Routes processing by market regime; measures information surprise relative to expectations | Adapts to non-stationarity; combines multiple base solutions | Complex to tune; regime detection can lag | StockMem (ΔInfo); DAFF-Net; Adaptive Regime-Aware; LLMoE |
| 15 | RLSP (RL on Stock Prices) | Uses actual price reactions as reward to align LLM text processing end-to-end | Optimizes for trading profit directly, not intermediate accuracy | Reward signal is noisy; sample-inefficient; risk of overfitting to market microstructure | FinGPT |
| 16 | Temporal Grounding from Text | Extracts actual event time, time range, and temporal relations from the text itself | Reduces misalignment at the root; leverages the LLM as a temporal structure extractor | Extraction accuracy is imperfect; not all text has clear temporal anchors | TimeML; ISO-TimeML |
| 17 | Interleaved Multimodal Tokenization | Text and numerical tokens interleaved by timestamp; model learns cross-modal relationships in sequence | No alignment needed; modality-specific processing before fusion | Requires large-scale interleaved training data; early-stage research | MSE-ITT; Time-IMM (NeurIPS 2025) |

Practical Recommendations

For sub-second latency (HFT): Continuous-time neural dynamics (Neural CDE/ODE) with online interpolation, or event-driven resampling with Hawkes-informed windows.

For daily-frequency strategies: Cross-modal attention (MulT/TFT-style) with variable selection offers the best accuracy-to-complexity ratio. MIDAS regression with FinBERT features is a strong, interpretable baseline.

For event-driven strategies: Marked Hawkes processes for quantifying text→price excitation kernels, combined with "Trade the Event" style structured event extraction.

For maximum flexibility: Interleaved multimodal tokenization (MSE-ITT architecture) for medium-frequency applications, or multi-agent debate for strategies where interpretability matters most.

For production infrastructure: NautilusTrader (~21K stars, Rust/Python, nanosecond resolution) supports custom data types for mixed-frequency text+price streams. Microsoft Qlib (~16K stars) provides modular quant pipelines. FinGPT (~18.9K stars) provides LLM-native data curation with RLSP.