Methods for Aligning Text and Numerical Data in Algorithmic Trading
The Core Problem
In modern quantitative trading, two fundamentally different data streams must work together:
- Numerical data (tick data, K-lines, order book snapshots) arrives at microsecond-to-minute frequency on a continuous, synchronous clock.
- Text data (news, tweets, filings, government documents) arrives irregularly as discrete events on a sparse, asynchronous clock.
When an LLM is used as a text feature extractor — converting raw text into structured signals like {"sentiment": 0.8, "relevance": 0.9, "urgency": 0.5} — the resulting features still live on a fundamentally different temporal grid than price. Naïvely forcing alignment (via forward-filling, zero-filling, or fixed-window aggregation) introduces biases: staleness from carry-forward, sparsity from zero-fill, and information loss from coarse bucketing. The misalignment is not merely a data-engineering inconvenience — it is a compound problem spanning temporal asynchrony, semantic heterogeneity, and feature-space incompatibility.
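The forward-fill staleness described above is easy to demonstrate with a toy as-of join (illustrative timestamps and values; pandas assumed available):

```python
import pandas as pd

# Minute bars plus one sparse news event (toy data).
bars = pd.DataFrame({
    "ts": pd.date_range("2024-01-02 09:30", periods=6, freq="1min"),
    "price": [100.0, 100.2, 99.8, 99.9, 100.5, 100.4],
})
news = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-02 09:31:30"]),
    "sentiment": [0.8],
})

# Naive as-of join: the 09:31:30 sentiment is carried forward indefinitely.
stale = pd.merge_asof(bars, news, on="ts")

# Bounded join: the same sentiment expires after a 2-minute tolerance,
# leaving NaN (honest missingness) instead of a stale carry-forward.
bounded = pd.merge_asof(bars, news, on="ts", tolerance=pd.Timedelta("2min"))

print(stale["sentiment"].tolist())    # stale value persists to the last bar
print(bounded["sentiment"].tolist())  # value expires after the tolerance
```

Neither variant is "correct" — the tolerance merely trades staleness for sparsity, which is exactly the bias trade-off the methods below try to escape.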
This report catalogues the principal methods that have been developed to address this misalignment, organized from foundational heuristics through advanced neural architectures.
1 · Event-Driven Resampling
Core idea. Invert the standard paradigm: instead of forcing text onto the tick clock, force ticks onto the event clock. The unit of analysis becomes a dynamic window around each text event rather than a fixed time bar. The system only evaluates trades when a text signal fires, grabbing the surrounding numerical context (price, volume, spread, order imbalance) from that event horizon.
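A minimal sketch of the event clock (the function and window lengths are illustrative, not taken from a specific paper):

```python
import numpy as np

def event_windows(tick_ts, tick_px, event_ts, pre=60.0, post=300.0):
    """For each text event, grab the surrounding numerical context: all ticks in
    [t - pre, t + post] seconds, summarized here as pre-event price, post-event
    return, and tick count (a crude activity proxy)."""
    rows = []
    for t in event_ts:
        mask = (tick_ts >= t - pre) & (tick_ts <= t + post)
        px = tick_px[mask]
        if len(px) < 2:
            continue  # event fell outside market hours or an illiquid window
        rows.append({"event_ts": t,
                     "pre_price": px[0],
                     "ret": px[-1] / px[0] - 1.0,
                     "n_ticks": int(mask.sum())})
    return rows

# Toy one-tick-per-second stream with a single text event at t = 100 s.
ts = np.arange(0.0, 500.0, 1.0)
px = 100.0 + 0.01 * ts
print(event_windows(ts, px, [100.0]))
```

Each returned row is one training example: a rich text event plus its localized market context, with no fixed time bars anywhere.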
Strengths. Every row in the training set is guaranteed to contain a rich textual event plus its localized market context. This eliminates matrix sparsity and forward-fill bias. The approach aligns naturally with Information-Driven Bars (López de Prado), where the market is sampled only when a threshold of information has arrived.
Extensions in the literature.
- Neural Hawkes Processes model the self-exciting and mutually-exciting clustering of news arrivals and trade events, capturing how a negative earnings report triggers algorithmic sell cascades that trigger further news alerts. The conditional intensity function learns cross-excitation kernels between text-type and price-type events, replacing hand-tuned windows with data-driven interaction structure.
- Complex Event Processing (CEP) systems (e.g., solutions to the DEBS 2022 Grand Challenge built on Apache Flink) operationalize event-driven execution at production scale, processing high-volume tick streams with real-time pattern detection.
- Adaptive Event-Driven Labeling (AEDL) integrates multi-scale temporal analysis to capture hierarchical causal patterns at different time granularities around events.
Limitations. Markets exhibit massive price volatility in the complete absence of news. An event-only clock is blind to momentum shifts, liquidity cascades, and purely technical microstructure breakdowns. Events with ambiguous timestamps (e.g., filings marked only by date) introduce residual alignment error.
2 · Decay Functions on Text Features
Core idea. Convert each discrete text event into a continuous signal via exponential decay: sentiment(t) = s₀ × e^{-λ(t - t₀)}. The text feature now lives in the same continuous time domain as price. The challenge is choosing λ — it differs for an earnings miss versus a CEO tweet versus a central bank statement.
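A minimal sketch of both the basic decay and the volatility-adaptive variant discussed under Extensions (parameter values are illustrative, not calibrated):

```python
import math

def decayed_sentiment(s0, t0, t, lam):
    """sentiment(t) = s0 * exp(-lam * (t - t0)); zero before the event fires."""
    return 0.0 if t < t0 else s0 * math.exp(-lam * (t - t0))

def adaptive_lambda(vol, lam0=0.1, lam1=0.05, mu=0.2, sigma=0.1):
    """Volatility-adaptive rate lam(t) = lam0 + lam1 * tanh((vol - mu) / sigma):
    faster decay in turbulent markets, slower in calm ones."""
    return lam0 + lam1 * math.tanh((vol - mu) / sigma)

# An event of strength 0.8 at t0 = 0, read one time unit later under high vol.
print(decayed_sentiment(0.8, 0.0, 1.0, adaptive_lambda(vol=0.5)))
```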
Strengths. Conceptually simple, computationally cheap, and directly interpretable. Preserves information value over time rather than hard-truncating at a window boundary.
Extensions in the literature.
- Volatility-adaptive decay replaces static λ with a dynamic λ(t) = λ₀ + λ₁ · tanh((vol(t) − μ) / σ), accelerating decay in high-volatility regimes and slowing it in calm markets. Empirical tests show this lifts short-term prediction accuracy from ~60% to ~65%.
- Asymmetric decay uses different rates for positive vs. negative sentiment, reflecting behavioral finance evidence that bad news persists ~2× longer than good news.
- MIDAS regressions (Ghysels et al.) generalize hand-picked decay into a parametric distributed lag polynomial estimated from data, with decades of econometric theory behind weight design. This treats text features as high-frequency regressors and learns optimal weight schedules per event class.
Limitations. A single monotone decay cannot capture multi-peaked influence patterns from sustained events (e.g., multi-round sanctions). The approach also cannot model interaction effects — where the combination of two simultaneous text events produces a non-additive market response.
3 · Dual-Frequency Architecture
Core idea. Don't merge the modalities. Run two parallel systems: a fast numerical system (tick-level) and a slow text system (event-level). The text system adjusts the regime, parameters, or risk limits of the numerical system rather than feeding into the same model. Text sets the macro context; numbers handle micro execution.
Strengths. Isolates modality interference — high-frequency numerical noise cannot drown out low-frequency text signals, and vice versa. Each branch can use an architecture optimized for its data characteristics (e.g., CNN+LSTM for ticks, Transformer for text).
Extensions in the literature.
- Adaptive Frequency Fusion (AFF / VoT framework) operates in the frequency domain: both branches' predictions are FFT-decomposed, then learnable frequency-specific weights route text-derived signals to low-frequency bands and numerical signals to high-frequency bands. Inverse FFT reconstructs the unified prediction.
- Wavelet-enhanced fusion uses Continuous Wavelet Transforms for superior time-frequency localization on non-stationary financial data, combined with gated mechanisms that let text severity dynamically modulate temporal graph edge weights.
- Gated fusion (TFT / TIC-FusionNet) uses Variable Selection Networks or CBAM-based gates that evaluate contextual relevance in real time, dynamically suppressing stale or irrelevant text features when the market is in a purely technical regime, and amplifying text weight during regime shifts.
Limitations. The two branches may fail to learn fine-grained cross-modal correlations (e.g., which specific sentence in a filing maps to which specific order-flow pattern). Computational cost is roughly double that of single-branch models. Interpretability of the fusion weights is often weak.
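The frequency-band routing behind AFF-style fusion can be sketched with a fixed cutoff instead of learnable per-bin weights (synthetic branch outputs; function name is ours):

```python
import numpy as np

def frequency_route(num_pred, txt_pred, cutoff):
    """AFF-style fusion sketch: take low-frequency FFT bins from the text
    branch and high-frequency bins from the numerical branch, then
    inverse-FFT back to a single prediction. A real AFF layer would learn
    per-bin weights instead of this hard cutoff."""
    Fn, Ft = np.fft.rfft(num_pred), np.fft.rfft(txt_pred)
    fused = np.where(np.arange(len(Fn)) < cutoff, Ft, Fn)
    return np.fft.irfft(fused, n=len(num_pred))

t = np.linspace(0, 1, 64, endpoint=False)
slow = np.sin(2 * np.pi * 1 * t)         # text branch: slow regime trend
fast = 0.1 * np.sin(2 * np.pi * 20 * t)  # numerical branch: fast oscillation
fused = frequency_route(fast, slow, cutoff=5)
print(np.allclose(fused, slow + fast, atol=1e-8))  # each band survives routing
```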
4 · Cross-Modal Attention
Core idea. Replace hard temporal alignment with learned soft attention across modalities. A target modality (price) attends to a source modality (text) across different time steps via differentiable attention weights — no pre-alignment required.
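A minimal single-head version (numpy; random hidden states stand in for encoder outputs) shows why no pre-alignment is needed — every price step attends over every text event:

```python
import numpy as np

def cross_modal_attention(price_h, text_h):
    """Price hidden states (queries) attend over text hidden states
    (keys/values); price_h is (T_p, d), text_h is (T_t, d), and the two
    sequence lengths never need to match or share a clock."""
    d = price_h.shape[1]
    scores = price_h @ text_h.T / np.sqrt(d)     # (T_p, T_t) similarities
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)            # softmax over text positions
    return w @ text_h, w                         # text context per price step

rng = np.random.default_rng(0)
price_h, text_h = rng.normal(size=(5, 8)), rng.normal(size=(3, 8))
ctx, w = cross_modal_attention(price_h, text_h)
print(ctx.shape, w.shape)  # (5, 8) context, (5, 3) attention weights
```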
Key works.
- MulT (Multimodal Transformer) — Tsai et al., ACL 2019, ~2,500 citations. Directional pairwise cross-modal attention reinforcing one modality with features from another regardless of temporal alignment. Open-source at ~700 GitHub stars.
- FAST — Sawhney et al., EACL 2021. Time-aware LSTM encoding inverse time intervals between consecutive text releases, handling intra-day temporal irregularity directly.
- SALMON / MSE-ITT — arXiv 2025. Modality-specific Mixture-of-Experts on top of Llama3-8B, processing interleaved text and time-series tokens ordered by timestamp. Key finding: cross-modal attention in early transformer layers is detrimental; modality-specific processing should occur first, with fusion only in later layers.
- MSGCA — trimodal encoding (price, documents, relational graphs) where the primary modality (price) guides fusion through gated cross-attention that suppresses noisy or temporally misaligned text signals.
- MM-iTransformer — inverts the standard paradigm by embedding entire variable histories as single variate tokens rather than embedding individual time steps, enabling cross-attention across variables instead of across time.
Strengths. Eliminates explicit alignment entirely. Scales well and can discover complex, non-linear temporal correspondences.
Limitations. Requires substantial paired training data. Provides limited theoretical guarantees about what temporal relationship the attention learned.
5 · Continuous-Time Neural Dynamics
Core idea. Model the latent market state as a continuous dynamical system, updated asynchronously whenever any observation — tick, news article, filing — arrives. This eliminates discretization entirely.
Key model families.
| Model | Core Mechanism | Relevance to Text-Price Fusion |
|---|---|---|
| Neural ODE (Chen et al., NeurIPS 2018, ~10K citations) | dh/dt = f_θ(h(t), t) with adaptive ODE solvers | Processes data at arbitrary, non-uniform time points |
| Latent ODE / ODE-RNN (Rubanova et al., NeurIPS 2019, ~2.2K citations) | Continuous dynamics between observations, discrete updates at arrivals | Jointly models when observations occur via Poisson processes |
| Neural CDE (Kidger et al., NeurIPS 2020 Spotlight, ~800 citations) | dz = f_θ(z) dX where X is a continuous control path | Naturally handles channels observed at different rates; universal approximation proven |
| Online Neural CDE (Morrill & Kidger, 2021) | Rectilinear interpolation replacing non-causal cubic splines | Deployable for live trading without look-ahead bias |
| Neural Jump ODE (NJ-ODE) (ICLR 2021) | ODE evolves continuously between events, "jumps" at observations | Most natural framework for text+price: ODE drift between ticks, jump at news arrival |
| Neural SDE (Kidger et al., ICML 2021) | Adds diffusion term for stochastic paths | Better captures financial randomness |
| Stable Neural SDE (Oh et al., ICLR 2024 Spotlight) | Solves training instability | Removes practical deployment blocker |
| Neural Merton Jump Diffusion (arXiv 2025) | Itô diffusion + compound Poisson jumps | Continuous price evolution + discrete news shocks in one SDE |
| mTAN (Shukla & Marlin, ICLR 2021, ~400 citations) | Learned continuous-time embeddings with attention | 85× faster than ODE-based methods; extended to cross-modal fusion |
| GRU-ODE-Bayes (de Brouwer et al., NeurIPS 2019, ~400 citations) | Bayesian updates for sporadically observed data | Handles channels updating at wildly different rates |
| ContiFormer (NeurIPS 2024) | Continuous-time Transformer with ODE trajectories per observation | Combines Transformer parallelism with Neural ODE dynamics |
| MambaStock (arXiv 2024, ~300 GitHub stars) | Selective state space mechanism for stock prediction | Outperforms Transformers and LSTMs on stock tasks |
Critical gap identified across all surveyed literature: No published paper simultaneously combines rich NLP encoders (FinBERT/LLM), neural temporal point processes, and continuous-time latent dynamics into a single end-to-end architecture for text+price fusion. The tools exist; the combination remains an open field.
Key open-source tooling: torchdiffeq (~5,500 stars), torchcde (~434 stars), torchsde, latent_ode (~1,200 stars).
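The jump-ODE mechanics can be sketched without any learned components — hand-written drift and jump functions stand in for the neural networks f_θ, and fixed-step Euler replaces an adaptive solver:

```python
import numpy as np

def nj_ode_path(obs, h0, drift, jump, dt=0.01):
    """Neural-Jump-ODE-style sketch: the latent state h drifts continuously
    between observations and jumps discretely whenever an observation
    (tick or news mark) arrives. obs is a list of (time, mark), sorted."""
    t, h, path = 0.0, np.array(h0, dtype=float), []
    for t_obs, mark in obs:
        while t < t_obs:            # Euler-integrate the drift up to the event
            h = h + drift(h) * dt
            t += dt
        h = jump(h, mark)           # discrete update at the observation
        path.append((t_obs, h.copy()))
    return path

# Illustrative dynamics: mean-reverting drift, additive news jump.
drift = lambda h: -0.5 * h
jump = lambda h, m: h + m
path = nj_ode_path([(0.5, np.array([1.0])), (1.0, np.array([-2.0]))],
                   [0.0], drift, jump)
print(path[0][1], path[-1][1])
```

In NJ-ODE proper, drift and jump are neural networks trained end-to-end, and the news marks would be LLM-derived feature vectors rather than scalars.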
6 · Temporal Point Processes
Core idea. Model the arrival times and types of events as stochastic processes with self- and cross-excitation. A news article's arrival increases the probability of subsequent price moves, and vice versa. Both modalities become events in a single mathematical framework.
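For intuition, the classical univariate exponential-kernel conditional intensity (parameter values illustrative):

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process with exponential
    kernel: lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).
    Each past event excites future arrivals; beta sets how fast that
    excitation decays -- the quantity estimated per event type below."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

# Baseline rate 0.1; two clustered news arrivals spike intensity just after them.
events = [1.0, 1.2]
print(hawkes_intensity(1.3, events, mu=0.1, alpha=0.8, beta=2.0))  # elevated
print(hawkes_intensity(5.0, events, mu=0.1, alpha=0.8, beta=2.0))  # back near mu
```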
Classical foundations.
- Hawkes processes in high-frequency finance (Bacry et al., 2015, ~1,000 citations) established that financial markets operate near the critical branching ratio — past events drive most future activity.
- Yang et al. (Quantitative Finance, 2018) modeled a 4-variate Hawkes process over {positive returns, negative returns, positive sentiment, negative sentiment} using intraday S&P 500 data. Key finding: return-event decay is ~2× faster than sentiment-event decay, precisely quantifying the temporal misalignment.
Neural extensions.
| Model | Innovation |
|---|---|
| Neural Hawkes Process (Mei & Eisner, NeurIPS 2017, ~800 citations) | Continuous-time LSTM replaces parametric kernels; allows non-linear, non-additive interaction effects |
| RMTPP (Du et al., KDD 2016, ~1,000 citations) | RNN + marked temporal point process for encoding full event histories |
| Transformer Hawkes Process (Zuo et al., ICML 2020) | Self-attention with continuous-time positional encodings for long-range event dependencies |
| TPP-LLM (Liu & Quan, arXiv 2024 / ICLR 2025 Workshop) | First model processing actual text content within the point process framework via LLM fine-tuning (LoRA), rather than reducing text to scalar sentiment |
How it solves misalignment. Both modalities are represented as timestamped marked events in continuous time. The LLM's structured output (sentiment, relevance, urgency, entity) becomes marks that modulate excitation strength or kernel parameters. No resampling or decay tuning is needed — the model learns event-type-specific and context-specific temporal influence patterns directly.
Key open-source tooling: tick library (~500 stars) for classical Hawkes; EasyTPP (Ant Research, ~400 stars) for neural TPP benchmarking.
7 · Mixed-Frequency Econometrics & State-Space Models
Core idea. Treat different sampling rates as a feature, not a bug. Use formal frameworks from econometrics that were explicitly built to handle mixed-frequency observations and irregular arrivals.
MIDAS regressions (Ghysels et al., JFE 2005/JoE 2006, ~1,000+ citations across variants) regress a low-frequency target on high-frequency regressors using parsimonious polynomial weighting schemes. Unlike hand-picked decay, the weight schedule is estimated from data. Reverse-MIDAS variants forecast high-frequency variables from low-frequency ones. Recent intraday MIDAS work predicts 3-minute returns from half-hourly sentiment, achieving 19% MAE reduction. A counterintuitive finding: sentiment during non-trading hours is more informative than during trading hours.
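The heart of MIDAS is the parsimonious weight schedule; a sketch of the standard exponential Almon parameterization (function names are ours):

```python
import math

def exp_almon_weights(K, theta1, theta2):
    """Exponential Almon lag polynomial used in MIDAS: w_k proportional to
    exp(theta1 * k + theta2 * k^2), normalized to sum to 1 over K lags.
    Two parameters shape the entire weight profile, however many lags."""
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, K + 1)]
    s = sum(raw)
    return [r / s for r in raw]

def midas_term(x_hf, theta1, theta2):
    """One MIDAS regressor: weighted sum of the K most recent high-frequency
    observations (e.g., intraday sentiment), most recent first, entering a
    low-frequency regression."""
    w = exp_almon_weights(len(x_hf), theta1, theta2)
    return sum(wi * xi for wi, xi in zip(w, x_hf))

w = exp_almon_weights(10, theta1=0.1, theta2=-0.05)
print(round(sum(w), 6), w[0] > w[-1])  # weights sum to 1 and decay with lag
```

In estimation, theta1 and theta2 are fit jointly with the regression coefficients, so a single parameter pair is learned from data rather than hand-picked per event class.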
Mixed-frequency VAR models characterize both frequency mismatch and the timing of information releases, with real-time updating as new higher-frequency observations arrive — formalizing event-driven updates in a multivariate dynamic system.
State-space / Kalman filtering maintains a latent market state updated by both frequent market observations and sporadic text events, without ever requiring a shared clock. The Kalman filter provides optimal sequential updating under linear-Gaussian assumptions; nonlinear extensions (EKF, UKF, particle filters) handle more complex dynamics.
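A scalar random-walk sketch shows why no shared clock is needed — each stream simply applies an update whenever it has an observation, with its own noise level (all values illustrative):

```python
def kalman_step(x, P, z, R, Q=0.01):
    """One predict+update of a scalar random-walk Kalman filter.
    x, P: state mean/variance; z: observation (None if nothing arrived);
    R: observation noise variance; Q: process noise."""
    x_pred, P_pred = x, P + Q          # random-walk predict
    if z is None:
        return x_pred, P_pred          # no observation: uncertainty grows
    K = P_pred / (P_pred + R)          # Kalman gain
    return x_pred + K * (z - x_pred), (1 - K) * P_pred

# Frequent noisy price observations plus one sporadic, precise text signal.
x, P = 0.0, 1.0
stream = [(0.9, 0.5), (1.1, 0.5), (None, None), (2.0, 0.05), (1.0, 0.5)]
for z, R in stream:
    x, P = kalman_step(x, P, z, R)
print(round(x, 3), round(P, 4))
```

The low-noise text observation (R = 0.05) pulls the latent state far harder than the routine price ticks — the filter weighs information content, not arrival frequency.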
Strengths. Theoretically grounded, interpretable, and proven in production macro-finance. Provides estimable weighting structures and explicit information arrival times.
Limitations. Classical MIDAS assumes relatively simple parametric relationships. Struggles with the non-linear, context-dependent effects that deep learning captures.
8 · Information-Theoretic & Causal Approaches
Core idea. Measure and exploit the directional information flow between text and price to build alignment that respects causality.
Key methods.
- Transfer entropy (Souza & Aste, 2016/2019) revealed that nonlinear transfer entropy detects an order of magnitude more causality between social media and stock returns than linear Granger causality — the relationship is purely nonlinear and invisible to standard VAR models. Open-source: PyCausality library.
- CausalImpact (Brodersen et al., Google, Annals of Applied Statistics, 2015, ~1,500 citations) provides a Bayesian structural time-series framework for constructing counterfactuals around discrete interventions — directly applicable to measuring "what would the price have done absent this news event."
- Granger-causality learning for Hawkes processes (Xu et al., ICML 2016) recovers sparse interaction structure via group sparsity, providing tools to test whether "text → price" excitation is genuine versus "price → text commentary" (endogeneity).
- Dynamic Transfer Entropy + TFT (Díaz Berenguer et al., 2024) bridges information theory with deep learning by feeding causality features into Temporal Fusion Transformers.
Value for alignment. These tools diagnose whether alignment is capturing real causal structure or spurious correlation, and quantify the actual information content and temporal lag of text signals.
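A plug-in estimator for discrete transfer entropy is short enough to sketch (a histogram estimator on binary series; the nonlinear estimators in the cited work are considerably more sophisticated):

```python
import random
from collections import Counter
from math import log2

def transfer_entropy(x, y):
    """Plug-in transfer entropy TE(X -> Y), history order 1:
    TE = sum p(y_next, y, x) * log2[ p(y_next | y, x) / p(y_next | y) ].
    Upward-biased on short samples; adequate for a sketch."""
    n = len(y) - 1
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))
    pairs_yx = Counter(zip(y[:-1], x[:-1]))
    pairs_ny = Counter(zip(y[1:], y[:-1]))
    singles_y = Counter(y[:-1])
    te = 0.0
    for (yn, yp, xp), c in triples.items():
        p_cond_full = c / pairs_yx[(yp, xp)]
        p_cond_marg = pairs_ny[(yn, yp)] / singles_y[yp]
        te += (c / n) * log2(p_cond_full / p_cond_marg)
    return te

# Synthetic check: y copies x with a one-step lag, so x "causes" y, not vice versa.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]
print(transfer_entropy(x, y), transfer_entropy(y, x))  # ~1 bit vs ~0 bits
```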
9 · Temporal Knowledge Graphs & Graph Neural Networks
Core idea. Encode not just text content but the relational structure between entities, events, and prices. Propagate text-derived signals across connected assets through learned graph dynamics.
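The propagation idea in miniature — a one-asset news shock diffusing over a row-normalized relationship graph with a fixed attenuation factor (a learned GNN would make both the graph and the attenuation data-driven):

```python
import numpy as np

def propagate(shock, adj, hops=2, attenuation=0.5):
    """Diffuse a per-asset news shock over a relationship graph: each hop,
    the signal spills to neighbors via the row-normalized adjacency,
    attenuated by a fixed factor, and accumulates into the output."""
    A = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-12)
    s, out = shock.astype(float), shock.astype(float)
    for _ in range(hops):
        s = attenuation * (A @ s)
        out = out + s
    return out

# Three assets: 0 and 1 are linked (e.g., supplier relation), 2 is isolated.
adj = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
out = propagate(np.array([1.0, 0.0, 0.0]), adj)
print(out)  # shock reaches asset 1, echoes back to 0, never touches 2
```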
Key works.
- FinDKG (Li & Sanna Passino, ACM ICAIF 2024) — fine-tuned 7B LLM extracts event quadruples (subject, relation, object, timestamp) from news; KGTransformer learns over rolling temporal KG snapshots. >10% uplift on link prediction versus SOTA temporal KG baselines.
- TRACE (arXiv 2025) — symbolic path mining on temporal KGs with temporal validity constraints ensuring only causally valid (past→present) reasoning chains, validated by LLM-guided graph exploration.
- THGNN (CIKM 2022) — learns dynamic inter-stock relations without manual graph construction.
- Semantic Company Relationship Graphs (SCRG) — LLM-extracted company co-occurrence and sentiment spillover metrics construct real-time relational graphs; combined with Spatial-Temporal GNNs, text-derived signals propagate across structurally linked equities with learned latency and attenuation.
- DGRCL — combines dynamic graph relations with contrastive learning to capture both temporal evolution and relationship constraints from text.
Value for alignment. Solves cross-sectional misalignment: a news event about one company propagates to related companies with different delays, which cannot be captured by single-asset alignment methods.
10 · Cross-Modal Reprogramming
Core idea. Instead of extracting features from text and feeding them to a numerical model, transform numerical data into the text domain and let the LLM process both natively.
Key work: Time-LLM (ICLR 2024, open-source). The backbone LLM is kept entirely frozen. The numerical time series is segmented into patches, and a trainable reprogramming layer maps these patches into text prototype embeddings in the LLM's vocabulary space. Actual textual data (news, sentiment, macro context) is fed alongside as Prompt-as-Prefix. The frozen LLM then jointly reasons over both modalities via its native self-attention — no explicit alignment needed because both modalities live in the same token space.
Strengths. Leverages the LLM's pre-trained sequence reasoning without fine-tuning. Achieves zero-shot and few-shot multimodal forecasting. Can be combined with RL agents (SAC) for end-to-end trading optimization.
Limitations. Computationally expensive at inference. The reprogramming quality depends on whether numerical patches genuinely map to meaningful text prototypes.
11 · Universal Tokenization & Foundation Models
Core idea. Build massive foundation models that treat the entire heterogeneous event stream — trades, quotes, news — as a single structured language with its own grammar.
Key work: TradeFM (J.P. Morgan, 524M parameters). Trained on billions of raw tick-level trade events across 9,000+ equities. Uses scale-invariant feature construction based on Universal Price Formation theory. A Universal Tokenization Scheme maps multi-feature trade tuples (price, volume, direction, inter-arrival time) into a single discrete sequence. Integrating text involves mapping textual events into the same vocabulary. The Transformer's autoregressive self-attention handles alignment implicitly.
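A toy illustration of the tokenization idea — bucketize each feature of a trade tuple and pack the buckets into a single discrete id (this bucketing is our illustration; TradeFM's actual scheme is proprietary):

```python
import math

def trade_token(d_price, volume, direction, dt, n_bins=8):
    """Map a trade tuple (price change, volume, direction, inter-arrival time)
    to one token id by per-feature bucketing. Clipping bounds and log scaling
    are illustrative choices, not TradeFM's."""
    def bucket(v, lo, hi):
        v = min(max(v, lo), hi)
        return int((v - lo) / (hi - lo) * (n_bins - 1))
    b = [bucket(d_price, -0.01, 0.01),
         bucket(math.log1p(volume), 0.0, 10.0),
         0 if direction < 0 else 1,
         bucket(math.log1p(dt), 0.0, 5.0)]
    # Pack: price bin, volume bin, direction bit, inter-arrival bin.
    return ((b[0] * n_bins + b[1]) * 2 + b[2]) * n_bins + b[3]

print(trade_token(0.002, 500, +1, 0.05))
```

With n_bins = 8 the vocabulary holds 8 × 8 × 2 × 8 = 1024 tokens; a stream of such ids is what the autoregressive Transformer consumes, and text events would occupy additional ids in the same vocabulary.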
Strengths. Eliminates alignment as a design problem entirely — all modalities are tokens in a unified sequence.
Limitations. Requires enormous compute for training. Proprietary; not yet open-source. Integrating full text content (beyond event tags) into the tokenization scheme remains an open challenge.
12 · Multi-Agent Cognitive Debate
Core idea. Discard mathematical fusion entirely. Decompose analysis into specialized, independent LLM personas that resolve text-price discrepancies through structured debate.
Key work: TradingAgents (UCLA/MIT, LangGraph-based). Specialized agents — Sentiment Analyst (text-only), Technical Analyst (numbers-only), Fundamental Analyst — produce independent analyses. A Researcher Team (bullish and bearish personas) debates the conflicting signals across multiple rounds. A Trader Agent formulates execution strategy; a Risk Management Agent evaluates liquidity and volatility constraints; a Portfolio Manager approves final execution.
How it resolves misalignment. When positive text sentiment conflicts with bearish technical indicators, the system doesn't average them to a neutral "hold." Instead, agents reason through the conflict in natural language: "While numerical indicators show overbought conditions, the regulatory approval in the news overrides historical resistance levels." Alignment happens through contextual reasoning rather than vector addition.
Strengths. Preserves nuance of conflicting signals. Interpretable. Handles novel situations gracefully.
Limitations. Latency — multi-round LLM debate is too slow for high-frequency trading. Non-deterministic outputs. Difficult to backtest rigorously.
13 · Contrastive Learning Alignment
Core idea. Use self-supervised contrastive learning to force the latent representations of matched text events and price movements into the same geometric region of embedding space, ensuring deep semantic alignment.
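The InfoNCE objective in miniature (numpy; random vectors stand in for headline and price-window encoder outputs):

```python
import numpy as np

def info_nce(text_emb, price_emb, tau=0.07):
    """InfoNCE sketch: row i of text_emb and row i of price_emb are a matched
    (same-day) pair; every other row in the batch is a negative. Embeddings
    are L2-normalized and similarities scaled by temperature tau."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    p = price_emb / np.linalg.norm(price_emb, axis=1, keepdims=True)
    logits = t @ p.T / tau                       # (B, B) scaled cosine sims
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matched pairs on the diagonal

rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
aligned_loss = info_nce(text, text + 0.01 * rng.normal(size=(8, 16)))
random_loss = info_nce(text, rng.normal(size=(8, 16)))
print(aligned_loss < random_loss)  # aligned pairs score a lower loss
```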
Key works.
- ContraSim — self-supervised contrastive similarity learning maps daily headlines and market movements into a shared latent space via InfoNCE loss. Weighted Self-Supervised Contrastive Learning (WSSCL) generates augmented headlines to ensure negated or sarcastic text is pushed far from baseline positive embeddings.
- SuCroMoCo — supervised contrastive learning with cross-momentum contrast aligns financial text representations with prototypical sentiment categories.
Emergent properties. The contrastive similarity space naturally clusters historical trading days with homogeneous market movement directions. This enables k-nearest-neighbor retrieval in latent space to find historical analogs to the current session, enabling preemptive positioning.
14 · Regime Switching, MoE, and Surprise-Based Frameworks
Core idea. Go beyond alignment to ask: given the current market regime, how much new information does this text actually carry?
Key works.
- StockMem (arXiv 2025) — introduces ΔInfo (incremental information): price movements depend on deviation from market expectations, not absolute sentiment polarity. Uses analogical memory retrieval to match current event sequences to historically similar scenarios. Reframes alignment from "when does news arrive" to "how much surprise does this news contain."
- DAFF-Net (Scientific Reports, 2025) — combines all three base solutions into one architecture: learnable temporal prototypes (event-driven resampling), exponential decay (decay functions), AND an event-aware MoE router (dual-frequency).
- Adaptive Regime-Aware architecture (arXiv 2025) — autoencoder detects anomalous regimes via reconstruction error; specialized Dual Node Transformers handle stable vs. event-driven conditions; a Soft Actor-Critic RL controller adaptively tunes regime detection thresholds.
- LLMoE (arXiv 2025) — pre-trained LLM as the router itself, reading both text and numeric features to decide which expert handles each market condition.
15 · Reinforcement Learning on Stock Prices (RLSP)
Core idea. Use actual market price reactions as the reward signal to align the LLM's text processing with financial outcomes, bypassing the alignment problem by optimizing end-to-end from text to trade profit.
Key work: FinGPT (AI4Finance, ~18,900 GitHub stars). Introduces RLSP as a financial evolution of RLHF. The training environment is the historical market; the reward function is tied to the asset's actual post-news price reaction. This coerces the LLM to align its sentiment extraction with tangible financial outcomes, bridging the semantic gap through the RL objective rather than through explicit temporal alignment.
16 · Temporal Grounding from Text
Core idea. Instead of aligning on publication time alone, extract the actual event time, time range, and temporal relations from the text itself.
LLMs can serve as temporal grounders, outputting effective_time, time_range, and confidence — reducing misalignment at its root: a filing published today may describe an event that occurred weeks earlier. Standards like TimeML and ISO-TimeML provide formal specifications for anchoring events to time expressions and representing temporal ordering relations, and recent transformer-based temporal information extraction work shows promising results. This is a genuinely out-of-the-box move: the LLM is not only a sentiment extractor but a temporal structure extractor.
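A sketch of the consuming side (the JSON fields follow the names above; parse_grounding and its fallback policy are our illustration, and handling of legitimately future-dated scheduled events is omitted):

```python
import json
from datetime import datetime, timezone

def parse_grounding(llm_json, publication_ts):
    """Validate a hypothetical LLM temporal-grounding output of the form
    {"effective_time": ISO-8601, "time_range": [...], "confidence": float},
    falling back to publication time when the extraction is missing,
    malformed, low-confidence, or future-dated (time_range handling omitted)."""
    try:
        out = json.loads(llm_json)
        eff = datetime.fromisoformat(out["effective_time"])
        conf = float(out.get("confidence", 0.0))
        if conf < 0.5 or eff > publication_ts:
            return publication_ts, "fallback"
        return eff, "grounded"
    except (json.JSONDecodeError, KeyError, ValueError, TypeError):
        return publication_ts, "fallback"

pub = datetime(2024, 3, 1, 14, 0, tzinfo=timezone.utc)
# A filing published March 1 describing an event that occurred February 15:
raw = '{"effective_time": "2024-02-15T00:00:00+00:00", "confidence": 0.9}'
print(parse_grounding(raw, pub))
```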
17 · Interleaved Multimodal Tokenization
Core idea. Treat temporal misalignment as a sequence modeling problem with heterogeneous tokens. No alignment is needed because the model learns relationships across an interleaved timeline of text tokens and numerical tokens ordered by timestamp.
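Building the interleaved sequence itself is a plain k-way merge by timestamp (modality tags and payloads here are illustrative):

```python
import heapq

def interleave(*streams):
    """Merge timestamped token streams from different modalities into one
    chronologically ordered sequence -- the input layout consumed by
    interleaved-tokenization models. Each stream is a time-sorted list of
    (timestamp, modality, payload) tuples."""
    return list(heapq.merge(*streams, key=lambda tok: tok[0]))

ticks = [(9.5001, "px", 100.1), (9.5003, "px", 100.2), (9.5010, "px", 99.7)]
news = [(9.5007, "txt", "FDA approval announced")]
seq = interleave(ticks, news)
print([m for _, m, _ in seq])  # ['px', 'px', 'txt', 'px']
```

Downstream, each tuple is embedded by a modality-specific tokenizer; the model sees arbitrary temporal gaps simply as adjacent tokens with different timestamps.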
Key works.
- MSE-ITT — builds modality-specific MoE layers within Llama3-8B, processing interleaved sequences of text and time-series tokens. Pointwise embedding tokenization accommodates variable temporal gaps without fixed windows.
- Time-IMM benchmark (NeurIPS 2025) — first evaluation framework explicitly designed around cause-driven irregularities in multimodal time series. Categorizes irregularity into trigger-based, constraint-based, and artifact-based types. Achieves up to 38% MSE reduction when text is highly informative.
Summary Table
| # | Method | How It Handles Misalignment | Strengths | Limitations | Key References |
|---|---|---|---|---|---|
| 1 | Event-Driven Resampling | Forces ticks onto the event clock; dynamic windows around text events | Eliminates sparsity and forward-fill bias; high alignment precision | Blind to non-event market dynamics; ambiguous timestamps | López de Prado; DEBS 2022; AEDL |
| 2 | Decay Functions | Converts discrete text events into continuous signals via exponential decay | Simple, cheap, interpretable | Static λ cannot handle heterogeneous event types; no interaction effects | TASEM; volatility-adaptive decay |
| 3 | Dual-Frequency Architecture | Parallel fast-numerical and slow-text branches; text modulates numerical system | Isolates modality interference | May miss fine-grained cross-modal correlations; 2× compute cost | VoT/AFF; TFT; TIC-FusionNet |
| 4 | Cross-Modal Attention | Learned soft attention across modalities without pre-alignment | No explicit alignment needed; discovers non-linear correspondences | Requires large paired datasets; limited theoretical guarantees | MulT; FAST; SALMON/MSE-ITT; MSGCA; MM-iTransformer |
| 5 | Continuous-Time Neural Dynamics | Latent state evolves continuously via ODE/CDE/SDE; updated asynchronously at any observation | Most principled solution; keeps original timestamps | Computationally expensive; unexplored for joint text+price fusion | Neural ODE; Neural CDE; NJ-ODE; mTAN; Neural MJD; MambaStock |
| 6 | Temporal Point Processes | Both modalities as marked events in continuous time with learned excitation kernels | Naturally models arrival dynamics, cross-excitation, and event-specific decay | Struggles with rich text content (mostly reduces to scalar marks) | Neural Hawkes; RMTPP; Transformer Hawkes; TPP-LLM; EasyTPP |
| 7 | Mixed-Frequency Econometrics | MIDAS distributed lags; mixed-frequency VAR; Kalman filter with irregular updates | Theoretically grounded; decades of proven practice; interpretable | Assumes relatively simple parametric relationships | Ghysels et al. MIDAS; MF-VAR; Durbin & Koopman |
| 8 | Information-Theoretic / Causal | Measures directional information flow; constructs counterfactuals; diagnoses endogeneity | Validates whether alignment captures real causality vs. spurious correlation | Diagnostic rather than predictive; requires careful implementation | Transfer entropy; CausalImpact; Granger-Hawkes |
| 9 | Temporal Knowledge Graphs / GNN | Encodes entity relationships + temporal validity; propagates signals across connected assets | Solves cross-sectional misalignment; captures supply-chain / sector contagion | Graph construction quality is critical; computationally heavy | FinDKG; TRACE; THGNN; SCRG; DGRCL |
| 10 | Cross-Modal Reprogramming | Maps numerical patches into LLM vocabulary space; both modalities processed as tokens | Leverages frozen LLM reasoning; zero/few-shot capable | Expensive at inference; reprogramming quality uncertain | Time-LLM (ICLR 2024) |
| 11 | Universal Tokenization | All event types (trades, quotes, news) encoded into a single discrete token vocabulary | Eliminates alignment as a design problem entirely | Massive compute; integrating full text content is open challenge | TradeFM (J.P. Morgan) |
| 12 | Multi-Agent Cognitive Debate | Specialized LLM personas independently analyze then debate conflicting signals | Preserves nuance; interpretable reasoning; handles novel situations | Too slow for HFT; non-deterministic; hard to backtest | TradingAgents; MiroFish |
| 13 | Contrastive Learning Alignment | InfoNCE loss forces matched text-price pairs into shared embedding geometry | Deep semantic alignment; enables historical-analog retrieval | Requires careful negative sampling; batch-size sensitive | ContraSim; SuCroMoCo |
| 14 | Regime / MoE / Surprise Frameworks | Routes processing by market regime; measures information surprise relative to expectations | Adapts to non-stationarity; combines multiple base solutions | Complex to tune; regime detection can lag | StockMem (ΔInfo); DAFF-Net; Adaptive Regime-Aware; LLMoE |
| 15 | RLSP (RL on Stock Prices) | Uses actual price reactions as reward to align LLM text processing end-to-end | Optimizes for trading profit directly, not intermediate accuracy | Reward signal is noisy; sample-inefficient; risk of overfitting to market microstructure | FinGPT |
| 16 | Temporal Grounding from Text | Extracts actual event time, time range, and temporal relations from text itself | Reduces misalignment at the root; leverages LLM as temporal structure extractor | Extraction accuracy is imperfect; not all text has clear temporal anchors | TimeML; ISO-TimeML |
| 17 | Interleaved Multimodal Tokenization | Text and numerical tokens interleaved by timestamp; model learns cross-modal relationships in sequence | No alignment needed; modality-specific processing before fusion | Requires large-scale interleaved training data; early-stage research | MSE-ITT; Time-IMM (NeurIPS 2025) |
Practical Recommendations
For sub-second latency (HFT): Continuous-time neural dynamics (Neural CDE/ODE) with online interpolation, or event-driven resampling with Hawkes-informed windows.
For daily-frequency strategies: Cross-modal attention (MulT/TFT-style) with variable selection offers the best accuracy-to-complexity ratio. MIDAS regression with FinBERT features is a strong, interpretable baseline.
For event-driven strategies: Marked Hawkes processes for quantifying text→price excitation kernels, combined with "Trade the Event" style structured event extraction.
For maximum flexibility: Interleaved multimodal tokenization (MSE-ITT architecture) for medium-frequency applications, or multi-agent debate for strategies where interpretability matters most.
For production infrastructure: NautilusTrader (~21K stars, Rust/Python, nanosecond resolution) supports custom data types for mixed-frequency text+price streams. Microsoft Qlib (~16K stars) provides modular quant pipelines. FinGPT (~18.9K stars) provides LLM-native data curation with RLSP.