Findings
This is where I'll be saving links to interesting things I stumble across on the internet. You'll find cool research papers on LLMs that have caught my attention, fascinating tech discoveries, and other curiosities worth preserving.
DFlash: Block Diffusion for Flash Speculative Decoding
DFlash uses a lightweight block diffusion model as a drafter for speculative decoding: the drafter is conditioned on the K/V cache extracted directly from the target LLM during inference and predicts whole blocks of future tokens at once. It reportedly achieves a ~6.17x lossless speed-up for Qwen3-8B.
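Not DFlash's actual implementation, but a minimal sketch of the draft-and-verify loop that speculative decoding is built on, assuming hypothetical `target_model` and `drafter` callables that return logits. The drafter below proposes tokens one at a time for simplicity; DFlash replaces it with a block diffusion model conditioned on the target's K/V cache so the whole block is proposed in parallel.

```python
import torch


def speculative_step(target_model, drafter, tokens: torch.Tensor, block_size: int = 8) -> torch.Tensor:
    """One draft-and-verify step. `tokens` is a [1, seq_len] tensor of token ids."""
    # 1) Drafter proposes `block_size` future tokens (greedy, token by token here).
    draft = tokens
    for _ in range(block_size):
        logits = drafter(draft)                                 # [1, seq, vocab]
        next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)   # [1, 1]
        draft = torch.cat([draft, next_tok], dim=-1)

    # 2) Target scores the entire draft in a single forward pass.
    target_logits = target_model(draft)                         # [1, seq + block, vocab]
    # Target's greedy choice at every drafted position.
    target_choice = target_logits[:, tokens.shape[1] - 1 : -1].argmax(dim=-1)  # [1, block]
    proposed = draft[:, tokens.shape[1]:]                       # [1, block]

    # 3) Accept the longest prefix where drafter and target agree, so the output
    #    matches greedy decoding of the target alone (this is the "lossless" part).
    n_accept = 0
    for i in range(block_size):
        if proposed[0, i].item() != target_choice[0, i].item():
            break
        n_accept += 1

    if n_accept == block_size:
        # Whole block accepted: also take the target's "bonus" next token.
        bonus = target_logits[:, -1].argmax(dim=-1, keepdim=True)
        return torch.cat([draft, bonus], dim=-1)

    # Keep accepted tokens plus the target's correction at the first mismatch.
    correction = target_choice[:, n_accept : n_accept + 1]
    return torch.cat([draft[:, : tokens.shape[1] + n_accept], correction], dim=-1)
```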
The Universal Weight Subspace Hypothesis
This research from JHU presents groundbreaking evidence that deep neural networks trained on vastly different tasks converge to shared, low-dimensional parametric subspaces. LoRA fine-tuning already updates only a low-rank slice of the weights; this paper suggests that even those low-rank updates are largely redundant across different tasks (a toy illustration follows after the notes below).
- They show that a subspace learned from one set of tasks can be effectively applied to adapt models to completely different distributions with minimal performance loss.
- The paper provides mathematical analysis to explain why these subspaces emerge, linking them to the spectral properties of the model's Hessian and the pre-training data distribution.
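A toy numerical illustration, not the paper's method: plant a shared k-dimensional subspace in a set of synthetic "weight updates", recover it with a plain SVD, and check how much of a held-out task's update that basis explains. All shapes, the rank, and the noise level are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tasks, k = 4096, 32, 8            # flattened update size, number of tasks, subspace rank

# Synthetic data with a planted shared structure: every task's flattened
# low-rank update lives mostly in the same k-dimensional subspace plus noise.
true_basis = np.linalg.qr(rng.standard_normal((d, k)))[0]       # [d, k], orthonormal columns
updates = rng.standard_normal((n_tasks, k)) @ true_basis.T      # [n_tasks, d]
updates += 0.01 * rng.standard_normal((n_tasks, d))             # small off-subspace noise

# Recover a shared basis from the observed updates via SVD.
_, _, vt = np.linalg.svd(updates, full_matrices=False)
basis = vt[:k]                                                   # [k, d], rows are orthonormal directions

# Held-out task drawn from the same structure: how much does the basis explain?
new_update = true_basis @ rng.standard_normal(k) + 0.01 * rng.standard_normal(d)
coords = basis @ new_update                                      # coordinates in the recovered subspace
explained = np.linalg.norm(basis.T @ coords) ** 2 / np.linalg.norm(new_update) ** 2
print(f"fraction of the held-out update's energy in the recovered subspace: {explained:.3f}")
```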
Emergent Introspective Awareness in LLMs
Research report from Anthropic showing that language models can exhibit a degree of introspective awareness of their internal states. Under certain conditions, Claude models are able to notice and identify concept vectors injected into their activations.
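A rough sketch of the kind of concept injection described here, done in the style of open-source activation steering: add a fixed direction to one layer's residual-stream output via a PyTorch forward hook. The model handle and layer path in the usage comment are hypothetical; Anthropic's actual setup on Claude is internal and not reproduced here.

```python
import torch


def inject_concept(model, layer, concept_vec: torch.Tensor, scale: float = 4.0):
    """Register a forward hook that adds `scale * concept_vec` ([d_model]) to the layer's output."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * concept_vec.to(hidden.dtype).to(hidden.device)
        # Returning a value from a forward hook replaces the module's output.
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    return layer.register_forward_hook(hook)


# Usage (hypothetical HuggingFace-style model): inject at a middle layer, then
# ask the model whether it notices anything unusual about its own processing.
# handle = inject_concept(model, model.model.layers[16], concept_vec)
# ... run generation, then: handle.remove()
```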
Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders
This paper releases 256 open-source SAEs trained on every layer and sublayer of Llama-3.1-8B, covering residual streams, attention outputs, MLP outputs, and transcoders at both 32K and 128K feature widths.
- Sparse Autoencoders (SAEs) are unsupervised machine learning methods designed to extract interpretable features from neural networks by addressing superposition of features.
- The paper employs TopK SAEs, an improved variant that directly selects the K highest-activating features rather than relying on L1 penalties. This, along with other improvements, results in high reconstruction quality while achieving ~3x better sparsity (L0 ≈ 50 vs. 150) compared to state-of-the-art JumpReLU SAEs (a minimal sketch of the TopK mechanism follows below).
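A minimal TopK SAE sketch in the spirit of the setup described above; the dimensions and k below are illustrative defaults, not the released Llama Scope configurations.

```python
import torch
import torch.nn as nn


class TopKSAE(nn.Module):
    def __init__(self, d_model: int = 4096, d_features: int = 32768, k: int = 50):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        # Encode, then keep only the k largest activations per token: sparsity is
        # enforced structurally instead of via an L1 penalty, so L0 is bounded by k.
        acts = torch.relu(self.encoder(x))
        topk = torch.topk(acts, self.k, dim=-1)
        sparse = torch.zeros_like(acts).scatter_(-1, topk.indices, topk.values)
        recon = self.decoder(sparse)
        return recon, sparse


sae = TopKSAE()
residual = torch.randn(8, 4096)           # a batch of residual-stream activations
recon, features = sae(residual)
print((features != 0).sum(dim=-1))        # number of active features per row (at most k)
```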
Attention Is All You Need
The seminal paper introducing the Transformer architecture.