EMNLP 2025 Paper Notes
8 papers reviewed.
EMNLP2025 Eo: Expert Generalization in MoE in IFT
One-Liner: cluster the input, then activate a separate expert group per cluster.
Motivation: the heterogeneity of instruction-tuning data poses difficulty for MoE; routing operates only at the token level, so it cannot handle sequence-level generalization.
Novelty: an architecture that enables hierarchical expert routing.
Notable Methods: Mixture of Clustered Experts, a dual-stage routing mechanism. Group the M experts into groups of N experts (i.e. (M = \left(N, \dots, ...
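The grouping formula above is cut off, so here is a minimal NumPy sketch of the dual-stage idea as described: a sequence-level router first picks an expert group (cluster), then a token-level router picks an expert inside that group. All names, shapes, and the top-1 routing choice are my own assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16           # hidden size
N_GROUPS = 4     # expert groups, one per input cluster
N_PER_GROUP = 2  # experts inside each group

W_group = rng.normal(size=(D, N_GROUPS))                  # stage-1 (sequence) router
W_expert = rng.normal(size=(N_GROUPS, D, N_PER_GROUP))    # stage-2 (token) routers
experts = rng.normal(size=(N_GROUPS, N_PER_GROUP, D, D))  # expert weight matrices

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moce_forward(tokens):
    """tokens: (T, D) array.
    Stage 1: route the whole sequence (mean-pooled) to one expert group.
    Stage 2: route each token to one expert inside that group (top-1)."""
    g = int(np.argmax(softmax(tokens.mean(axis=0) @ W_group)))  # stage 1
    probs = softmax(tokens @ W_expert[g])                       # (T, N_PER_GROUP)
    picks = probs.argmax(axis=-1)                               # stage 2
    out = np.stack([tokens[t] @ experts[g, picks[t]] for t in range(len(tokens))])
    return out, g, picks

out, group, picks = moce_forward(rng.normal(size=(5, D)))
```

Keeping the stage-1 decision at the sequence level is what lets the router respect cluster identity rather than per-token statistics alone.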
EMNLP2025 Extra Things
EMNLP2025 Yu: Long-Context LMs Fail in Basic Retrieval A synthetic dataset shows that needle-in-a-haystack retrieval fails when the needle requires reasoning
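To make the distinction concrete, here is a toy construction of the two needle types: a direct needle that literal string matching can find, and a reasoning needle whose answer never appears verbatim. This is an illustrative sketch, not the paper's actual dataset; the filler text, needle sentences, and sizes are invented.

```python
import random

random.seed(0)
FILLER = "The sky was clear over the quiet town that day."

def make_haystack(needle, n_filler=200):
    """Bury one needle sentence at a random position among filler sentences."""
    sents = [FILLER] * n_filler
    sents.insert(random.randrange(n_filler), needle)
    return " ".join(sents)

# Direct needle: retrievable by literal matching.
direct = make_haystack("The secret code is 4217.")

# Reasoning needle: the answer (4217) never appears verbatim,
# so pure retrieval over the context is not enough.
indirect = make_haystack("The secret code is one more than 4216.")
```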
EMNLP2025 Friday Afternoon Posters
EMNLP2025 Ghonim: concept-ediq A massive bank of concepts, multimodal and semantically linked.
EMNLP2025 Bai: Understanding and Leveraging Expert Specialization for Context Faithfulness Two steps: step one uses router tuning to prioritize experts that rely on context; step two fine-tunes specifically those experts for improved context faithfulness. Big gains on HotpotQA and other QA datasets from the router tuning alone.
EMNLP2025 Vasu: Literature-Grounded Hypothesis Generation Use citation links to ...
EMNLP2025 Index
Talks
- EMNLP2025 Keynote: Heng Ji
- EMNLP2025 Eo: Expert Generalization in MOE
- EMNLP2025 Wu: Zero Shot Graph Learning
- EMNLP2025: MUSE, MCTS Driven Red Teaming
Posters
- EMNLP2025 Wednesday Morning Posters
- EMNLP2025 Friday Afternoon Posters
- EMNLP2025 Extra Things
Takes
- although parsing may be dead for natural language, structure helps parse scientific information (i.e. drugs, molecules, proteins, etc.)
- two ideas: 1) how to formalize the approach mathematically 2) what can LMs do that humans can't do? ...
EMNLP2025 Keynote: Heng Ji
Motivation: drug discovery is extremely slow and expensive; most work modulates previous iterations. Principles of Drug Discovery. Observe: acquire/fuse knowledge from multiple data modalities (sequence, structure, etc.). Think: critically generate genuinely new hypotheses by iteratively allowing LMs to code-switch between modalities (i.e. fuse different modalities together in the most uniform way). An LM as a heuristic helps prune the search space quickly.
EMNLP2025 Wednesday Morning Posters
EMNLP2025 Xu: Tree of Prompting Evaluate the quote attribution score as a way to prioritize more factual quotes.
EMNLP2025 Fan: The Medium Is Not the Message Unwanted features such as language or medium show up in embeddings; use linear concept erasure to learn a projection that minimizes information about the unwanted features.
EMNLP2025 Hong: Variance Sensitivity Induces Attention Entropy Collapse Softmax is highly sensitive to variance, which is why pre-training loss spikes without QK norm.
EMNLP2025 Vashu...
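The variance-sensitivity claim in the Hong poster is easy to demonstrate numerically: scaling up the variance of attention scores drives the softmax entropy toward zero (attention collapses onto one token). A minimal sketch, with the score vector and scales invented for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def entropy(p):
    """Shannon entropy of a probability vector (nats)."""
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
scores = rng.normal(size=256)  # stand-in for one row of QK^T attention scores

# Multiplying the scores by s raises their variance by s^2; the softmax
# entropy collapses as s grows, which is the failure QK norm guards against
# by bounding the pre-softmax variance.
entropies = [entropy(softmax(s * scores)) for s in (0.1, 1.0, 10.0)]
```

At small scale the entropy sits near its maximum of log(256) ≈ 5.55 nats; at large scale it drops sharply.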
EMNLP2025 Wu: Zero Shot Graph Learning via Explicit Reasoning
One-Liner
Novelty
Background: How do LLMs do graphs?
- predict text from graphs (convert the graph into text, autoregress)
- align text with graphs (GNN + LLM late fusion)
- encode text with graphs (stick LLM embeddings into a GNN as a prompt)
Motivation
Notable Methods
Key Figs
New Concepts
Notes
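A minimal sketch of the first approach from the background list ("predict text from graphs"): naively linearize an edge list into a string an autoregressive LM can consume. The function name and the arrow-separated format are my own assumptions, not the paper's.

```python
def graph_to_text(edges):
    """Linearize a directed edge list into a flat string for an LM prompt."""
    return " ; ".join(f"{u} -> {v}" for u, v in edges)

prompt = graph_to_text([("A", "B"), ("B", "C")])
# "A -> B ; B -> C"
```

This flattening loses permutation invariance (edge order matters to the LM), which is part of why the late-fusion and embedding-injection variants exist.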
EMNLP2025 Zhang: Diffusion vs. Autoregression Language Models
One-Liner Novelty Notable Methods Key Figs New Concepts Notes