EMNLP 2025 Paper Notes
8 papers reviewed.
EMNLP2025 Eo: Expert Generalization in MoE in IFT
One-Liner: cluster the input, then activate a separate expert group per cluster.
Motivation: the heterogeneity of instruction-tuning data poses difficulty for MoE; routing operates only at the token level, so it cannot handle sequence-level generalization.
Novelty: an architecture that enables hierarchical expert routing.
Notable Methods: Mixture of Clustered Experts, a dual-stage routing mechanism. Group the M experts into groups of N experts (i.e. (M = \left(N, \dots, ...
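The grouping formula above is cut off, so here is a minimal NumPy sketch of the dual-stage idea as described: a sequence-level router first picks an expert group (cluster), then a token-level router picks an expert inside that group. All names, shapes, and the top-1 routing choice are my own assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16           # hidden size
N_GROUPS = 4     # expert groups, one per input cluster
N_PER_GROUP = 2  # experts inside each group

W_group = rng.normal(size=(D, N_GROUPS))                  # stage-1 (sequence) router
W_expert = rng.normal(size=(N_GROUPS, D, N_PER_GROUP))    # stage-2 (token) routers
experts = rng.normal(size=(N_GROUPS, N_PER_GROUP, D, D))  # expert weight matrices

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moce_forward(tokens):
    """tokens: (T, D) array.
    Stage 1: route the whole sequence (mean-pooled) to one expert group.
    Stage 2: route each token to one expert inside that group (top-1)."""
    g = int(np.argmax(softmax(tokens.mean(axis=0) @ W_group)))  # stage 1
    probs = softmax(tokens @ W_expert[g])                       # (T, N_PER_GROUP)
    picks = probs.argmax(axis=-1)                               # stage 2
    out = np.stack([tokens[t] @ experts[g, picks[t]] for t in range(len(tokens))])
    return out, g, picks

out, group, picks = moce_forward(rng.normal(size=(5, D)))
```

Keeping the stage-1 decision at the sequence level is what lets the router respect cluster identity rather than per-token statistics alone.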
EMNLP2025 Extra Things
EMNLP2025 Yu: Long-Context LMs Fail in Basic Retrieval A synthetic dataset shows that needle-in-a-haystack retrieval fails when the needle requires reasoning
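To make the distinction concrete, here is a toy construction of the two needle types: a direct needle that literal string matching can find, and a reasoning needle whose answer never appears verbatim. This is an illustrative sketch, not the paper's actual dataset; the filler text, needle sentences, and sizes are invented.

```python
import random

random.seed(0)
FILLER = "The sky was clear over the quiet town that day."

def make_haystack(needle, n_filler=200):
    """Bury one needle sentence at a random position among filler sentences."""
    sents = [FILLER] * n_filler
    sents.insert(random.randrange(n_filler), needle)
    return " ".join(sents)

# Direct needle: retrievable by literal matching.
direct = make_haystack("The secret code is 4217.")

# Reasoning needle: the answer (4217) never appears verbatim,
# so pure retrieval over the context is not enough.
indirect = make_haystack("The secret code is one more than 4216.")
```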
EMNLP2025 Friday Afternoon Posters
EMNLP2025 Ghonim: concept-ediq A massive bank of concepts, multimodal and semantically linked.
EMNLP2025 Bai: Understanding and Leveraging Expert Specialization for Context Faithfulness Two steps: step one uses router tuning to prioritize experts that rely on context; step two fine-tunes specifically those experts for improved context faithfulness. Big gains on HotpotQA and other QA datasets from the router tuning alone.
EMNLP2025 Vasu: Literature-Grounded Hypothesis Generation Use citation links to ...
EMNLP2025 Index
Talks
- EMNLP2025 Keynote: Heng Ji
- EMNLP2025 Eo: Expert Generalization in MOE
- EMNLP2025 Wu: Zero Shot Graph Learning
- EMNLP2025: MUSE, MCTS Driven Red Teaming
Posters
- EMNLP2025 Wednesday Morning Posters
- EMNLP2025 Friday Afternoon Posters
- EMNLP2025 Extra Things
Takes
- although parsing may be dead for natural language, structure helps parse scientific information (i.e. drugs, molecules, proteins, etc.)
- two ideas: 1) how to formalize the approach mathematically 2) what can LMs do that humans can't do? ...
EMNLP2025 Keynote: Heng Ji
Motivation: drug discovery is extremely slow and expensive; most work modulates previous iterations. Principles of Drug Discovery. Observe: acquire/fuse knowledge from multiple data modalities (sequence, structure, etc.). Think: critically generate genuinely new hypotheses by iteratively allowing LMs to code-switch between modalities (i.e. fuse different modalities together in the most uniform way). An LM as a heuristic helps prune the search space quickly.
EMNLP2025 Wednesday Morning Posters
EMNLP2025 Xu: Tree of Prompting Evaluate the quote attribution score as a way to prioritize more factual quotes.
EMNLP2025 Fan: The Medium Is Not the Message Unwanted features such as language or medium show up in embeddings; use linear concept erasure to learn a projection that minimizes information about the unwanted features.
EMNLP2025 Hong: Variance Sensitivity Induces Attention Entropy Collapse Softmax is highly sensitive to variance, which is why pre-training loss spikes without QK norm.
EMNLP2025 Vashu...
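The variance-sensitivity claim in the Hong poster is easy to demonstrate numerically: scaling up the variance of attention scores drives the softmax entropy toward zero (attention collapses onto one token). A minimal sketch, with the score vector and scales invented for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def entropy(p):
    """Shannon entropy of a probability vector (nats)."""
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
scores = rng.normal(size=256)  # stand-in for one row of QK^T attention scores

# Multiplying the scores by s raises their variance by s^2; the softmax
# entropy collapses as s grows, which is the failure QK norm guards against
# by bounding the pre-softmax variance.
entropies = [entropy(softmax(s * scores)) for s in (0.1, 1.0, 10.0)]
```

At small scale the entropy sits near its maximum of log(256) ≈ 5.55 nats; at large scale it drops sharply.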
EMNLP2025 Wu: Zero Shot Graph Learning via Explicit Reasoning
One-Liner
Novelty
Background: How do LLMs do graphs?
- predict text from graphs (convert the graph into text, autoregress)
- align text with graphs (GNN + LLM late fusion)
- encode text with graphs (stick LLM embeddings into a GNN as a prompt)
Motivation
Notable Methods
Key Figs
New Concepts
Notes
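A minimal sketch of the first approach from the background list ("predict text from graphs"): naively linearize an edge list into a string an autoregressive LM can consume. The function name and the arrow-separated format are my own assumptions, not the paper's.

```python
def graph_to_text(edges):
    """Linearize a directed edge list into a flat string for an LM prompt."""
    return " ; ".join(f"{u} -> {v}" for u, v in edges)

prompt = graph_to_text([("A", "B"), ("B", "C")])
# "A -> B ; B -> C"
```

This flattening loses permutation invariance (edge order matters to the LM), which is part of why the late-fusion and embedding-injection variants exist.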
EMNLP2025 Zhang: Diffusion vs. Autoregression Language Models
One-Liner Novelty Notable Methods Key Figs New Concepts Notes