Explanations

asking for reasons and explanations

This neuron detects assistant/model meta-commentary and instructional text: phrases such as "I will" and "Here are", enumerated lists, explanations, and other planning or response scaffolding.

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

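Negative Logits

Token            Value   English gloss
"étricas"        0.47    fragment of Spanish/Portuguese "métricas" (metrics)
"ਜ"              0.44    Gurmukhi letter ja
" kaldığımız"    0.42    Turkish: "where we left off / where we stayed"
"Following"      0.41
")<\"            0.41
"ಕ"              0.39    Kannada letter ka
"しているので"    0.39    Japanese: "because (it) is doing"
"こちらも"        0.39    Japanese: "this one too / here as well"
")$."            0.38
" errori"        0.38    Italian: "errors"

Positive Logits

Token            Value   English gloss
"And"            0.68
" explained"     0.65
"とその"          0.64    Japanese: "and its / and that"
" explain"       0.64
" Explain"       0.64
"そして"          0.60    Japanese: "and then"
" explanation"   0.57
"explain"        0.56
" beserta"       0.55    Indonesian: "along with"
" Explained"     0.55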
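Logit lists like these are often produced by projecting a feature's output direction through the model's unembedding matrix and sorting the per-token scores. This page does not say how its values were computed, so the following is only a minimal sketch under that assumption: `direction` stands for the feature's residual-stream direction, `w_u` for an unembedding matrix of shape (d_model, vocab_size), and any final LayerNorm is ignored. The function names and toy numbers are hypothetical.

```python
# Hypothetical sketch (not the dashboard's code): rank vocabulary entries by
# how strongly a feature direction raises or lowers their logits.
# ASSUMPTIONS: `direction` is the feature's residual-stream direction and
# `w_u` is an unembedding matrix stored as d_model rows x vocab_size columns.
# Any final LayerNorm applied by a real model is ignored here.
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Score for vocab entry j = sum_i direction[i] * w_u[i][j]."""
    vocab_size = len(w_u[0])
    return [
        sum(d * row[j] for d, row in zip(direction, w_u))
        for j in range(vocab_size)
    ]


def top_and_bottom(
    direction: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most promoted, most suppressed) tokens, k of each."""
    scored = list(zip(vocab, logit_effects(direction, w_u)))
    ranked = sorted(scored, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]            # lowest scores first
    positive = ranked[::-1][:k]      # highest scores first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 placeholder tokens.
positive, negative = top_and_bottom(
    direction=[1.0, 0.5],
    w_u=[[0.2, -0.4, 0.9, 0.0],
         [0.6, 0.2, -0.2, -0.8]],
    vocab=["a", "b", "c", "d"],
    k=2,
)
print(positive)
print(negative)
```

In the toy example, token "c" gets the largest score (0.9 - 0.1 = 0.8) and "d" the smallest (-0.4), so they head the positive and negative lists respectively.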

Activation density: 0.221%
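Activation density is commonly defined as the fraction of token positions on which a feature is active. The sketch below assumes that definition with a strictly-positive threshold, and it also assumes, without confirmation from this page, that the 0.221% figure was measured over every position of the dashboard dataset listed under Configuration. The function names are hypothetical.

```python
# Hypothetical sketch: activation density as the share of token positions on
# which the feature fires. ASSUMPTION: "fires" means activation > threshold
# (0.0 by default); the page does not state the dashboard's exact rule.
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    return active / total if total else 0.0


def implied_active_tokens(num_prompts: int, tokens_per_prompt: int, density: float) -> float:
    """Token positions implied by `density`, IF it was measured over every
    position of the dashboard dataset (an assumption, not stated on the page)."""
    return num_prompts * tokens_per_prompt * density


# Toy activations: 2 of 5 positions fire, so the density is 2/5.
print(activation_density([0.0, 1.2, 0.0, 0.0, 3.4]))  # prints 0.4

# Dashboard numbers: 238,145 prompts x 512 tokens = 121,930,240 positions;
# 0.221% of those is about 269,466 positions.
print(round(implied_active_tokens(238_145, 512, 0.00221)))  # prints 269466
```

Under those assumptions, a density of 0.221% means the feature fires on roughly one token position in 450.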