Explanations
This neuron activates primarily on the standalone word “crash” (in any context or casing).
Negative Logits
Token        Logit
Intent       -0.07
edu          -0.06
reputed      -0.06
Activ        -0.06
voluntary    -0.06
Policy       -0.06
틴           -0.06
Dol          -0.06
evaluating   -0.06
діл          -0.06
Positive Logits
Token        Logit
crash        0.14
crashes      0.12
Crash        0.11
crashed      0.10
crashing     0.08
irsch        0.08
clashes      0.08
ช            0.08
CR           0.07
_trampoline  0.07
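The two logit lists above pair each token with a signed value and show the ten lowest and ten highest. As a rough illustration of how such lists could be produced, the sketch below sorts a token-to-value mapping and takes both ends. The function name `top_logits` and the dictionary layout are hypothetical, and the page does not say how the values themselves were computed; the sample values are copied from the lists above.

```python
def top_logits(logit_effects, k=10):
    """Return (most negative, most positive) tokens, k entries each.

    `logit_effects` is assumed to map a token string to a signed value.
    This helper is illustrative, not the dashboard's own code.
    """
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return negative, positive


# A few sample values copied from the lists above.
effects = {
    "crash": 0.14,
    "crashes": 0.12,
    "Crash": 0.11,
    "Intent": -0.07,
    "edu": -0.06,
}

negative, positive = top_logits(effects, k=2)
print("negative:", negative)
print("positive:", positive)
```

With the five sample entries, the positive side starts with "crash" and the negative side starts with "Intent", matching the order shown in the lists.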
Activation density: 0.004%
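One common reading of an activation density figure is the fraction of tokens on which the neuron fires at all. The sketch below computes that fraction under this assumption; the function `activation_density`, the zero threshold, and the synthetic sample are all hypothetical, since the page does not define the metric or give the underlying data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumes density means "share of tokens where the neuron is active";
    the dashboard's exact definition is not stated on the page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample: one active token out of 25,000 (not real data).
sample = [0.0] * 24999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.004% corresponds to roughly one active token in every 25,000, which fits a neuron that responds to a single specific word.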