Explanations
This neuron activates on the verb “work” and its inflected forms, such as “works” and “working”.
Negative Logits
Token        Logit
recall       -0.08
ícul         -0.07
Intensity    -0.07
Ad           -0.07
onto         -0.06
imating      -0.06
Detect       -0.06
。\          -0.06
ucle         -0.06
-up          -0.06
Positive Logits
Token        Logit
working      0.14
worked       0.12
Working      0.11
works        0.10
work         0.10
Working      0.09
(work        0.08
collabor     0.08
_working     0.08
phil         0.07
Activations Density 0.036%
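As a rough illustration of how a density figure like this could be computed, the sketch below assumes that density means the fraction of token positions on which the neuron's activation is above zero. That definition, the function name activation_density, and the toy data are assumptions made for illustration; the dashboard does not state how it derives the number.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions where the neuron fires;
    the source page does not spell out its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): 10,000 token positions, 4 of which are active.
toy = [0.0] * 9996 + [0.7, 1.2, 0.3, 2.1]
density = activation_density(toy)
print(f"{density:.3%}")
```

A value as low as 0.036% would mean the neuron fires on roughly 36 out of every 100,000 token positions, which fits a neuron tied to one specific word family.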