INDEX
Explanations
The neuron activates only on the word “funny.”
Negative Logits
[test          -0.07
.obs           -0.06
mounted        -0.06
token          -0.06
Ingredients    -0.06
_impl          -0.06
obao           -0.06
def            -0.06
recounted      -0.06
транс          -0.06
Positive Logits
ify            0.07
Crimes         0.07
mashed         0.06
~              0.06
ій             0.06
coder          0.06
IZE            0.06
_fold          0.06
Hitler         0.06
_ACTIVITY      0.06
Activations Density 0.010%