INDEX
Explanations
Action figures
The neuron fires on mentions of collectible action-figure products (e.g. “figure,” “figures,” “action figure,” “vintage collection”).
Negative Logits
沈            -0.06
hj            -0.06
عاشق          -0.06
/design       -0.06
_relation     -0.06
Das           -0.06
Smoking       -0.06
江            -0.06
Raised        -0.05
dubbed        -0.05
Positive Logits
ROUND         0.07
�             0.07
iddet         0.07
ayet          0.07
ζει           0.06
empt          0.06
;");↵         0.06
reachable     0.06
ニニ          0.06
rich          0.06
Activations Density: 0.014%