INDEX
Explanations
The neuron fires on mentions of the core rock-paper-scissors game terms, i.e. the move names.
Negative Logits
lives       -0.07
_USED       -0.06
혼          -0.06
구          -0.06
LORD        -0.06
evaluated   -0.06
TED         -0.06
días        -0.06
_price      -0.06
Lord        -0.06
Positive Logits
%)↵↵        0.08
この        0.07
="--        0.06
prep        0.06
primir      0.06
Immediately 0.06
[child      0.06
แกรม        0.06
.Abstract   0.06
↵           0.06
Activations Density 0.003%