INDEX
Explanations
references to strategy and comparison using chess metaphors
Negative Logits
_ALIAS    -0.15
Getter    -0.14
oha       -0.14
egot      -0.14
óż        -0.13
заклад    -0.13
tub       -0.13
abr       -0.13
crew      -0.13
Alias     -0.13
Positive Logits
pawn      0.31
bishop    0.30
Pawn      0.30
knight    0.29
pieces    0.28
Pawn      0.28
bishop    0.28
bishops   0.28
Pieces    0.27
chess     0.27
Activations Density 0.006%