INDEX
Explanations
references to video games
repeated mentions of the word "Games"
Negative Logits
politic     -0.81
sie         -0.75
aye         -0.73
ALLY        -0.69
acknow      -0.69
gencies     -0.68
itudinal    -0.63
atever      -0.61
annex       -0.61
tow         -0.60
Positive Logits
manship     1.20
erver       1.01
paces       0.97
consoles    0.90
pace        0.89
OTA         0.88
hare        0.85
chool       0.84
pad         0.83
hops        0.80
Activations Density 0.033%
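
The dashboard does not say how these figures were produced. As a minimal sketch, assuming the logit lists are the ten lowest and ten highest per-token logit effects of the feature, and assuming the density is the share of sampled tokens on which the feature has a nonzero activation, the two summaries could be derived as follows. The function names `top_logits` and `activation_density` are hypothetical, and the inputs are illustrative, not the dashboard's underlying data.

```python
def top_logits(logit_effects, k=10):
    """Return (most negative, most positive) k tokens by logit effect.

    logit_effects: dict mapping token string -> logit effect (assumed input).
    """
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # lowest values first
    positive = ranked[::-1][:k]    # highest values first
    return negative, positive


def activation_density(activations):
    """Fraction of sampled tokens on which the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


# Illustrative inputs: four token/value pairs copied from the lists above,
# and a synthetic activation sample in which 33 of 100,000 tokens fire.
effects = {"manship": 1.20, "erver": 1.01, "politic": -0.81, "sie": -0.75}
negative, positive = top_logits(effects, k=2)
density = activation_density([0.0] * 99_967 + [1.5] * 33)
print(negative, positive, f"{density:.3%}")
```

Under these assumptions, a density of 0.033% corresponds to the feature firing on roughly 33 out of every 100,000 sampled tokens.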