INDEX
Explanations
mentions of the word "Games"
references to video games
Negative Logits
politic     -0.85
aye         -0.78
acknow      -0.68
sie         -0.67
iffe        -0.67
wildfire    -0.65
isin        -0.61
tow         -0.61
annex       -0.59
evac        -0.58
Positive Logits
manship     1.31
consoles    1.01
erver       1.01
cube        0.97
paces       0.93
wright      0.90
pad         0.88
pace        0.87
Played      0.87
indust      0.85
Activations Density: 0.030%