INDEX
Explanations
references to the TV show "Game of Thrones"
mentions of the show "Game of Thrones."
Negative Logits
orate       -0.74
etheless    -0.74
iffe        -0.73
ancies      -0.70
ought       -0.68
terday      -0.65
hips        -0.64
irst        -0.63
ORN         -0.61
taining     -0.60
Positive Logits
cube        1.10
Cube        1.09
FAQ         1.07
play        1.05
cock        1.04
Stop        1.01
zeb         0.96
boy         0.92
Spot        0.91
Developers  0.89
Activations Density: 0.022%