INDEX
Explanations
words related to political figures and events
references to specific locations or establishments
Negative Logits
DragonMagazine    -0.92
estamp            -0.90
fucked            -0.70
============      -0.69
retty             -0.68
emort             -0.65
!--               -0.65
GOODMAN           -0.64
furt              -0.64
STD               -0.64
Positive Logits
vous              0.67
Gael              0.62
de                0.61
deen              0.61
Labrador          0.61
á                 0.60
Trou              0.60
Lago              0.60
Gamble            0.60
aga               0.58
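The two logit lists can be read as the two ends of one ranked list of per-token logit effects. The sketch below assumes the dashboard already has a mapping from token to logit effect and simply picks the k most negative and k most positive entries. The function name `top_logit_tokens` and the dict input format are hypothetical illustrations, not the dashboard's actual code.

```python
import heapq

def top_logit_tokens(logit_effects, k=10):
    """Hypothetical helper: return the k most negative and k most positive
    (token, effect) pairs from a token -> logit-effect mapping."""
    # nsmallest/nlargest avoid sorting the full vocabulary when k is small
    most_negative = heapq.nsmallest(k, logit_effects.items(), key=lambda kv: kv[1])
    most_positive = heapq.nlargest(k, logit_effects.items(), key=lambda kv: kv[1])
    return most_negative, most_positive
```

For example, feeding in a few of the values listed above with `k=2` returns `DragonMagazine` and `estamp` as the most negative tokens and `vous` and `Gael` as the most positive.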
Activation Density: 0.288%
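Activation density is commonly reported as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0. That definition and the helper name `activation_density` are assumptions, not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed definition of 'firing')."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)
```

Under this definition, a density of 0.288% corresponds to the feature firing on 288 out of every 100,000 tokens.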