INDEX
Explanations
references to prominent figures or people holding authoritative positions
references to political or organizational leaders
Negative Logits
ours      -0.65
Oo        -0.64
nder      -0.63
ighth     -0.63
gging     -0.62
Moor      -0.60
Mile      -0.60
gged      -0.59
INGTON    -0.59
OUT       -0.58
Positive Logits
hip           0.99
hips          0.96
negotiator    0.90
cius          0.88
paces         0.86
doms          0.84
esses         0.83
wcs           0.81
pins          0.80
glim          0.78
Activations Density: 0.022%