INDEX
Explanations
words related to authority figures or positions of power
words related to indicators or signs
Negative Logits
meric       -0.70
lde         -0.68
nder        -0.66
slideshow   -0.65
Derby       -0.65
bury        -0.65
mileage     -0.64
Pegasus     -0.63
Mid         -0.62
MER         -0.62
Positive Logits
ict         1.12
uple        0.93
icts        0.92
inel        0.90
omatic      0.87
rypt        0.86
ory         0.85
ificate     0.84
enance      0.84
ypes        0.84
Activations Density: 0.013%
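
To give a sense of scale for the density figure, here is a minimal sketch. It assumes the density is the fraction of sampled tokens on which the feature has a nonzero activation; the source does not define the term, so treat that as an assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means firing tokens / total tokens.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (not from the source): 100,000 tokens, 13 of which fire.
acts = [0.0] * 100_000
for i in range(13):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.013%
```

Under this reading, a density of 0.013% corresponds to the feature firing on roughly 13 of every 100,000 tokens, which is a very sparse feature.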