Explanations
proper nouns related to geographical locations and titles of authority figures
Negative Logits
iating     -0.72
chell      -0.70
mble       -0.70
orer       -0.68
ters       -0.68
idently    -0.68
ombies     -0.67
TING       -0.67
OUT        -0.67
heric      -0.65
Positive Logits
pin        1.18
uin        1.14
doms       1.14
pins       1.05
dom        1.02
DOM        0.99
Arabian    0.93
Arabia     0.92
Abdullah   0.84
fish       0.80
Activations Density 0.960%