INDEX
Explanations
references to specific individuals and their roles or achievements
Negative Logits
aju      -0.16
[__      -0.16
wick     -0.15
okino    -0.14
ارف      -0.14
oger     -0.14
nement   -0.14
lier     -0.14
INV      -0.14
OTH      -0.13
Positive Logits
zej      0.15
lue      0.15
agan     0.14
\s       0.14
hyp      0.14
ippi     0.14
IEW      0.13
_core    0.13
elsey    0.13
paren    0.13
Activations Density 0.056%