INDEX
Explanations
references to political figures or leadership roles
Negative Logits
eland          -0.19
strcasecmp     -0.16
zew            -0.15
eken           -0.15
",__           -0.15
aled           -0.14
ableOpacity    -0.14
tual           -0.14
SCP            -0.14
-fat           -0.14
Positive Logits
acks           0.15
oner           0.15
observation    0.15
ilater         0.14
tam            0.14
_accessible    0.14
fingers        0.14
pot            0.14
ger            0.14
description    0.14
Activations Density 0.059%