Explanations
expressions of humility or references to being humble
Negative Logits
Token     Logit
adem      -0.17
oci       -0.17
heel      -0.15
ampion    -0.15
allen     -0.15
p         -0.15
upo       -0.14
pcf       -0.14
ogn       -0.14
pour      -0.14
Positive Logits
Token     Logit
hum       0.43
Hum       0.40
Hum       0.38
pty       0.30
hum       0.29
iliated   0.25
mers      0.24
iliate    0.24
ankind    0.23
mock      0.23
Activations Density: 0.011%
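
To give the density figure some scale, here is a minimal sketch. It assumes that "density" means the fraction of token positions on which the feature's activation is above zero; the dashboard does not state its exact definition. The function names and the toy activation list are hypothetical and are not taken from any dashboard tooling.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its value is strictly
    greater than `threshold`; the dashboard may use a different rule.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def tokens_per_activation(density_percent):
    """Average number of tokens per active position, given a density in percent."""
    if density_percent <= 0:
        raise ValueError("density_percent must be positive")
    return 100.0 / density_percent


# Toy data (hypothetical): one active position out of four.
toy = [0.0, 0.0, 1.5, 0.0]
print(activation_density(toy))

# The reported density of 0.011%, expressed as tokens per activation.
print(round(tokens_per_activation(0.011)))
```

Under that assumption, a density of 0.011% corresponds to the feature firing on roughly one token in every 9,000.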