Explanations
negative descriptions of behaviors and attitudes
Negative Logits
Token     Logit
ebi       -0.18
517       -0.16
ofire     -0.14
ÑĦÑĢа     -0.14
urre      -0.14
asant     -0.13
æĭľ       -0.13
aylor     -0.13
imore     -0.13
anka      -0.13
Positive Logits
Token        Logit
empathy      0.44
compassion   0.43
sympathy     0.43
compass      0.41
pity         0.40
Compass      0.39
empath       0.37
sympath      0.36
Emp          0.35
sy           0.35
Activations Density: 0.136%