Explanations
terms associated with empathy and sympathy
NEGATIVE LOGITS
Token    Logit
ebi      -0.17
kl       -0.16
ý        -0.16
acci     -0.15
imore    -0.15
аниц     -0.15
egers    -0.14
als      -0.14
ít       -0.14
ller     -0.14
POSITIVE LOGITS
Token        Logit
sympath      0.31
sympathetic  0.30
sympathy     0.29
empath       0.26
compass      0.26
empathy      0.25
Sy           0.23
etic         0.23
towards      0.23
symp         0.22
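
The page does not say how these logit values are computed. A common convention, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and list the largest and smallest results. The sketch below illustrates that under this assumption; `logit_effects`, `top_and_bottom`, and all vectors are hypothetical toy values, not data from this feature.

```python
# Minimal sketch, assuming logit effects are dot products between a feature's
# decoder direction and each token's unembedding vector. All names and numbers
# here are hypothetical toy values; none come from the dashboard.

def logit_effects(decoder_direction, unembedding):
    """Dot the decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembedding.items()
    }

def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=True)
    positive = ranked[:k]
    # Most negative first, mirroring how the negative list is ordered above.
    negative = sorted(ranked[-k:], key=lambda item: item[1])
    return positive, negative

decoder_direction = [0.5, -0.25]          # toy 2-dimensional direction
unembedding = {                           # toy per-token unembedding vectors
    "sympathy": [0.5, 0.0],
    "empathy": [0.25, -0.25],
    "ller": [-0.25, 0.25],
    "kl": [0.0, 0.5],
}

effects = logit_effects(decoder_direction, unembedding)
positive, negative = top_and_bottom(effects, k=2)

for token, value in positive + negative:
    print(f"{token:10s} {value:+.4f}")
```

In a real model the unembedding matrix has one row per vocabulary token, so the same ranking would run over the full vocabulary rather than four entries.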
Activations Density 0.013%