Explanations
words and phrases relating to empathy and sympathy
Negative Logits
orer      -0.17
ledge     -0.16
stí       -0.15
ilver     -0.15
esty      -0.14
yne       -0.14
ei        -0.14
ardown    -0.14
unei      -0.14
auc       -0.14
Positive Logits
isen         0.15
ronic        0.15
sympath      0.14
amas         0.14
ronics       0.14
130          0.14
berger       0.14
sympathetic  0.14
mis          0.14
/em          0.14
Activations Density: 0.029%