INDEX
Explanations
expressions of empathy and sympathy
Negative Logits
acco         -0.20
aler         -0.18
ebi          -0.17
unar         -0.15
IFICATION    -0.15
dök          -0.15
ebo          -0.15
kl           -0.15
antino       -0.15
URE          -0.14
Positive Logits
etic         0.36
etically     0.35
izing        0.35
izers        0.35
ies          0.33
ize          0.32
izer         0.30
ized         0.29
izes         0.28
etical       0.27
Activations Density 0.015%