INDEX
Explanations
expressions of warmth and friendliness
Negative Logits
acro     -0.16
ivic     -0.16
oman     -0.15
/do      -0.15
ocket    -0.15
horn     -0.14
904      -0.14
jed      -0.14
lor      -0.14
arb      -0.14
Positive Logits
elter    0.17
ASCADE   0.16
illac    0.15
lok      0.15
erton    0.14
elry     0.14
argo     0.14
elsey    0.14
lier     0.14
inkel    0.14
Activations Density 0.016%