INDEX
Explanations
expressions of jealousy and envy
Negative Logits
Token       Logit
KP          -0.16
loy         -0.16
edis        -0.16
unga        -0.16
mare        -0.15
andon       -0.14
é¡          -0.14
Khu         -0.14
testdata    -0.14
мÑı         -0.13
Positive Logits
Token       Logit
envy        0.59
jealous     0.54
jealousy    0.48
env         0.37
-env        0.34
env         0.33
/env        0.31
Env         0.27
(env        0.26
ENV         0.26
Activations Density 0.180%