Explanations
phrases indicating giving up or surrendering
Negative Logits
σο       -0.16
EU       -0.15
appa     -0.14
premi    -0.14
olf      -0.14
Folk     -0.14
Čer      -0.14
utsch    -0.13
adle     -0.13
олю      -0.13
Positive Logits
-eff      0.14
sert      0.14
nst       0.14
Eff       0.14
тів       0.14
razier    0.14
annie     0.14
page      0.14
Joy       0.13
eff       0.13
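Logit lists like the two above are usually produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the per-token results. The sketch below assumes that common SAE-dashboard convention (no layer-norm folding or rescaling); `decoder_vector`, `unembed`, and the four-token vocabulary are hypothetical toy values, not this feature's weights.

```python
# Sketch: rank vocabulary tokens by how strongly a feature's decoder
# direction pushes their logits up or down.
# Assumption: logit effect = decoder_vector @ W_U (no layer norm folded in).


def logit_effects(decoder_vector, unembed):
    """Return one logit effect per vocabulary token.

    decoder_vector: list of d_model floats (hypothetical feature direction).
    unembed: d_model rows, each a list of vocab_size floats (W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vector[i] * unembed[i][t] for i in range(len(decoder_vector)))
        for t in range(vocab_size)
    ]


def top_tokens(effects, vocab, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (hypothetical).
vocab = ["give", "up", "win", "the"]
decoder_vector = [1.0, -0.5]
unembed = [
    [0.2, 0.4, -0.3, 0.0],
    [0.1, -0.2, 0.2, 0.0],
]
effects = logit_effects(decoder_vector, unembed)
negative, positive = top_tokens(effects, vocab, k=1)
print("negative:", negative)
print("positive:", positive)
```

Here "up" receives the largest boost and "win" the largest suppression, mirroring how the dashboard lists the extremes at each end of the ranking.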
Activation Density: 0.011%
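A density of 0.011% is a fraction of 0.00011, i.e. the feature is active on roughly 1 in every 9,100 tokens. The sketch below assumes density means the fraction of tokens whose activation is strictly positive; the activation values are made up for illustration.

```python
# Sketch: activation density = fraction of tokens on which a feature fires.
# Assumption: "fires" means activation > threshold (default 0.0).


def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 11 firings in 100,000 tokens gives a density of 0.00011 (0.011%).
acts = [0.0] * 100_000
for i in range(11):
    acts[i * 9_000] = 1.3  # hypothetical nonzero activation values
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.011%
```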