Explanations
negative sentiments or phrases indicating disapproval
Negative Logits
lobals    -0.18
McB       -0.18
arhus     -0.17
shal      -0.16
æ¼Ķ       -0.14
anco      -0.14
abr       -0.14
ãĥĨãĥ«    -0.14
éry       -0.14
ãĥĩãĥ«    -0.14
Positive Logits
anger     0.19
æľĹ       0.15
perator   0.15
CG        0.15
ÙIJÙĥ      0.15
Nob       0.15
nave      0.15
翼        0.15
aset      0.15
ANGER     0.15
Activations Density: 0.035%
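
Several token strings in the logit lists (for example `ãĥĨãĥ«`, `æ¼Ķ`, `æľĹ`) look like the printable stand-ins that GPT-2-style byte-level BPE tokenizers use for raw bytes, not like ordinary text. The sketch below is a minimal way to turn such a display string back into readable text, assuming the model behind this page uses that GPT-2 byte-to-character table (the page itself does not say which tokenizer is in use). `decode_token` is a hypothetical helper name, not part of any library.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable character table."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted to code points >= 256.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Hypothetical helper: map a byte-level display string back to text.

    Returns the input unchanged when it is not a valid byte-level form
    (an unmapped character, or bytes that are not complete UTF-8).
    """
    try:
        raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
        return raw.decode("utf-8")
    except (KeyError, UnicodeDecodeError):
        return display


if __name__ == "__main__":
    # Token strings copied from the logit lists on this page.
    for tok in ["ãĥĨãĥ«", "ãĥĩãĥ«", "æ¼Ķ", "æľĹ", "anger", "翼", "éry"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Strings that are already plain text, such as `anger`, pass through unchanged, as do strings like `翼` or `éry` that do not form a complete UTF-8 sequence under this table. An unchanged result is therefore inconclusive: the token may be a partial multi-byte fragment, or the page may be showing it literally.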