INDEX
Explanations
negative phrases or sentiments often related to dissatisfaction and disbelief
Negative Logits
esan    -0.16
        -0.16
atre    -0.16
avel    -0.15
isi     -0.15
nar     -0.15
ame     -0.14
apid    -0.14
rts     -0.14
θη      -0.14
Positive Logits
823     0.16
uju     0.15
Eck     0.15
719     0.14
shit    0.14
ovny    0.14
Kak     0.13
Webb    0.13
ifacts  0.13
466     0.13
Activations Density 0.056%