Explanations
phrases related to dissatisfaction or complaints
Negative Logits
opp          -0.14
arak         -0.14
alue         -0.14
-pos         -0.14
ud           -0.14
Everything   -0.13
opia         -0.13
anko         -0.13
611          -0.13
pos          -0.13
Positive Logits
whom         0.36
who          0.36
who          0.27
whose        0.26
myself       0.25
some         0.22
whose        0.22
الذين        0.22
kteří        0.21
ones         0.21
Activations Density 0.233%
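
As a rough illustration of what a density figure like the one above measures, the sketch below computes the fraction of token positions on which a feature fires. It assumes density means "share of sampled tokens with activation above zero"; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is the share of sampled tokens on which the
    feature is active (activation > threshold).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Made-up activations for 1000 tokens: the feature fires on 3 of them.
    sample = [0.0] * 997 + [1.2, 0.4, 3.1]
    print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.233% would mean the feature is active on roughly 2 or 3 of every 1000 sampled tokens.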