INDEX
Explanations
words that convey a sense of dissatisfaction or negative emotions
Negative Logits
Token       Logit
bol         -0.16
elim        -0.15
-library    -0.14
shal        -0.14
Ìģc         -0.14
\grid       -0.14
elop        -0.13
ÑģпÑĢави     -0.13
aldi        -0.13
anzi        -0.13
Positive Logits
Token       Logit
覧          0.15
еÑĢÑĪ        0.15
yal         0.14
ı           0.14
avel        0.13
mainwindow  0.13
jang        0.13
vetica      0.13
è¾          0.13
ÃĦ          0.12
Activations Density: 0.487%