Explanation: phrases indicating dissatisfaction or complaints
Negative logits
Heck  -0.17
thal  -0.16
特别  -0.15  (Chinese, "especially")
imore  -0.15
ONO  -0.14
бо  -0.14
ДА  -0.14  (Russian, "yes")
특히  -0.14  (Korean, "especially")
heck  -0.14
heck  -0.14
Positive logits
sounds  0.31
Sounds  0.30
Sounds  0.28
Translation  0.25
sounds  0.24
Translation  0.23
translation  0.23
sound  0.22
nice  0.21
ounds  0.20
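Logit lists like the negative and positive lists above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes exactly that; the vector `w_dec`, the matrix `W_U` (d_model rows by vocab_size columns), and both helper names are hypothetical, and this is not necessarily how this dashboard computed its values.

```python
def logit_effects(w_dec, W_U):
    """Project a (hypothetical) feature decoder direction onto every token.

    w_dec: list of d_model floats
    W_U:   d_model rows, each a list of vocab_size floats (unembedding)
    Returns one float per token t: sum_i w_dec[i] * W_U[i][t].
    """
    vocab_size = len(W_U[0])
    return [
        sum(w_dec[i] * W_U[i][t] for i in range(len(w_dec)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most negative and k most positive (token, value) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(tokens[t], effects[t]) for t in order[:k]]
    # Most positive first, matching the layout of a "positive logits" list.
    positive = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Toy data only: a 2-dimensional model with a 4-token vocabulary.
    tokens = ["sounds", "heck", "nice", "the"]
    w_dec = [1.0, -0.5]
    W_U = [[0.3, -0.1, 0.2, 0.0],
           [0.0, 0.2, 0.1, 0.0]]
    neg, pos = top_and_bottom(logit_effects(w_dec, W_U), tokens, 2)
    print("negative:", neg)
    print("positive:", pos)
```

With real model weights, the same two steps would simply run over the full vocabulary and show the top ten entries on each side.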
Activation density: 0.318%
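Activation density is typically the percentage of tokens in a sample on which the feature's activation is nonzero. A density of 0.318% means the feature fires on roughly 318 of every 100,000 tokens. A minimal sketch, assuming activations arrive as a flat list of floats and that "active" means strictly above a threshold of zero (the dashboard's exact threshold is not stated); `activation_density` is a hypothetical helper name:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy sample: 318 active tokens out of 100,000.
    sample = [0.0] * 99_682 + [0.5] * 318
    print(f"{activation_density(sample):.3f}%")
```

A low density like this one indicates a sparse feature that stays silent on almost all text.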