INDEX
Explanations
instances of discontent or negative emotional expressions
Negative Logits
Token       Logit
ンツ        -0.17
elpers      -0.16
petto       -0.16
adiens      -0.15
lobs        -0.15
inyin       -0.15
ackson      -0.15
大人        -0.15
chio        -0.15
etat        -0.15
Positive Logits
Token       Logit
ep          0.15
603         0.14
kali        0.14
ampus       0.14
uj          0.14
unlimited   0.14
micro       0.14
ラク        0.14
ahlen       0.13
gre         0.13
Activations Density: 0.004%