Explanations
expressions of grumpiness and related negative emotions
Negative Logits
oodles     -0.15
enet       -0.15
eden       -0.14
طار        -0.14
orderid    -0.14
ernetes    -0.14
edral      -0.14
trot       -0.14
anye       -0.14
paran      -0.14
Positive Logits
gr         0.38
ueling     0.26
udging     0.24
aces       0.23
inning     0.22
uff        0.21
gr         0.21
uel        0.20
Gr         0.19
istle      0.19
Activations Density: 0.013%