Explanations
phrases or sentences containing idiomatic expressions involving physical actions and reactions
expressions that describe criticism or negative judgment
Negative Logits
nces         -0.94
xual         -0.70
fts          -0.69
alg          -0.66
AUD          -0.63
RELE         -0.62
aren         -0.62
ouver        -0.61
iferation    -0.61
qus          -0.60
Positive Logits
os           0.68
coffin       0.67
bucket       0.63
osa          0.60
ozone        0.57
forehead     0.57
Thor         0.56
shoulder     0.56
paradise     0.55
sore         0.55
Activations Density 0.138%