INDEX
Explanations
expressions related to criticism and confrontation
Negative Logits
nces                  -0.73
CLUS                  -0.71
unchanged             -0.70
urdue                 -0.70
USER                  -0.68
ItemImage             -0.66
unaccompanied         -0.64
aepernick             -0.64
iae                   -0.62
externalActionCode    -0.62
Positive Logits
feathers              0.96
horn                  0.91
chops                 0.90
horns                 0.86
drum                  0.86
crap                  0.86
brim                  0.86
punches               0.86
nose                  0.85
finger                0.84
Activations Density: 3.543%