INDEX
Explanations
actions related to eating
actions related to eating and social interactions
Negative Logits
pects        -0.63
Reviewer     -0.62
Timeline     -0.61
ocus         -0.58
etheless     -0.56
RELE         -0.55
NetMessage   -0.55
spective     -0.54
nutshell     -0.54
notably      -0.54
Positive Logits
toilet       0.67
refill       0.64
snack        0.63
washing      0.62
chewing      0.62
saline       0.60
toilets      0.60
cig          0.60
drink        0.60
dye          0.58
Activations Density: 2.489%