Explanations
references to eating behaviors or actions related to food consumption
instances of the word "eating" in various contexts
Negative Logits
Token         Logit
Mehran        -0.76
vals          -0.76
AU            -0.76
Spectre       -0.76
Boll          -0.68
lv            -0.65
Static        -0.64
bors          -0.63
meier         -0.62
Revolution    -0.62
Positive Logits
Token         Logit
eating        3.48
Eating        2.94
eating        2.34
eat           2.29
ate           2.07
eats          1.99
eaten         1.93
consuming     1.91
swallowing    1.88
Eat           1.81
Activations Density: 0.009%