INDEX
Explanations
words related to snacks
mentions of snacks and snack-related items
Negative Logits
ne               -0.75
Caldwell         -0.63
nec              -0.62
plur             -0.61
negatives        -0.61
transcription    -0.60
wills            -0.60
Aux              -0.60
Wein             -0.60
secession        -0.59
Positive Logits
cereal           1.02
snacks           0.98
tray             0.92
snack            0.90
tasty            0.90
meal             0.89
beverage         0.87
eaten            0.86
cookies          0.86
dessert          0.85
Activations Density: 0.032%