Explanations
Food-related words, specifically snacks
References to snack foods
Negative Logits
Token         Logit
negatives     -0.72
Tsarnaev      -0.68
ne            -0.65
bred          -0.64
ocal          -0.60
ttle          -0.59
Angels        -0.59
nec           -0.59
priesthood    -0.58
Huntington    -0.58
Positive Logits
Token         Logit
snacks        1.05
snack         0.98
eteria        0.94
oleon         0.85
beverage      0.84
eaten         0.84
Food          0.83
Drink         0.82
washer        0.82
Tray          0.81
Activation density: 0.012%
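The dashboard does not say how its logit lists or density figure are produced. The following is a minimal sketch of how such statistics are commonly computed. It assumes that each token's logit score is the dot product of the feature's output (decoder) direction with that token's unembedding vector, and that density is the fraction of tokens on which the feature's activation is positive. The function names `activation_density` and `top_logits` and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


def top_logits(
    direction: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by direct logit effect.

    Assumption (hypothetical): a token's score is the dot product of the
    feature's output direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(direction, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # most negative first
    return ranked[::-1][:k], ranked[:k]


if __name__ == "__main__":
    # Toy data, purely illustrative (not taken from the dashboard).
    direction = [1.0, 0.0]
    unembed = {"tok_pos": [2.0, 0.0], "tok_neg": [-1.0, 0.5], "tok_mid": [0.5, 1.0]}
    pos, neg = top_logits(direction, unembed, k=1)
    print(pos, neg)  # [('tok_pos', 2.0)] [('tok_neg', -1.0)]
    print(activation_density([0.0, 0.0, 3.2, 0.0]))  # 0.25
```

Under these assumptions, a density of 0.012% would mean the feature is active on roughly 1 in 8,300 tokens.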