INDEX
Explanations
food-related instructions or descriptions
Negative Logits
pudding       -0.17
erah          -0.16
Cake          -0.16
pancakes      -0.15
etine         -0.15
strawberry    -0.15
cakes         -0.15
Cake          -0.15
andles        -0.15
Dess          -0.15
Positive Logits
chips         0.42
Chips         0.37
chip          0.37
chip          0.34
Chip          0.31
cris          0.30
Chip          0.29
snack         0.28
snacks        0.27
crackers      0.25
Activations Density 0.061%