Explanations
references to specific snack foods or candies
Negative Logits
.tom        -0.18
bé          -0.17
ュ          -0.16
pigeon      -0.16
potatoes    -0.16
地下        -0.16
tomatoes    -0.16
Governors   -0.15
gravy       -0.15
Soup        -0.15
Positive Logits
candy       0.28
candies     0.26
nou         0.24
brittle     0.23
marsh       0.21
treat       0.21
Candy       0.21
cand        0.20
chew        0.20
treats      0.20
Activation Density: 0.075%