INDEX
Explanations
mentions of food-related content and dining experiences
Negative Logits
fruit       -0.18
fruit       -0.17
cocoa       -0.17
sugar       -0.17
cookies     -0.17
candy       -0.16
berries     -0.16
Fruit       -0.16
berry       -0.16
fluoride    -0.16
Positive Logits
restaurant     0.47
Restaurant     0.44
restaurants    0.43
dining         0.42
restaur        0.41
dine           0.41
din            0.39
restaurant     0.38
Restaurant     0.38
Restaurants    0.38
Activations Density: 0.590%