INDEX
Explanations
descriptions of food and flavors
the word "delicious" and variations of its sentiment in different contexts
Negative Logits
vernment      -0.70
BU            -0.69
Dru           -0.68
Mesh          -0.65
mberg         -0.64
Huntington    -0.64
den           -0.64
vere          -0.63
patient       -0.62
FT            -0.61
Positive Logits
delicious     1.40
Delicious     1.14
tasty         1.12
juicy         1.07
nesses        0.98
nutritious    0.97
meals         0.96
pastry        0.93
tasting       0.92
ness          0.92
Activations Density: 0.009%