INDEX
Explanations
mentions of food-related words, particularly those indicating deliciousness
references to food and cooking, particularly appealing dishes
Negative Logits
cath          -0.79
sold          -0.76
ãĥĩãĤ£        -0.72
åĤ            -0.71
izational     -0.70
ttle          -0.69
GROUND        -0.67
walker        -0.66
thood         -0.65
stances       -0.65
Positive Logits
Delicious     1.01
ness          0.82
avorite       0.81
upid          0.80
vous          0.79
nesses        0.78
isine         0.76
endish        0.75
ery           0.74
ï¸            0.74
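The two lists above can be read as the tokens whose output logits this feature pushes down or up the most. Below is a minimal sketch of how such lists might be extracted, assuming a mapping from token string to logit effect is already available. The function name `top_and_bottom` and the `sample` dictionary are hypothetical; the values are copied from a few of the entries listed above, and how the dashboard itself computes them is not stated here.

```python
def top_and_bottom(logit_effects, k):
    """Return the k most positive and k most negative (token, value) pairs.

    logit_effects: dict mapping token string -> logit effect (assumed input).
    """
    # Sort once, largest value first.
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    # Take the tail and reverse it so the most negative entry comes first.
    negative = ranked[-k:][::-1]
    return positive, negative


# Hypothetical subset of the values shown in the lists above.
sample = {
    "Delicious": 1.01,
    "ness": 0.82,
    "avorite": 0.81,
    "cath": -0.79,
    "sold": -0.76,
    "GROUND": -0.67,
}

positive, negative = top_and_bottom(sample, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing both ends keeps the two lists consistent with each other.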
Activations Density 0.028%
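A density figure like the one above is usually read as the fraction of token positions on which the feature fires at all. Below is a minimal sketch under that assumption. The function `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes `activations` is a flat list with one value per token position.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which are active.
sample_activations = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.028% would correspond to roughly 28 active positions per 100,000 tokens.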