INDEX
Explanations
mentions of cheese
references to cheese in various contexts
Negative Logits
Token       Logit
oday        -0.86
igious      -0.82
ITNESS      -0.79
aylor       -0.76
glim        -0.73
uating      -0.73
uate        -0.72
igating     -0.72
igators     -0.69
nces        -0.68
Positive Logits
Token       Logit
cloth       1.35
slic        1.05
cheese      0.96
ecake       0.94
sandwiches  0.89
nut         0.88
sandwich    0.87
fruit       0.87
bread       0.86
bowl        0.85
Activations Density: 0.034%