INDEX
Explanations
cheese-related items
references to cheese in various food contexts
Negative Logits
oday      -0.78
uating    -0.74
igious    -0.73
aylor     -0.72
uate      -0.70
rians     -0.69
nant      -0.69
DPR       -0.68
NEY       -0.68
ITNESS    -0.67
Positive Logits
cloth         1.32
slic          0.98
mint          0.92
ecake         0.91
cheese        0.90
sandwiches    0.87
bowl          0.85
melts         0.84
fruit         0.84
weed          0.84
Activations Density: 0.020%