Explanations
various types of cheese and cheese-related instructions or dishes
Negative Logits
oday        -0.82
igious      -0.79
ITNESS      -0.72
NCT         -0.70
ership      -0.69
nces        -0.69
uating      -0.68
uate        -0.68
ocally      -0.67
inen        -0.66
Positive Logits
cloth        1.37
ecake        1.07
slic         1.07
cheese       0.99
nut          0.94
sandwiches   0.93
sandwich     0.91
cream        0.90
bread        0.89
bowl         0.87
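The page does not say how the two logit lists are produced. A minimal sketch, assuming each list is simply the k largest or k smallest entries of a per-token "logit effect" for this feature (often the feature's decoder direction projected through the model's unembedding, but that is an assumption here). The dictionary holds only the 20 values shown on this page. A real logit-effect vector would cover the whole vocabulary. The function name `top_and_bottom` is hypothetical.

```python
import heapq

# Token fragment -> logit effect, transcribed from the two lists on this page.
# Only these 20 entries are known; a full vector would span the vocabulary.
logit_effects = {
    "oday": -0.82, "igious": -0.79, "ITNESS": -0.72, "NCT": -0.70,
    "ership": -0.69, "nces": -0.69, "uating": -0.68, "uate": -0.68,
    "ocally": -0.67, "inen": -0.66,
    "cloth": 1.37, "ecake": 1.07, "slic": 1.07, "cheese": 0.99,
    "nut": 0.94, "sandwiches": 0.93, "sandwich": 0.91, "cream": 0.90,
    "bread": 0.89, "bowl": 0.87,
}


def top_and_bottom(effects, k):
    """Hypothetical helper: return the k most positive and k most negative
    (token, logit) pairs. Ties keep their insertion order, because
    heapq.nlargest/nsmallest behave like a stable sort truncated to k."""
    items = list(effects.items())
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return positive, negative


pos, neg = top_and_bottom(logit_effects, 3)
print(pos)  # [('cloth', 1.37), ('ecake', 1.07), ('slic', 1.07)]
print(neg)  # [('oday', -0.82), ('igious', -0.79), ('ITNESS', -0.72)]
```

With `k = 10` over the full vocabulary, this selection would yield lists shaped like the two above.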
Activation Density: 0.015%