INDEX
Explanations
cheese varieties, especially cheddar
food-related terms, particularly different types of cheese
Negative Logits
YL          -0.69
writer      -0.67
onics       -0.66
Paige       -0.66
pter        -0.63
WC          -0.62
Eck         -0.62
iating      -0.62
persecut    -0.61
Deng        -0.61
Positive Logits
Cheese      1.31
cheese      1.23
clair       1.15
heddar      0.97
chees       0.92
insula      0.83
asper       0.80
rontal      0.78
xon         0.77
Angus       0.76
Activations Density 0.025%