INDEX
Explanations
mentions of specific types of cheese
the end-of-document token
Negative Logits
eering    -0.77
estamp    -0.72
urrent    -0.70
rities    -0.69
unders    -0.69
estern    -0.67
points    -0.67
er        -0.67
isse      -0.65
yi        -0.62
Positive Logits
ificial              0.76
atos                 0.76
isoft                0.73
adeon                0.72
Downloadha           0.70
natureconservancy    0.67
amia                 0.66
Pwr                  0.65
HAHAHAHA             0.65
Garc                 0.64
Activations Density: 0.062%