INDEX
Explanations
the word "food" in various contexts
references to food and edible items
Negative Logits
Constantin  -0.79
urity       -0.68
ISTER       -0.66
asper       -0.65
isters      -0.65
MIA         -0.65
Asheville   -0.64
agne        -0.64
chal        -0.63
APE         -0.63
Positive Logits
lers   1.06
yssey  0.97
ood    0.96
edly   0.94
irect  0.94
ler    0.91
e      0.89
ed     0.88
roid   0.88
ragon  0.87
Activations Density: 0.044%