INDEX
Explanations
references to cookies, specifically focusing on actions involving cookies like making, eating, and tracking them
references to cookies, both as a food item and in a metaphorical context
Negative Logits
SI       -0.71
ashtra   -0.70
ities    -0.68
rior     -0.64
abouts   -0.63
orth     -0.63
WAYS     -0.62
ity      -0.62
ORN      -0.62
itate    -0.62
Positive Logits
cookies  1.24
dough    1.09
Clicker  1.09
cookie   1.07
jar      1.01
Cookies  1.00
Cookie   0.97
cookie   0.96
cutter   0.95
jars     0.92
Activations Density: 0.015%