Explanations
mentions of the dessert "ice cream"
references to ice cream
Negative Logits
ancial     -0.80
nesday     -0.77
enance     -0.77
entary     -0.74
rompt      -0.73
INGTON     -0.73
ilitary    -0.72
ention     -0.71
ENTION     -0.70
ufact      -0.70
Positive Logits
breaker     1.23
cream       1.21
breakers    1.18
cream       1.15
rink        1.04
pick        1.01
Cream       1.00
skating     0.97
crystals    0.90
flake       0.89
Activations Density: 0.029%