Explanations
references to sugary substances
terms related to sugar and its effects or uses
Negative Logits
Token     Logit
naire     -0.86
naires    -0.73
semble    -0.73
atform    -0.71
agame     -0.71
ļé        -0.71
orate     -0.70
ership    -0.68
atche     -0.68
ersen     -0.68
Positive Logits
Token     Logit
cane      1.15
beet      1.13
syrup     1.11
coating   0.84
sugar     0.83
coat      0.81
daddy     0.80
mell      0.80
bush      0.77
water     0.76
Activations Density: 0.029%