INDEX
Explanations
mentions of sugar, whether in a negative context (sugar-free) or referring to its effects
Negative Logits
orial    -0.17
RLF      -0.16
eh       -0.15
.dds     -0.15
.dtd     -0.15
sa       -0.15
sing     -0.15
sel      -0.15
mgr      -0.15
ek       -0.14
Positive Logits
cane     0.26
coat     0.26
CRM      0.23
crm      0.23
Cube     0.19
refin    0.19
imoto    0.18
尿       0.18
Zucker   0.18
daddy    0.18
Activations Density 0.009%