Explanations
references to candy and sweets
Negative Logits
Token     Logit
edList    -0.18
ething    -0.17
tings     -0.15
anlar     -0.15
åı·       -0.15
umpt      -0.15
tant      -0.15
adera     -0.15
ewire     -0.14
aldo      -0.14
Positive Logits
Token     Logit
cane      0.24
-striped  0.20
bars      0.20
wrappers  0.20
corn      0.20
apple     0.19
Wrapper   0.19
gram      0.19
Corn      0.18
wrapper   0.18
Activations Density: 0.009%