INDEX
Explanations
mentions of chocolate and dessert-related terms
Negative Logits
atives     -0.86
WARD       -0.82
umbnail    -0.73
kus        -0.72
ership     -0.71
igate      -0.71
Imran      -0.70
ATIVE      -0.70
inen       -0.68
REPORT     -0.68
Positive Logits
cake        0.94
pudding     0.91
anut        0.89
chocolate   0.89
coated      0.89
cane        0.84
chip        0.83
★★          0.82
flavored    0.81
butter      0.81
Activations Density 7.922%