INDEX
Explanations
occurrences of personal preferences and experiences related to food and self-expression
Negative Logits
apos     -0.17
atz      -0.16
uin      -0.15
biz      -0.14
xin      -0.14
Halk     -0.14
oner     -0.14
éij      -0.14
olson    -0.14
ver      -0.13
Positive Logits
entar       0.18
eless       0.15
ltr         0.15
@student    0.14
mia         0.13
arty        0.13
åĥıæĺ¯      0.13
eneg        0.13
dds         0.13
spons       0.13
Activations Density: 0.602%