INDEX
Explanations
words pertaining to food descriptions or types of cuisine
Negative Logits
nya      -0.34
y        -0.32
li       -0.32
me       -0.31
pro      -0.30
mente    -0.29
ka       -0.29
so       -0.28
v        -0.28
ger      -0.28
Positive Logits
Ùĭ        0.29
ugh       0.29
'nın      0.29
frica     0.29
’nın      0.28
oui       0.25
issance   0.25
ught      0.25
eus       0.25
ughty     0.24
Activations Density: 0.931%