Explanations
phrases related to food and cooking
Negative Logits
ilet      -0.15
conti     -0.15
rosso     -0.14
okay      -0.14
abi       -0.14
Argb      -0.13
chg       -0.13
shitty    -0.13
æĺŃ       -0.13
annya     -0.13
Positive Logits
goodness   0.38
action     0.29
greatness  0.29
treat      0.29
fun        0.28
treats     0.28
aw         0.26
glory      0.26
magic      0.25
action     0.24
Activations Density: 0.313%