Explanations
words related to food or nutritional content
Negative Logits
Token       Logit
ogo         -0.17
ondon       -0.17
ingu        -0.17
rouch       -0.15
loc         -0.15
574         -0.15
Accepted    -0.15
ande        -0.14
atin        -0.13
リー        -0.13
Positive Logits
Token       Logit
jang        0.17
olla        0.17
jah         0.15
raith       0.15
loor        0.15
θος         0.14
abajo       0.14
technik     0.13
uracion     0.13
oki         0.13
Activations Density 0.033%
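
The page does not define the density figure. As a minimal sketch, assume it is the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and only illustrate that assumed definition.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires; the source page does not state this definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active positions out of 10,000 tokens.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.033% would mean the feature fires on roughly 1 in 3,000 token positions.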