INDEX
Explanations
terms related to food, health, and their social implications
Negative Logits
anca      -0.19
imest     -0.16
yle       -0.15
ewe       -0.14
ida       -0.14
Bender    -0.14
ega       -0.14
ide       -0.14
iran      -0.14
legate    -0.13
Positive Logits
dbg       0.16
.LA       0.16
irket     0.15
pais      0.15
завд      0.14
avit      0.14
nues      0.14
aket      0.14
виÑħ      0.14
umba      0.14
Activations Density: 0.348%