INDEX
Explanations
references to potatoes and related food terms
Negative Logits
Marco   -0.16
Marco   -0.16
adil    -0.15
Kra     -0.15
asin    -0.15
pw      -0.15
adel    -0.15
se      -0.15
adio    -0.14
CCC     -0.14
Positive Logits
potatoes   0.35
potato     0.35
Potato     0.33
tub        0.26
Idaho      0.24
Chips      0.24
fries      0.24
starch     0.23
chips      0.23
atoes      0.22
Activations Density: 0.016%