Explanations
references to food and dining experiences
Negative Logits
fal        -0.17
rub        -0.16
cona       -0.15
gradable   -0.14
fal        -0.14
Cage       -0.14
peria      -0.14
jenter     -0.14
rub        -0.14
icket      -0.14
Positive Logits
liver      0.23
Liver      0.20
Liver      0.20
kidney     0.19
Spam       0.19
cust       0.18
kidneys    0.18
hashed     0.17
spam       0.17
stew       0.16
Activations Density: 0.089%