Explanations
mentions of fruit in the text
Negative Logits
aution            -0.80
DOS               -0.71
orld              -0.71
kson              -0.64
entric            -0.64
Schwarzenegger    -0.62
Worldwide         -0.62
govern            -0.62
awar              -0.61
Govern            -0.61
Positive Logits
cake              1.22
fruit             1.15
juice             1.06
cakes             1.04
juices            0.94
fulness           0.93
fruit             0.90
nect              0.85
less              0.83
ruit              0.82
Activations Density: 0.021%