INDEX
Explanations
mentions of different types of fruits
mentions of the word "fruit."
Negative Logits
Token       Logit
Standing    -0.73
agonists    -0.68
nee         -0.65
Century     -0.62
citizens    -0.62
Sioux       -0.61
Rated       -0.60
uled        -0.60
silenced    -0.60
rupulous    -0.59
Positive Logits
Token       Logit
fruit       1.40
fruit       1.39
fruits      1.25
juice       1.03
Fruit       1.01
ruit        1.01
ruits       0.98
cake        0.95
cakes       0.93
mango       0.82
Activations Density 0.017%
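
The density figure can be read as the share of sampled tokens on which this feature fires. The page does not define it, so the sketch below assumes that reading: density is the percentage of activations strictly above zero. The function name `activation_density` and the sample data are hypothetical, chosen only so the arithmetic lands on 0.017%.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which
    the feature is active. The source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 17 of 100,000 tokens.
sample = [1.0] * 17 + [0.0] * 99_983
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.017% corresponds to roughly 17 active tokens per 100,000, which would make the feature quite sparse.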