INDEX
Explanations
mentions of the word "bee"
references to "bee" or "bees" in the text
Negative Logits
ional        -0.87
igators      -0.77
aries        -0.75
igator       -0.74
ories        -0.74
Honour       -0.74
Balkans      -0.72
ablishment   -0.69
icked        -0.66
Archdemon    -0.65
Positive Logits
bee     1.56
pee     1.10
Bee     1.07
bees    1.06
Bee     1.06
bee     1.03
ffe     1.02
zeb     0.96
hyde    0.93
gee     0.92
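The two logit lists read as the ten lowest and ten highest per-token scores for this feature. The page does not say how the scores are produced, so the sketch below only illustrates the ranking step, assuming a score is already available for each token. The function name `top_logit_tokens` and the small `effects` dictionary are hypothetical; the values in it are copied from the lists on this page.

```python
from typing import Dict, List, Tuple

TokenScore = Tuple[str, float]

def top_logit_tokens(
    logit_effects: Dict[str, float], k: int = 10
) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Return (most_negative, most_positive), each with at most k entries.

    most_negative is ordered from lowest score upward,
    most_positive from highest score downward.
    """
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive

# Hypothetical mini-vocabulary; a real one would have tens of thousands of tokens.
effects = {
    "bee": 1.56, "pee": 1.10, "bees": 1.06,
    "ional": -0.87, "igators": -0.77, "aries": -0.75,
}
negative, positive = top_logit_tokens(effects, k=2)
```

With `k=2`, `negative` starts with `ional` and `positive` starts with `bee`, matching the first row of each list.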
Activations Density 0.006%
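The density figure is not defined on the page. A minimal sketch, assuming it means the fraction of token positions on which the feature has a nonzero activation, is shown below. The function name `activation_density`, the `threshold` parameter, and the synthetic `sample` list are all hypothetical and chosen only so that the result lands on the same order of magnitude as the reported value.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation is strictly above threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Synthetic data (assumption): 6 active positions out of 100,000.
sample = [0.0] * 99_994 + [3.2, 1.1, 0.4, 7.9, 2.5, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.006% corresponds to roughly 6 active positions per 100,000 tokens, which fits a feature that responds to a single specific word.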