Explanations
mentions of the word "bull"
references to bull-related themes or contexts
Negative Logits
mble    -0.91
Rad     -0.81
nces    -0.81
LY      -0.79
lly     -0.77
ä½ľ     -0.76
MENT    -0.74
ALLY    -0.72
nce     -0.72
DATA    -0.72
Positive Logits
frog        1.00
elephants   0.95
fights      0.93
sharks      0.93
elephant    0.93
horns       0.92
ocks        0.88
fighters    0.87
fighter     0.84
bull        0.84
Activations Density: 0.006%