INDEX
Explanations
mentions of elephants
occurrences of the word "elephant" and its variations
Negative Logits
Token      Logit
hub        -0.82
Hub        -0.81
oplan      -0.77
bp         -0.74
Hub        -0.73
bnb        -0.72
hubs       -0.72
burgers    -0.71
bsp        -0.71
wcs        -0.70
Positive Logits
Token      Logit
Ele        3.33
Ele        2.40
ele        2.37
Elephant   1.50
д          1.44
ELE        1.33
elephants  1.29
ele        1.22
Alexandra  1.12
elephant   1.10
Activations Density: 0.040%