Explanations
mentions or references to elephants
references to elephants
Negative Logits
sburgh      -0.85
Kenobi      -0.77
aldehyde    -0.73
Fargo       -0.72
raints      -0.70
Kislyak     -0.67
Kendall     -0.66
enegger     -0.66
DERR        -0.65
Keane       -0.65
Positive Logits
venth       1.37
phant       1.16
oton        1.02
fter        0.94
phies       0.90
ven         0.89
lect        0.82
ighth       0.82
reon        0.81
ught        0.80
Activations Density: 0.011%