INDEX
Explanations
phrases or terms related to elephants
references to "Elephant" or related entities
Negative Logits
sburgh      -0.87
lace        -0.72
Kenobi      -0.72
atchewan    -0.69
Kislyak     -0.69
Kers        -0.67
aird        -0.66
Keane       -0.66
DERR        -0.66
raints      -0.66
Positive Logits
venth       1.34
Ele         0.97
phant       0.96
oton        0.90
mosqu       0.83
ele         0.82
fter        0.82
phies       0.82
teenth      0.82
Ele         0.81
Activations Density 0.010%