INDEX
Explanations
names containing the letter combination "ele"
references to elephants
Negative Logits
DPR           -0.87
Kers          -0.75
Kenobi        -0.73
Present       -0.71
DERR          -0.69
Adv           -0.68
displayText   -0.66
EED           -0.66
Papers        -0.66
Fuk           -0.66
Positive Logits
ele           1.29
phant         1.08
venth         1.00
ghan          0.93
izabeth       0.89
fter          0.85
ven           0.85
ttes          0.81
bies          0.79
theless       0.78
Activations Density 0.008%