INDEX
Explanations
capital letters or specific names and titles
Negative Logits
Token    Logit
elin     -0.20
cx       -0.18
cxx      -0.18
ocs      -0.17
elas     -0.16
amax     -0.15
ec       -0.15
ears     -0.15
SCII     -0.15
Bird     -0.15
Positive Logits
Token    Logit
yo       0.31
gun      0.30
jez      0.27
duk      0.25
ndo      0.25
wo       0.25
dog      0.24
lor      0.24
kit      0.24
du       0.24
Activation density: 0.010%