INDEX
Explanations
occurrences of specific names or titles
Negative Logits
lew     -0.15
ž       -0.14
Maxim   -0.14
uy      -0.14
casts   -0.14
nich    -0.14
enic    -0.13
oce     -0.13
uye     -0.13
Bris    -0.13
POSITIVE LOGITS
ics     0.18
ANTE    0.16
Cs      0.16
osto    0.15
esz     0.15
sz      0.15
Sz      0.15
sz      0.14
axon    0.14
ross    0.14
Activations Density 0.004%