INDEX
Explanations
specific nouns related to literature and authors
Negative Logits
ovich    -0.19
zv       -0.17
aver     -0.17
kv       -0.16
Shay     -0.16
ayd      -0.16
AVA      -0.16
pv       -0.15
Royale   -0.15
ayer     -0.15
Positive Logits
Jew      0.23
Paw      0.21
jaw      0.21
Sew      0.20
Lew      0.20
wj       0.20
Wor      0.20
elow     0.19
pj       0.19
ds       0.19
Activations Density 0.020%