Explanations
proper nouns, particularly names of characters and titles from stories or media
Negative Logits
ALLY        -0.16
_EOL        -0.16
Spoiler     -0.15
iaux        -0.15
beg         -0.15
hire        -0.15
_finalize   -0.15
ÑĪÑĮ        -0.14
gaard       -0.14
argas       -0.14
Positive Logits
uki         0.17
B           0.15
etti        0.15
Harris      0.14
achi        0.13
mine        0.13
M           0.13
lt          0.13
irie        0.13
Vo          0.13
Activation Density: 0.030%