INDEX
Explanations
proper nouns, specifically names and places
Negative Logits
ssi           -0.16
rale          -0.16
usercontent   -0.15
resse         -0.15
ogue          -0.15
aeda          -0.15
ombo          -0.15
_codegen      -0.14
ampoo         -0.14
iao           -0.14
Positive Logits
ob             0.19
obs            0.18
oard           0.18
obus           0.17
obi            0.17
cob            0.16
ub             0.16
arta           0.16
ında           0.15
итор           0.15
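The two logit lists are most likely produced the way sparse-autoencoder dashboards commonly produce them, though this dump does not say so. Under that assumption, the feature's decoder direction is multiplied by the model's unembedding matrix, and the lowest and highest entries are reported. The sketch below illustrates that idea. The function name `top_logit_tokens`, the argument names, and the toy vocabulary and values are hypothetical and are not taken from this dashboard.

```python
import numpy as np


def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumed inputs (hypothetical, for illustration only):
      decoder_dir: (d_model,) decoder direction of the feature
      W_U:         (d_model, vocab_size) unembedding matrix
      vocab:       list of vocab_size token strings
    """
    effect = decoder_dir @ W_U          # (vocab_size,) logit contribution per token
    order = np.argsort(effect)          # token indices, most negative first
    negative = [(vocab[i], float(effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(effect[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional model with a 5-token vocabulary.
vocab = ["cat", "dog", "sky", "sea", "sun"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([
    [0.19, 0.18, -0.16, -0.15, 0.0],
    [0.5, 0.5, 0.5, 0.5, 0.5],
])
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print(neg)
print(pos)
```

Because only the first row of the toy `W_U` is weighted by `decoder_dir`, the ranking follows that row. This gives "sky" and "sea" as the negative list and "cat" and "dog" as the positive list.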
Activation Density: 0.009%
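This dump does not define activation density. A common definition, assumed here, is the percentage of token positions at which the feature's activation is above zero. At 0.009%, the feature would fire on about 9 of every 100,000 tokens. The helper `activation_density` and its `threshold` parameter are hypothetical and are used only to illustrate that arithmetic.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(acts)
    firing = np.count_nonzero(acts > threshold)  # positions where the feature fires
    return 100.0 * firing / acts.size


# Toy example: the feature fires on 9 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:9] = 1.0
print(activation_density(acts))
```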