INDEX
Explanations
words related to political figures
instances of the comma punctuation
Negative Logits
olars       -0.71
acters      -0.69
ocial       -0.66
versive     -0.65
eals        -0.65
isers       -0.65
inner       -0.62
atlantic    -0.62
ictional    -0.61
offender    -0.61
Positive Logits
76561       0.81
Jr          0.77
paio        0.75
aka         0.72
ĪĴ          0.68
ucci        0.67
Kard        0.67
etc         0.66
Baird       0.65
Sr          0.65
Activations Density: 0.175%