INDEX
Explanations
proper nouns or names
mentions of specific names, particularly the name "Abe" in various contexts
Negative Logits
phis        -0.96
imates      -0.87
neapolis    -0.87
ivities     -0.82
ileaks      -0.82
angular     -0.81
prus        -0.80
matic       -0.80
ophical     -0.80
imedia      -0.80
Positive Logits
zz          0.84
legates     0.81
legate      0.79
FORE        0.76
zza         0.76
deen        0.71
zzi         0.71
ça          0.69
gger        0.69
cki         0.69
Activations Density: 0.037%