Explanations
names of places or people
proper nouns, particularly names and locations
Negative Logits
perty       -0.82
ongyang     -0.82
citiz       -0.69
etheless    -0.66
resil       -0.66
pse         -0.65
fundament   -0.64
unden       -0.63
shenan      -0.63
glim        -0.63
Positive Logits
phia        0.73
nery        0.69
phant       0.69
ibrary      0.68
quin        0.65
otide       0.64
Towers      0.64
steen       0.63
gow         0.63
bill        0.62
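Lists like these are typically obtained by projecting a latent's decoder direction through the model's unembedding matrix and reading off the most negative and most positive vocabulary entries. The sketch below assumes exactly that procedure (decoder row times unembedding, with no final LayerNorm or rescaling). The function name `top_logits` and its parameters are hypothetical, and the dashboard that produced these values may apply extra normalization.

```python
import numpy as np

def top_logits(decoder_row, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: logit effect = decoder_row @ W_U, with no LayerNorm applied.
    decoder_row: shape (d_model,); W_U: shape (d_model, vocab_size).
    """
    logits = decoder_row @ W_U              # one logit per vocabulary token
    order = np.argsort(logits)              # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional model and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -0.2, 0.1, -0.9],
                [0.0,  0.0, 0.0,  0.0]])
neg, pos = top_logits(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(neg)  # [('d', -0.9), ('b', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```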
Activation Density: 0.396%
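Activation density is usually reported as the percentage of sampled tokens on which the latent fires. At 0.396%, that works out to roughly 4 tokens in every 1,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The helper name `activation_density` is hypothetical, and the dashboard's exact sampling and threshold are not stated here.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: the latent fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 25.0
```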