Explanations
names of people or entities
proper nouns, specifically names
Negative Logits
ORTS        -0.71
FACE        -0.70
ENTS        -0.67
BILITY      -0.67
ornia       -0.65
pection     -0.63
perature    -0.63
arenas      -0.63
士          -0.62
sylv        -0.62
Positive Logits
jad         0.99
henko       0.74
atz         0.74
Nak         0.71
ndra        0.71
kov         0.68
opoulos     0.68
ullah       0.67
uty         0.66
aleb        0.66
Activations Density 0.173%