INDEX
Explanations
names of famous individuals
proper nouns, specifically names of individuals
Negative Logits
intendent    -0.74
Reviewer     -0.70
Region       -0.69
DCS          -0.69
Dhabi        -0.67
theless      -0.67
tein         -0.66
dylib        -0.65
Purg         -0.64
Ow           -0.63
Positive Logits
ravis        0.70
lawy         0.68
stad         0.66
enty         0.65
eman         0.64
beck         0.63
hler         0.63
acher        0.62
assian       0.61
itton        0.61
Activations Density 0.181%