INDEX
Explanations
names of famous individuals or entities
references to something being famous
Negative Logits
otent    -0.71
avery    -0.68
Bots     -0.67
ifle     -0.67
heed     -0.66
vae      -0.65
alone    -0.65
cise     -0.65
adies    -0.65
THER     -0.63
Positive Logits
famous       1.04
rities       1.01
famous       0.95
Famous       0.82
nickname     0.80
headlines    0.80
infamous     0.78
ness         0.74
renown       0.74
NESS         0.73
Activations Density: 0.009%