Explanations
proper nouns, specifically names of people
names of prominent individuals, particularly in the context of fashion, film, and politics
NEGATIVE LOGITS
ORTS              -0.83
士                -0.79
merce             -0.72
ruary             -0.71
lished            -0.70
âĶĢâĶĢ            -0.68
STATE             -0.68
essee             -0.66
DragonMagazine    -0.62
alore             -0.62
POSITIVE LOGITS
kov               0.74
uty               0.72
zinski            0.68
jad               0.68
atz               0.66
opoulos           0.66
bard              0.65
ansky             0.62
ahl               0.62
(@                0.61
Activations Density: 0.207%