INDEX
Explanations
proper nouns related to literature
mentions of a specific person or entity, likely related to identity or prominence
Negative Logits
calib     -0.76
marqu     -0.68
Met       -0.68
cent      -0.67
diam      -0.63
central   -0.61
Worlds    -0.61
quart     -0.58
Europe    -0.58
Robots    -0.58
Positive Logits
hee       4.82
hey       1.39
hea       1.37
hend      1.24
heon      1.22
he        1.21
het       1.19
heet      1.18
kee       1.13
heed      1.11
Activations Density 0.010%