INDEX
Explanations
references to a prominent central figure or main attraction in various contexts
Negative Logits
æľĹ      -0.17
TOTYPE   -0.16
apons    -0.15
rone     -0.15
γοÏħ     -0.15
ammers   -0.15
pts      -0.15
nal      -0.15
gabe     -0.15
İ        -0.15
Positive Logits
927      0.15
PA       0.15
ar       0.14
onica    0.14
_PO      0.14
-hero    0.14
ziel     0.14
loved    0.13
PA       0.13
starred  0.13
Activations Density 0.032%
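
The density figure above is usually read as the fraction of token positions on which the feature fires. The page does not state how it was computed, so the following is only a minimal sketch under that assumption; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active (activation > threshold). The source page does not confirm this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, the feature fires on 32 of them.
sample = [0.0] * 99_968 + [1.5] * 32
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.032% would correspond to roughly 32 active positions per 100,000 tokens, which suggests a sparse, fairly specific feature.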