Explanations
references to celebrities or prominent figures, particularly in relation to their status or roles
Negative Logits
ester     -0.20
erson     -0.19
kart      -0.18
ÛĮا       -0.18
als       -0.17
estro     -0.17
stakes    -0.17
adesh     -0.17
esters    -0.17
spir      -0.17
Positive Logits
ry        0.29
vation    0.28
ved       0.28
burst     0.27
kest      0.26
light     0.25
fish      0.24
red       0.24
bucks     0.23
let       0.23
Activations Density 0.038%