NEGATIVE LOGITS
fence          -0.07
lives          -0.07
radar          -0.07
scene          -0.06
Hack           -0.06
roof           -0.06
Safe           -0.06
Xavier         -0.06
fan            -0.06
reproductive   -0.06
POSITIVE LOGITS
entitled       0.13
entitlement    0.10
itled          0.08
entic          0.07
inc            0.07
worth          0.07
itles          0.07
elt            0.07
nt             0.07
IBUT           0.07
Activations Density 0.003%