Negative Logits
Token        Logit
Hannah       -0.07
(comment     -0.07
impressive   -0.07
"),          -0.07
antaged      -0.07
Kath         -0.07
Listener     -0.06
Tuesday      -0.06
-control     -0.06
varying      -0.06
Positive Logits
Token        Logit
ilo          0.22
elo          0.14
alo          0.14
Milo         0.13
o            0.10
Angelo       0.09
ilos         0.09
ilon         0.08
ило          0.08
Halo         0.07
Activations Density 0.008%