INDEX
Negative Logits
ActionTypes    -0.10
wreck          -0.09
〃             -0.09
wre            -0.09
Sap            -0.09
Brut           -0.09
;/*            -0.08
serif          -0.08
zano           -0.08
);$            -0.08
Positive Logits
forall         0.14
everyone       0.13
forall         0.12
others         0.12
everyone       0.12
towards        0.11
vůči           0.11
toward         0.11
bagi           0.10
всех           0.10
Activations Density 0.053%