Explanations
patterns of refusal or denial relating to decision-making or communication
Negative Logits
crm         -0.15
riba        -0.15
apur        -0.15
uppe        -0.15
mrt         -0.14
ãĤ¸ãĤ¢      -0.14
lexport     -0.14
mutable     -0.14
xbf         -0.14
ardy        -0.13
Positive Logits
accept      0.30
accepting   0.28
accept      0.28
accepts     0.28
Accept      0.26
let         0.24
Accept      0.24
allow       0.23
letting     0.23
ACCEPT      0.23
Activations Density 0.099%
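
The negative-logit token ãĤ¸ãĤ¢ looks like the printable form that byte-level BPE vocabularies use for raw bytes, not ordinary text. The sketch below assumes a GPT-2-style byte-to-character table (an assumption; the page does not say which tokenizer produced these tokens) and maps such a token back to readable text. The names bytes_to_unicode and decode_token are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style byte-level BPE."""
    # Bytes that already have a visible, non-whitespace character keep it.
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    code_points = byte_values[:]
    offset = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: vocabulary character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A single token may hold only part of a multi-byte character,
    # so undecodable bytes are replaced rather than raising.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ¸ãĤ¢"))  # ジア
```

Under that assumption the token decodes to the katakana pair ジア. The same table also maps a leading space to Ġ, which may be why several positive-logit tokens above look identical: variants with and without a leading space can appear the same once whitespace is lost in extraction.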