INDEX
Explanation: instances of refusal or non-compliance
Negative Logits
Token            Logit
XmlAccessType    -0.84
حياته            -0.75   (Arabic, "his life")
tigt             -0.70
IBOutlet         -0.67
PathVariable     -0.64
BuildContext     -0.63
thern            -0.63
seca             -0.62
adal             -0.61
fekt             -0.61
Positive Logits
Token            Logit
refusé            0.93
refusal           0.92
Willing           0.89
Refuse            0.89
refus             0.89
Willing           0.86
refusing          0.83
препратки         0.83   (Bulgarian, "references")
refuse            0.83
unwilling         0.82
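The page does not say how the logit values above were computed. One common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction onto the model's unembedding matrix and rank every vocabulary token by the result: the highest scores are the "positive logits" and the lowest are the "negative logits". The sketch below uses hypothetical toy weights; `top_logits`, `w_dec_row` and `w_unembed` are illustrative names, not taken from the source, and real dashboards may also apply normalization that this sketch omits.

```python
import numpy as np

def top_logits(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by how strongly one feature's decoder direction
    raises (positive) or lowers (negative) their logits.

    Assumptions (hypothetical): w_dec_row has shape (d_model,), w_unembed has
    shape (d_model, vocab_size), and no final layer norm is applied.
    """
    scores = w_dec_row @ w_unembed            # one score per vocabulary token
    order = np.argsort(scores)                # token indices, lowest score first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: a 3-token vocabulary and d_model = 2.
vocab = ["refuse", "Willing", "IBOutlet"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.9, 0.5, -0.7],
                      [0.0, 0.0, 0.0]])
pos, neg = top_logits(w_dec_row, w_unembed, vocab, k=1)
print(pos, neg)
```

With `k=1`, the highest-scoring token is "refuse" and the lowest-scoring token is "IBOutlet", mirroring the shape of the lists above.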
Activation density: 0.131%
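Activation density is usually read as the share of token positions on which the feature fires. Assuming "fires" means an activation above zero (the page does not state the threshold), a minimal sketch with a hypothetical helper `activation_density` and synthetic activations that reproduce a 0.131% density:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    The zero threshold is an assumption, not taken from the source page.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Synthetic data: the feature is active on 131 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:131] = 2.5
print(f"{activation_density(acts):.3f}%")
```

A density this low means the feature is silent on almost all tokens, which is typical of a narrowly specialized feature.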