INDEX
NEGATIVE LOGITS
crec        -0.08
质          -0.08
simple      -0.08
Simple      -0.07
bool        -0.07
sederhana   -0.07
valid       -0.07
Building    -0.07
onderzoek   -0.07
Cong        -0.07
POSITIVE LOGITS
unintended     0.15
inadvert       0.11
unint          0.11
نطاق           0.11
неж            0.11
inadvertently  0.10
specificity    0.10
undes          0.10
collateral     0.10
unwanted       0.10
Activations Density 0.010%