INDEX
Explanations
instances of violations or breaches of laws, rules, or agreements
Negative Logits
nila    -0.18
ìĤ¼    -0.17
è®    -0.16
egov    -0.16
æŀĿ    -0.14
voksne    -0.14
ienne    -0.14
eser    -0.13
λι    -0.13
(dtype    -0.13
Positive Logits
norms    0.23
fundamental    0.21
terms    0.20
accepted    0.19
trust    0.18
basic    0.18
confidentiality    0.18
bounds    0.18
confidence    0.18
etiquette    0.18
Activations Density: 0.063%