Explanations
references to rules, policies, and legal statutes
Negative Logits
onica      -0.18
Verd       -0.15
_traits    -0.15
spell      -0.14
iky        -0.14
asses      -0.14
cona       -0.13
Traits     -0.13
ĸ          -0.13
asse       -0.13
Positive Logits
violations   0.24
violation    0.24
violated     0.23
honored      0.22
Viol         0.21
viol         0.20
Violation    0.20
honoured     0.19
-viol        0.19
Applied      0.19
Activations Density: 0.199%