INDEX
Explanations
statements related to legal authority and human rights issues
Negative Logits
achsen      -0.14
Crime       -0.14
çĬ¯         -0.14
Murder      -0.14
罪          -0.14
incap       -0.13
Rebellion   -0.13
Crime       -0.13
628         -0.13
utt         -0.13
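The token `çĬ¯` in the negative list looks like encoding debris, but it may simply be the raw vocabulary string of a byte-level BPE tokenizer. The sketch below assumes the feature comes from a model with a GPT-2 style byte-to-unicode vocabulary, which the source does not state. The helper names `gpt2_byte_decoder` and `decode_token` are hypothetical and used only for illustration.

```python
def gpt2_byte_decoder():
    """Build the char -> byte table used by GPT-2 style byte-level BPE.

    Assumption: the dashboard's model uses this vocabulary convention.
    """
    # Bytes that are printable are represented by the same code point.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    chars = keep[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in keep:
            keep.append(b)
            chars.append(256 + n)
            n += 1
    return {chr(c): b for b, c in zip(keep, chars)}


def decode_token(token):
    """Map a raw vocabulary string back to the UTF-8 text it encodes."""
    table = gpt2_byte_decoder()
    raw = bytes(table[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("çĬ¯"))  # → 犯
```

Under that assumption the token decodes to 犯, which together with the neighbouring 罪 forms 犯罪 ("crime"), consistent with the other crime-related entries in the list.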
Positive Logits
arbitrary        0.27
Kafka            0.26
Dra              0.26
retro            0.24
Arbitrary        0.24
rushed           0.23
dra              0.22
discriminatory   0.22
kafka            0.21
subjective       0.21
Activations Density 0.400%