INDEX
Explanations
mentions of legal consequences and disobedience towards authority
terms related to legal contempt and obstruction of justice
Negative Logits
nas      -0.82
makers   -0.81
ndra     -0.80
eor      -0.80
bourg    -0.77
erk      -0.74
tt       -0.73
llers    -0.70
kov      -0.69
maker    -0.68
Positive Logits
uously          1.28
uous            1.24
ible            0.99
ensibly         0.84
atural          0.78
unfocusedRange  0.77
uments          0.76
atus            0.76
igent           0.76
ciating         0.75
Activations Density 0.039%