INDEX
Explanations
references to governance, law, and authoritative institutions
NEGATIVE LOGITS
ses         -0.20
/or         -0.20
tempts      -0.18
bidden      -0.17
nger        -0.16
/her        -0.15
planation   -0.15
plevel      -0.15
eyse        -0.14
ldr         -0.14
POSITIVE LOGITS
orem         0.31
notated      0.20
semble       0.18
/OR          0.18
noDB         0.17
manuel       0.17
/Branch      0.17
_FILENO      0.17
/Sub         0.16
ocene        0.16
Activation density: 0.716%
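The page does not say how its summary numbers are computed. The following is a minimal sketch under stated assumptions: activation density is taken to be the percentage of tokens on which the feature's activation is greater than zero, and each token's logit value is assumed to be given as a precomputed per-token effect of the feature. The function names `activation_density` and `top_logits` are hypothetical, and the sample values are copied from the tables above purely for illustration.

```python
# Hypothetical sketch of two feature-dashboard summaries.
# Assumption: `activations` holds one feature activation per token, and
# `logit_effects` maps a token string to the feature's (precomputed) logit effect.

def activation_density(activations):
    """Percentage of tokens on which the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


def top_logits(logit_effects, k=10):
    """Return up to k most negative and k most positive (token, value) pairs."""
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = [kv for kv in ranked if kv[1] < 0][:k]
    positive = [kv for kv in reversed(ranked) if kv[1] > 0][:k]
    return negative, positive


# Illustrative values taken from the tables above.
effects = {"orem": 0.31, "notated": 0.20, "ses": -0.20, "/or": -0.20, "tempts": -0.18}
neg, pos = top_logits(effects, k=2)
```

With 7 firing tokens out of 1,000, `activation_density` returns 0.7, which is the same order of magnitude as the sparse 0.716% reported here.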