Explanations
references to law enforcement and legal terms
Head Attr Weights
0:0.03
1:0.04
2:0.19
3:0.09
4:0.02
5:0.03
6:0.19
7:0.11
8:0.04
9:0.06
10:0.05
11:0.09
Negative Logits
yip          -1.41
️            -1.29
earable      -1.20
isite        -1.14
esome        -1.10
inventory    -1.10
hift         -1.03
oute         -1.02
essage       -1.02
abwe         -1.01
Positive Logits
Rae          1.12
iren         1.03
ENN          1.01
CLASSIFIED   0.99
enforcement  0.96
illac        0.96
icus         0.91
lesi         0.91
HQ           0.90
opa          0.89
Activations Density 0.027%