INDEX
Explanations
references to law enforcement and violent incidents
Head Attr Weights
0:0.02
1:0.02
2:0.05
3:0.04
4:0.05
5:0.04
6:0.37
7:0.05
8:0.03
9:0.03
10:0.13
11:0.10
Negative Logits
heit: -1.26
uous: -1.22
IRD: -1.20
JV: -1.18
¯¯: -1.17
EW: -1.17
Norn: -1.15
Latter: -1.14
catentry: -1.12
invoke: -1.12
Positive Logits
anamo: 1.72
abwe: 1.53
裏�: 1.51
accompan: 1.49
obin: 1.45
EStream: 1.44
destro: 1.42
ibaba: 1.41
Sparks: 1.40
eleph: 1.36
Activations Density 0.022%