Explanations
references to the detention of individuals, particularly in the context of law enforcement and immigration
Head Attr Weights
0:0.02
1:0.01
2:0.05
3:0.05
4:0.13
5:0.03
6:0.03
7:0.33
8:0.03
9:0.03
10:0.14
11:0.10
Negative Logits
issance            -1.64
iour               -1.58
synergy            -1.46
IENCE              -1.44
coffers            -1.43
benef              -1.43
support            -1.40
encour             -1.40
natureconservancy  -1.40
姫                 -1.38
Positive Logits
interrogated  1.74
detained      1.52
Guant         1.43
Pakistani     1.42
interrog      1.42
detain        1.40
questioning   1.38
suspected     1.38
suspicious    1.38
Domin         1.37
Activations Density 0.007%