Explanations
phrases related to being arrested or booked into jail
Head Attribution Weights (head index: weight)
0:0.01
1:0.01
2:0.05
3:0.05
4:0.09
5:0.02
6:0.07
7:0.39
8:0.03
9:0.03
10:0.12
11:0.06
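As a rough illustration of how the head weights above can be read, the following Python sketch ranks the heads by weight. The dictionary copies the listed values; the names head_weights, ranked, top_head, and total are illustrative and not part of the dashboard, and treating the weights as approximate shares of a whole is an assumption.

```python
# Head weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.01, 1: 0.01, 2: 0.05, 3: 0.05, 4: 0.09, 5: 0.02,
    6: 0.07, 7: 0.39, 8: 0.03, 9: 0.03, 10: 0.12, 11: 0.06,
}

# Sort heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# The displayed values are rounded, so the total need not be exactly 1.
total = sum(head_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
for head, weight in ranked[:3]:
    print(f"head {head}: {weight:.2f}")
```

On the values listed here, head 7 carries the largest weight (0.39), followed by heads 10 (0.12) and 4 (0.09).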
Negative Logits
framework   -1.61
agall       -1.57
roots       -1.55
tro         -1.53
icans       -1.51
istas       -1.47
rule        -1.47
abies       -1.45
VIDEOS      -1.44
rane        -1.44
Positive Logits
misdem        1.61
misdemeanor   1.48
Sed           1.47
Cobb          1.39
boarding      1.38
felony        1.37
Circuit       1.36
Kuala         1.36
arra          1.34
Siberian      1.33
Activations Density 0.001%