INDEX
Explanations
phrases related to being arrested or charged with violations
phrases related to legal issues and crime
Negative Logits
nos        -0.80
20439      -0.65
jas        -0.60
dit        -0.58
oris       -0.56
Koh        -0.56
Levi       -0.55
john       -0.54
idas       -0.54
Martian    -0.53
Positive Logits
izoph      0.73
ificate    0.67
tein       0.64
ulent      0.63
unia       0.62
netflix    0.61
Citadel    0.61
zilla      0.59
wright     0.59
anooga     0.58
Activations Density 0.402%