INDEX
Explanations
phrases related to legal or law enforcement contexts, with a particular focus on individuals
mentions of "person" in various contexts related to individual actions or situations
Negative Logits
unctions     -0.74
Lans         -0.71
enthal       -0.71
DL           -0.64
CCC          -0.63
corridors    -0.63
Lions        -0.62
rss          -0.61
urations     -0.61
Lank         -0.61
Positive Logits
hood         1.34
nel          1.05
who          0.84
ification    0.82
ifies        0.80
ified        0.80
uscript      0.79
else         0.78
acles        0.77
who          0.77
Activations Density: 0.032%