INDEX
Explanations
instances of the word "arrests" in the text
mentions of arrests
Negative Logits
yss          -0.68
vironment    -0.61
psons        -0.59
reen         -0.59
learning     -0.58
Myth         -0.58
WE           -0.58
enthus       -0.57
Mon          -0.57
wb           -0.56
Positive Logits
arrests      0.99
arrest       0.87
Arrest       0.74
arrested     0.73
onto         0.73
decriminal   0.69
oppable      0.69
detain       0.68
eering       0.68
quotas       0.65
Activations Density: 0.009%