Explanations
mentions of legal and criminal activities or events
phrases and terms related to legal penalties and incarceration
Negative Logits
Token     Logit
his       -0.61
wered     -0.60
"))       -0.57
proble    -0.57
HIS       -0.52
Its       -0.52
atcher    -0.51
its       -0.51
alon      -0.51
his       -0.50
Positive Logits
Token          Logit
respectively   2.14
apiece         1.63
together       1.60
together       1.32
jointly        1.30
collectively   1.28
themselves     1.23
respective     1.22
selves         1.20
Together       1.17
Activations Density: 0.930%