INDEX
Explanations
references to evaluating or holding law enforcement accountable for their actions
phrases related to political events and figures
Negative Logits
Token        Logit
)"           -0.87
)",          -0.79
)\           -0.67
\)           -0.65
etc          -0.65
beneficial   -0.60
)[           -0.60
english      -0.60
cooper       -0.59
oday         -0.58
Positive Logits
Token        Logit
etheless     0.80
downright    0.80
inexpl       0.72
utterly      0.71
unmist       0.69
reckoning    0.67
lurking      0.67
unaccount    0.67
goddamn      0.65
shockingly   0.65
Activations Density 1.491%
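
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a percentage could be computed. It assumes that "activations density" means the share of sampled token positions on which the feature fires above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: density is the fraction of sampled token positions where
    the feature is active, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 3 active positions out of 200 sampled tokens.
sample = [0.0] * 197 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "1.500%"
```

Under this reading, a density of 1.491% would mean the feature is active on roughly 15 of every 1,000 sampled tokens.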