Explanations
phrases related to the use of excessive force
references to excessive or disproportionate behavior or actions
Negative Logits
better    -0.81
ector     -0.79
stone     -0.78
ect       -0.77
imore     -0.73
spot      -0.72
herer     -0.72
bard      -0.70
stones    -0.69
scene     -0.69
Positive Logits
amounts        0.99
reliance       0.94
haste          0.89
complexity     0.88
burdens        0.87
caution        0.87
consumption    0.87
burden         0.86
scrutiny       0.82
quantities     0.81
Activations Density: 0.061%