INDEX
Explanations
mentions of physical damage or harm
mentions of "damage" and its related contexts
Negative Logits
zsche      -0.74
zee        -0.72
ramid      -0.72
rams       -0.69
atorial    -0.66
liner      -0.66
ellar      -0.66
tre        -0.65
ounty      -0.64
Screen     -0.61
Positive Logits
inflicted    1.08
damage       1.04
mitigation   0.92
damage       0.85
wrought      0.84
damaged      0.81
havoc        0.78
horm         0.77
limitation   0.74
wounds       0.73
Activations Density: 0.018%