INDEX
Explanations
words related to physical damage
references to damage and its impact
Negative Logits
Token       Logit
zsche       -0.80
zee         -0.78
uesday      -0.69
zbek        -0.69
ramid       -0.69
Reloaded    -0.67
unin        -0.64
guyen       -0.64
rams        -0.64
liner       -0.63
Positive Logits
Token       Logit
inflicted   1.18
wrought     1.00
damage      0.96
mitigation  0.94
damage      0.88
sustained   0.82
caused      0.81
damaged     0.80
incurred    0.79
done        0.78
Activations Density: 0.042%