INDEX
Explanations
violent actions such as punching, stabbing, or assault
Negative Logits
Token                Logit
stellar              -0.77
spons                -0.75
redes                -0.71
soDeliveryDate       -0.70
DragonMagazine       -0.68
isoft                -0.66
Climate              -0.65
amins                -0.63
natureconservancy    -0.62
Insider              -0.62
Positive Logits
Token                Logit
him                  1.01
someone              0.90
somebody             0.85
bystanders           0.84
them                 0.82
someone              0.78
awa                  0.77
himself              0.76
ched                 0.74
bery                 0.73
Activations Density: 0.134%
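
The density figure is not defined on this page. A common reading, assumed here, is the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature
    is active; the dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one feature evaluated on 10,000 tokens,
# active on 13 of them and exactly zero everywhere else.
sample = [0.0] * 10_000
for i in range(13):
    sample[i * 700] = 1.5

print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.134% would mean the feature is active on roughly 13 of every 10,000 tokens, which fits a narrow concept such as the violent actions described in the explanation.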