Explanations
phrases related to violent actions resulting in severe harm or death
phrases and actions related to violent or fatal events
Negative Logits
Observer        -0.66
hospitality     -0.60
prosecutions    -0.60
Situation       -0.58
unaffected      -0.58
Anthem          -0.58
Trop            -0.58
outgoing        -0.57
Nasa            -0.55
GAM             -0.55
Positive Logits
wered           0.98
pless           0.85
pieces          0.85
ãĥİ             0.83
ffee            0.83
Pieces          0.82
ãĤ©             0.81
shred           0.79
othy            0.76
perfection      0.76
Activations Density 0.203%
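
Two of the positive-logit tokens (ãĥİ and ãĤ©) are not readable text as displayed. The source does not say which tokenizer produced them, but they look like entries from a byte-level BPE vocabulary of the kind GPT-2 uses, where each raw byte is shown as a stand-in printable character. Under that assumption, the following minimal sketch rebuilds the byte-to-character table and maps a displayed token back to its UTF-8 text. The function names are illustrative, not taken from the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard's tokenizer follows this scheme; the source
    does not state which tokenizer was used.
    """
    # Bytes that are already printable are displayed as themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is assigned the next code point from 256 upward.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed vocabulary string back into UTF-8 text.

    Incomplete byte sequences are replaced rather than raising, since a
    single token may hold only part of a multi-byte character.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥİ", "ãĤ©", "shred"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, the two opaque entries decode to the katakana characters ノ and ォ, while plain ASCII tokens such as shred pass through unchanged. Tokens containing characters outside the table (a literal space, for instance) raise a KeyError, which is a useful signal that the string was not in this encoding.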