Explanations
words related to horrific and violent events
references to extreme violence or trauma
Negative Logits
cius      -0.68
pai       -0.68
pta       -0.66
arten     -0.66
sama      -0.66
arton     -0.65
Folder    -0.64
annis     -0.64
VO        -0.63
ritis     -0.63
Positive Logits
ally          1.04
earthqu       0.93
atrocities    0.81
horrors       0.81
nightmares    0.76
eleph         0.74
iously        0.73
asylum        0.73
beasts        0.72
tort          0.72
Activations Density: 0.027%