Explanations
references to horror or action movies, particularly with violent or intense themes
Negative Logits
Token       Logit
uration     -0.67
xual        -0.62
fty         -0.62
unity       -0.61
antly       -0.60
ciation     -0.59
efully      -0.59
eering      -0.58
ĸļ          -0.58
ERC         -0.57
Positive Logits
Token       Logit
spin        0.82
Redemption  0.77
liest       0.71
pool        0.70
ritten      0.68
riter       0.67
rick        0.67
Alive       0.67
lift        0.66
soever      0.65
Activations Density: 6.986%