INDEX
Explanations
phrases related to controversial or extreme actions or beliefs
terms related to violence or violence-related concepts
Negative Logits
asions        -0.57
umption       -0.56
itars         -0.55
cies          -0.54
istries       -0.51
ancies        -0.51
ispers        -0.50
interviews    -0.49
Scenes        -0.49
arton         -0.48
Positive Logits
conduit       0.64
starter       0.63
contender     0.62
underdog      0.62
worthy        0.60
sleeper       0.58
unto          0.57
acea          0.56
incarn        0.56
fodder        0.56
Activations Density: 0.934%