Explanations
phrases related to violent actions or situations
phrases describing violent actions and their consequences
Negative Logits
Token          Logit
veto           -0.70
postponed      -0.63
allowed        -0.62
eton           -0.62
hur            -0.62
orea           -0.61
cens           -0.61
offensively    -0.60
vetoed         -0.58
chall          -0.58
Positive Logits
Token          Logit
Pieces         1.01
pieces         0.88
paste          0.84
fragments      0.84
rubble         0.80
utonium        0.80
shred          0.79
ォ             0.76
ffee           0.73
powder         0.71
Activations Density 0.251%