INDEX
Explanations
violent actions or descriptions
phrases that express negative emotions or actions
Negative Logits
icipated     -0.82
authorised   -0.72
andum        -0.71
psey         -0.70
envis        -0.69
Honour       -0.68
ndum         -0.68
undertaken   -0.68
fulfil       -0.66
commenced    -0.65
Positive Logits
stuff        0.79
crap         0.74
shit         0.73
weird        0.71
Kids         0.67
Crazy        0.66
Creep        0.66
garbage      0.66
kinda        0.65
dude         0.64
Activations Density: 1.821%