Explanations
phrases related to physical force or motion
terms related to disciplinary actions
Negative Logits
Token      Logit
igmatic    -0.78
gow        -0.78
oret       -0.73
inav       -0.70
zynski     -0.68
NTS        -0.66
Import     -0.66
mund       -0.64
Diff       -0.64
tex        -0.64
Positive Logits
Token      Logit
leash      0.89
blaster    0.79
nails      0.78
darts      0.78
away       0.77
knife      0.76
gered      0.73
levers     0.73
thumbs     0.73
vines      0.72
Activations Density: 0.063%
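
A density figure like the one above can be read as the share of tokens on which the feature fires. The sketch below shows one way such a number might be computed. It assumes that density means the fraction of tokens with activation above zero, which the page does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active. This definition is not confirmed by the source page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 tokens, 63 of which activate the feature.
sample = [0.0] * 99_937 + [1.0] * 63
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.063%
```

Under this reading, a density of 0.063% means the feature is active on roughly 6 in every 10,000 tokens, so it is a sparse feature.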