INDEX
Explanations
activities involving social interactions
instances of the word "with."
Negative Logits
©¶æ¥µ    -0.68
ights    -0.57
mage     -0.55
ati      -0.54
BUG      -0.52
handler  -0.52
ouf      -0.51
ajor     -0.51
>[       -0.51
omsky    -0.51
Positive Logits
stood     1.33
regard    1.24
regards   1.24
standing  1.07
impunity  0.96
respect   0.96
drawn     0.93
draw      0.92
holding   0.86
dignity   0.73
Activations Density: 0.328%