INDEX
Explanations
words related to interactions and engagement with an audience
phrases indicating audience acceptance or engagement
Negative Logits
WithNo        -0.80
ĨĴ            -0.78
ELD           -0.70
ãĥĺãĥ©        -0.69
ãĤ¬           -0.67
ãĥĩãĤ£        -0.66
ãĤ´           -0.65
itary         -0.65
ãĤ¦ãĤ¹        -0.63
ARB           -0.62
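
Several entries in the negative-logit list above (for example ãĥĺãĥ© or ãĤ¬) look like the printable stand-ins that byte-level BPE vocabularies use for raw bytes. The sketch below assumes a GPT-2-style byte-to-character table, which the dashboard itself does not state; `decode_token` is a hypothetical helper name introduced here for illustration, not part of any dashboard API.

```python
import unicodedata


def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every other byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map a displayed vocabulary string back to text (hypothetical helper)."""
    # Normalize first so decomposed accents still match the table keys.
    token = unicodedata.normalize("NFC", token)
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens that are only a fragment of a UTF-8 sequence cannot be decoded
    # cleanly; errors="replace" marks those bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĥĺãĥ©", "ãĤ¬", "ãĥĩãĤ£", "ãĤ´", "ãĤ¦ãĤ¹", "itary"]:
    print(tok, "->", decode_token(tok))
```

Under that assumption, the five accented entries decode to the katakana fragments ヘラ, ガ, ディ, ゴ, and ウス, while plain ASCII tokens such as `itary` pass through unchanged. The entry ĨĴ maps to two UTF-8 continuation bytes, so it is a partial character and has no standalone reading.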
Positive Logits
approve        1.25
respond        1.16
react          1.15
disapprove     1.13
agree          1.09
reciproc       1.09
laugh          1.07
flock          1.05
notice         1.04
balk           1.02
Activations Density 0.393%