INDEX
Explanations
phrases prompting or urging action
questions and prompts that invite action or engagement
Negative Logits
occupation      -0.81
angered         -0.80
anomal          -0.79
perce           -0.77
sustained       -0.76
alleged         -0.74
hospitalized    -0.74
disturbed       -0.74
perceived       -0.74
inco            -0.73
Positive Logits
;)              1.42
🙂              1.34
:)              1.30
Simply          1.22
:-)             1.14
Enjoy           1.13
ðŁĺ             1.12
You             1.11
Besides         1.09
Anyway          1.08
Activations Density 0.631%