INDEX
Explanations
phrases indicating recommended actions or opinions on actions to be taken
expressions of recommendation or obligation
Negative Logits
cryptic     -0.65
cycles      -0.65
Pse         -0.64
slips       -0.64
unlucky     -0.61
Cir         -0.61
Sci         -0.60
Bonds       -0.58
WI          -0.57
ãĤ¼         -0.57
Positive Logits
reconsider  0.96
©¶æ         0.92
rethink     0.90
apologise   0.88
ashamed     0.84
eryl        0.81
apologize   0.79
clarify     0.79
ople        0.79
rador       0.78
Activations Density 0.219%
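
As a rough illustration of how a density figure like the one above can be read, the sketch below computes the fraction of token positions on which a feature is active. It assumes that "density" means the share of sampled tokens with an activation above zero; the dashboard does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 219 of them active.
sample = [1.5] * 219 + [0.0] * (100_000 - 219)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.219%
```

Under this reading, a density of 0.219% would mean the feature fires on roughly 2 of every 1,000 sampled tokens.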