Explanations
verbs indicating future actions or intentions
phrases that express intentions or plans
Negative Logits
Token         Logit
Leban         -0.73
pires         -0.69
eatures       -0.65
anooga        -0.64
selves        -0.63
Apart         -0.58
senal         -0.58
Comes         -0.58
Lot           -0.56
yourselves    -0.56
Positive Logits
Token         Logit
myself        1.28
write         1.09
confess       1.06
rant          1.04
admit         1.02
recommend     0.96
dedicate      0.92
apologise     0.91
apologize     0.90
reiterate     0.89
Activations Density 0.271%