Explanations
words related to actions or intentions
instances of potential actions or capabilities expressed in the conditional or ability form
Negative Logits
irony            -0.65
understatement   -0.64
downside         -0.63
rift             -0.60
reperc           -0.60
RM               -0.59
dimensional      -0.59
ARK              -0.58
supra            -0.58
narciss          -0.57
Positive Logits
ilater           0.84
sylv             0.80
safely           0.77
satisfy          0.71
eming            0.68
begin            0.67
shed             0.67
stay             0.66
continue         0.66
anqu             0.65
Activations Density: 0.127%