INDEX
Explanations
modal verbs and phrases indicating ability, possibility, necessity, and permission
negative contractions and expressions of inability or restriction
NEGATIVE LOGITS
Chel         -0.68
revisions    -0.64
burgh        -0.64
IVES         -0.62
iets         -0.62
REF          -0.61
Impro        -0.61
iyah         -0.60
unchanged    -0.60
ridden       -0.59
POSITIVE LOGITS
imagine       0.93
nikov         0.84
guys          0.81
lihood        0.79
yourselves    0.78
yourself      0.76
ÅŁ            0.74
wonder        0.70
think         0.69
realise       0.68
Activations Density 0.367%