Explanations
modal verbs indicating possibility or permission
Negative Logits
forms      -0.63
coming     -0.62
adolesc    -0.62
POL        -0.60
comes      -0.60
po         -0.59
resents    -0.58
IDs        -0.58
jri        -0.57
dri        -0.57
Positive Logits
wonder     1.08
notice     1.01
wondering  0.92
wanna      0.91
want       0.91
prefer     0.90
choose     0.90
haps       0.87
hear       0.87
tempted    0.84
Activations Density: 0.042%