INDEX
Explanations
modal verbs indicating ability or possibility
Negative Logits
phin        -0.15
domicile    -0.14
Val         -0.14
ransition   -0.14
ecome       -0.14
Wunused     -0.13
/domain     -0.13
theast      -0.13
ampp        -0.13
hl          -0.13
Positive Logits
afford       0.21
stomach      0.20
somehow      0.18
opic         0.17
354          0.15
kuk          0.15
lek          0.15
AFF          0.15
trust        0.14
acom         0.14
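Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that method; the page itself does not state how its scores were computed, and real pipelines may also fold in the final layer norm. The names `feature_logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical, and plain Python lists stand in for tensors.

```python
def feature_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumption: scores are decoder_dir @ W_U (no layer-norm folding).
    decoder_dir: list of d_model floats (hypothetical SAE decoder row).
    unembed: d_model rows, each with vocab_size floats (hypothetical W_U).
    vocab: list of vocab_size token strings.
    Returns (top_positive, top_negative) as lists of (token, score).
    """
    d_model = len(decoder_dir)
    # Direct logit effect for token j: sum_i decoder_dir[i] * W_U[i][j]
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: most negative first, most positive last.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


if __name__ == "__main__":
    # Toy example with made-up numbers (not the real model weights).
    pos, neg = feature_logit_effects(
        [1.0, 0.0],
        [[0.2, -0.1, 0.05], [0.0, 0.0, 0.0]],
        ["afford", "phin", "trust"],
        k=1,
    )
    print(pos, neg)
```

With k=10, the two returned lists would correspond to the Positive Logits and Negative Logits columns; tokens appear as raw subword pieces, which is why fragments such as "ransition" or "ecome" show up.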
Activation Density: 0.111%
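Activation density is usually read as the share of sampled tokens on which the feature fires at all, so 0.111% means roughly 111 firing tokens per 100,000. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the function name `activation_density` are illustrative assumptions, not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds threshold.

    Assumption: a feature "fires" when its activation is > threshold (0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy data: 111 firing tokens out of 100,000 sampled tokens.
    acts = [0.0] * 99_889 + [1.0] * 111
    print(f"{activation_density(acts):.3f}%")
```

A density this low indicates a sparse feature that activates only on a narrow set of contexts, consistent with a specific explanation such as modal verbs of ability or possibility.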