INDEX
Explanations
contractions ending in 't
negative statements or prohibitions
NEGATIVE LOGITS
imity       -0.71
CI          -0.69
ULTS        -0.66
è£ıè        -0.64
yk          -0.63
è¦ļéĨĴ      -0.61
ENDED       -0.60
sidx        -0.59
ELD         -0.59
senal       -0.59
POSITIVE LOGITS
afford          1.36
rely            0.95
ignore          0.95
possibly        0.94
deny            0.93
underestimate   0.90
imagine         0.89
resist          0.89
blame           0.86
accuse          0.86
Activations Density 0.068%