INDEX
Explanations
phrases that express negation or absence, particularly the word "none."
Negative Logits
sb          -0.16
èį          -0.16
avo         -0.15
seed        -0.15
nt          -0.15
eer         -0.15
ÙĨا         -0.15
roc         -0.15
eah         -0.15
ENCES       -0.14
Positive Logits
THING       0.19
of          0.19
/all        0.17
erg         0.16
ISON        0.15
emachine    0.15
ĵ¨          0.14
ison        0.14
erts        0.14
umber       0.14
Activations Density 0.016%
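
For a sense of scale, a density of 0.016% corresponds to roughly 16 active positions per 100,000 tokens. The sketch below assumes that "activations density" means the fraction of token positions where the feature's activation is above zero; the page does not state its exact definition, so the function name, the threshold, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density is counted over token positions with a strict
    "greater than threshold" test. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the dashboard): 16 active positions out of 100,000.
toy_activations = [1.0] * 16 + [0.0] * 99_984
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

A feature this sparse fires on very few tokens, which fits an explanation tied to a narrow pattern such as the word "none."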