Explanations
negative contractions and phrases implying prohibition or denial
Negative Logits
ÑģÑĮ        -0.15
not         -0.15
enet        -0.14
swire       -0.14
ois         -0.14
à¹Īาย      -0.14
undle       -0.13
NOT         -0.13
adia        -0.13
ãĤ¦ãĤ¹      -0.13
Positive Logits
necessarily  0.45
exactly      0.35
'            0.35
quite        0.31
even         0.31
really       0.31
yet          0.26
anymore      0.26
ches         0.24
really       0.24
Activations Density: 0.200%