INDEX
Explanations
Negative or restrictive phrases, particularly those using the word "nor."
Negative Logits
sel          -0.17
ano          -0.16
anim         -0.15
enko         -0.14
Franken      -0.14
ing          -0.14
392          -0.14
niet         -0.13
NotNull      -0.13
urs          -0.13
Positive Logits
lify         0.17
deen         0.16
any          0.16
theless      0.16
ãĤīãģĦ       0.16
wegian       0.16
ctal         0.15
necessarily  0.15
tamp         0.15
thern        0.15
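
One entry in the positive-logit list, ãĤīãģĦ, looks like a raw vocabulary string from a byte-level BPE tokenizer rather than display damage. The sketch below assumes the model uses the GPT-2-style byte-to-character table, which the dashboard itself does not state, and maps such a string back to readable text. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table, in the style of GPT-2's byte-level BPE.

    Assumption: the model behind this dashboard uses this table.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    printable_set = set(printable)
    table = {b: chr(b) for b in printable}
    # Every other byte is shifted to code points starting at U+0100.
    shift = 0
    for b in range(256):
        if b not in printable_set:
            table[b] = chr(256 + shift)
            shift += 1
    return table


def decode_token(token: str) -> str:
    """Turn a raw byte-level vocabulary string back into UTF-8 text."""
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in token)
    # A single token may hold only part of a multi-byte character.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤīãģĦ", "necessarily", "theless"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, plain ASCII entries such as `necessarily` come back unchanged, while ãĤīãģĦ decodes to Japanese kana.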
Activations Density 0.022%
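
As a rough illustration of how a density figure like this can be read, the sketch below treats activation density as the fraction of tokens on which the feature fires above a threshold. That definition, the `activation_density` helper, and the synthetic activations are assumptions made for the example; none of them come from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: density means "share of tokens with a non-zero activation".
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Synthetic data: 22 active positions out of 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(22):
        acts[i * 4_000] = 1.5
    print(f"{activation_density(acts):.3%}")
```

With these made-up numbers the feature is active on 22 of 100,000 tokens, which is the same order of sparsity as the value reported above.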