INDEX
Explanations
negations or words indicating the absence of something
Negative Logits
zeug         -0.16
ruc          -0.15
ει           -0.15
いや         -0.14
уки          -0.14
sometimes    -0.14
aler         -0.14
undan        -0.14
384          -0.13
USH          -0.13
Positive Logits
surprising       0.28
unique           0.25
altogether       0.24
unique           0.24
surpr            0.23
unexpected       0.22
unprecedented    0.22
unusual          0.20
news             0.20
surprise         0.20
Activations Density: 0.099%