Explanations
conjunctions that indicate contrast or exceptions
Negative Logits
ardon   -0.17
inand   -0.16
ichen   -0.16
chk     -0.15
イク    -0.15
ullo    -0.15
арам    -0.14
emme    -0.14
정을    -0.14
ombok   -0.14
Positive Logits
unn       0.15
umen      0.15
tamp      0.15
Nil       0.15
neutral   0.15
omm       0.15
Nam       0.14
课        0.14
alue      0.14
tat       0.14
Activations Density 0.023%
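
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading on toy data. It assumes density means "fraction of positions with activation above zero", which the page itself does not state; the function name `activation_density`, the `threshold` parameter, and the toy values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" counts a position as active when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 2 of which fire.
toy = [0.0] * 10_000
toy[17] = 1.3
toy[4242] = 0.4

density = activation_density(toy)
print(f"{density:.3%}")  # formatted as a percentage, like the figure above
```

Under this reading, a density of 0.023% would correspond to roughly 23 active positions per 100,000 tokens.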