Explanations
the word "other" in various contexts
Negative Logits
otherwise    -0.20
Otherwise    -0.18
cken         -0.17
otherwise    -0.17
uel          -0.17
Otherwise    -0.16
sonst        -0.15
lain         -0.15
anner        -0.15
rong         -0.15
Positive Logits
-than        0.37
than         0.35
world        0.31
niż          0.30
wis          0.28
than         0.28
equally      0.28
similarly    0.27
ewise        0.26
-world       0.25
Activations Density 0.113%