INDEX
Explanations
phrases indicating uncertainty or multiple options
Negative Logits
zig       -0.16
obao      -0.15
RIX       -0.15
omi       -0.15
öl        -0.15
voksne    -0.14
owe       -0.14
cept      -0.13
iola      -0.13
åĻ        -0.13
Positive Logits
Whatever      0.32
whichever     0.30
whatever      0.30
whatever      0.30
Regardless    0.30
Anyway        0.29
Whatever      0.28
Regardless    0.28
Anyway        0.28
anyway        0.24
Activations Density 0.175%