INDEX
Explanations
variations of the word "or."
Negative Logits
empo      -0.16
både      -0.15
eki       -0.15
ems       -0.15
">×</     -0.15
аниц      -0.15
太郎      -0.14
urve      -0.14
ylon      -0.14
опрос     -0.14
Positive Logits
/or       0.34
phans     0.30
ifice     0.28
lando     0.24
acular    0.24
许        0.23
else      0.22
naments   0.22
phan      0.22
indeed    0.20
Activations Density 0.273%
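
The density figure can be read as the share of tokens on which the feature fires at all. The sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) activation. The dashboard's exact definition
    is not stated in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 1,000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(sample):.3%}")  # prints 0.300%
```

Under this reading, a density of 0.273% would mean the feature is active on roughly 27 of every 10,000 tokens sampled.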