INDEX
Explanations
phrases expressing uncertainty or an unspecified idea
Negative Logits
elmet   -0.17
åĤ      -0.15
éis     -0.14
lz      -0.14
dou     -0.14
eller   -0.14
emble   -0.14
sey     -0.14
bler    -0.14
okol    -0.13
Positive Logits
else    0.20
that    0.18
we      0.17
which   0.17
тор     0.16
ÙĮ      0.15
που     0.15
worth   0.15
that    0.15
ksam    0.15
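The two logit lists can be read as the tokens whose output logits this feature most decreases and most increases. Below is a minimal sketch of one common way such lists are produced, assuming the feature's decoder direction is projected straight through the model's unembedding matrix with no final layer norm. The names `decoder_dir`, `W_U`, and `vocab` and the toy numbers are hypothetical illustrations, not this dashboard's actual pipeline or data.

```python
from typing import List, Tuple


def logit_contributions(decoder_dir: List[float], W_U: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U (d_model rows x vocab_size columns)."""
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(
    decoder_dir: List[float], W_U: List[List[float]], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs."""
    logits = logit_contributions(decoder_dir, W_U)
    order = sorted(range(len(logits)), key=lambda j: logits[j])  # ascending
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example (hypothetical values): d_model = 2, four-token vocabulary.
vocab = ["else", "that", "elmet", "bler"]
W_U = [
    [1.0, 0.5, -1.0, 0.0],
    [0.0, 0.5, 0.0, -1.0],
]
decoder_dir = [0.2, 0.1]
negative, positive = top_and_bottom(decoder_dir, W_U, vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

Sorting all logits once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens a vectorized matrix product would replace the nested loops.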
Activation Density: 0.023%
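Activation density is usually the fraction of sampled tokens on which the feature fires at all. The sketch below assumes "fires" means an activation strictly above zero; the helper name `activation_density` and the `threshold` parameter are hypothetical. On that reading, 0.023% corresponds to roughly 23 active tokens per 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Toy example: 2 of 5 tokens are active.
density = activation_density([0.0, 0.0, 0.5, 0.0, 1.2])
print(f"{density:.1%}")  # 40.0%
```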