Explanations
expressions of identity and existence
Negative Logits
reportedly    -0.20
said          -0.18
æį®           -0.17
said          -0.16
uxtap         -0.16
_expected     -0.15
ìķĮ볤        -0.15
reminder      -0.15
bekannt       -0.15
æĺİ           -0.15
Positive Logits
indeed        0.44
inde          0.30
Indeed        0.29
Indeed        0.28
headed        0.24
somehow       0.23
actually      0.21
heading       0.19
ÙĪØ£ÙĨ        0.19
actually      0.17
Activations Density: 0.384%