INDEX
Explanations
instances of the word "this" in various contexts
Negative Logits
ctor        -0.15
orman       -0.15
ék          -0.14
iants       -0.14
Gates       -0.14
alendar     -0.14
Jacobs      -0.13
алы         -0.13
bast        -0.13
etc         -0.13
Positive Logits
is          0.20
-this       0.18
Whole       0.16
whole       0.16
[ZWNJ]س     0.15
是我        0.15
_requires   0.15
morning     0.15
ones        0.14
ψε          0.14
Activation Density: 0.114%