Explanations
instances of specific number words or numerical references
Negative Logits
záv       -0.15
립        -0.15
ilogy     -0.14
δÏĮ       -0.14
ardash    -0.14
kers      -0.14
elp       -0.14
ker       -0.14
ASM       -0.14
ắp        -0.14
Positive Logits
394       0.15
368       0.15
Lamp      0.14
hower     0.14
ronic     0.14
PUTE      0.14
trash     0.14
valuator  0.14
364       0.14
396       0.13
Activations Density: 0.027%