Explanations
numerical values, particularly related to measurements and quantities
Negative Logits
Token     Logit
0         -0.32
00        -0.31
zero      -0.25
ï¼IJ       -0.23
âĤĢ       -0.21
鼶        -0.21
०         -0.20
_zero     -0.20
ZERO      -0.20
Zero      -0.19
Positive Logits
Token     Logit
óst       0.15
inky      0.14
Trojan    0.14
ishi      0.14
ophon     0.13
ï¸ı       0.13
eland     0.13
tip       0.13
Deck      0.13
عر        0.13
Activations Density: 0.077%