Explanations
specific Unicode characters or symbols
Negative Logits
Token    Logit
--       -0.18
['       -0.18
‘        -0.18
.--      -0.17
'        -0.16
-->↵     -0.16
`        -0.16
‘        -0.16
!--      -0.16
---      -0.16
Positive Logits
Token    Logit
ÂŃ       0.61
ÂŃ       0.43
ÂŃt      0.39
ÂŃs      0.38
âĢħ      0.36
ÂŃn      0.35
ÂŃing    0.34
ÂŃi      0.32
ÂŃtion   0.31
č        0.31
Activations Density 0.001%
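
The positive-logit tokens listed above look like raw byte-level BPE strings rather than readable text. The sketch below assumes they use the GPT-2-style byte-to-character table (an assumption: the page does not name the tokenizer) and maps each display character back to its byte before decoding as UTF-8. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of this page.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Build a GPT-2-style table mapping each byte to a printable character."""
    # Bytes that are already printable keep their own code point.
    bs = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    cs = bs[:]
    n = 0
    # Every other byte is shifted to code point 256 + n, in ascending order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Convert a byte-level display string back to the text it encodes.

    Raises KeyError for characters outside the 256-entry table, which
    would mean the string was not in byte-level form to begin with.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00c2\u0143", "\u00c2\u0143tion", "\u00e2\u0122\u0127", "\u010d"]:
        decoded = decode_token(tok)
        print(repr(tok), "->", [f"U+{ord(c):04X}" for c in decoded])
```

Under that assumption, the leading two-character prefix shared by most of the positive-logit tokens decodes to U+00AD (soft hyphen), the three-character token decodes to U+2005 (four-per-em space), and the last token decodes to a carriage return, which is consistent with the explanation "specific Unicode characters or symbols".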