INDEX
Explanations
references to video games or software management
Negative Logits
...č↵    -0.20
,)↵      -0.20
...)↵    -0.19
...')↵   -0.18
(...)↵   -0.18
_)↵      -0.18
ãĢĭ↵     -0.18
...',↵   -0.18
>↵       -0.17
...";↵   -0.17
Positive Logits
·        0.30
.        0.28
;.       0.27
.        0.27
¶        0.24
:.       0.23
;        0.23
[].      0.23
...      0.23
-----    0.22
Activations Density: 0.539%