Explanations
punctuation and syntax elements in programming or code-related text
Negative Logits
Orr          -0.17
132          -0.17
131          -0.16
/REC         -0.16
135          -0.16
133          -0.15
/Foundation  -0.15
17           -0.14
332          -0.14
19           -0.14
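Logit lists on a dashboard like this usually rank vocabulary tokens by the feature's direct effect on the output logits. That effect is the feature's decoder vector projected through the model's unembedding. The sketch below assumes exactly that definition and ignores any final LayerNorm or rescaling the dashboard may apply. The names `w_dec_row`, `w_u` and `top_logit_effects` are hypothetical and are not taken from any particular library.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_u, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    w_dec_row: (d_model,) decoder direction of one feature (hypothetical name).
    w_u:       (d_model, vocab) unembedding matrix (hypothetical name).
    Returns (positive, negative) lists of (token_id, effect) pairs.
    """
    # (d_model,) @ (d_model, vocab) -> (vocab,) effect on each token's logit
    effects = w_dec_row @ w_u
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(int(i), float(effects[i])) for i in order[:k]]
    positive = [(int(i), float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 3-token vocabulary.
w_dec_row = np.array([1.0, 0.0])
w_u = np.array([[0.5, -0.2, 0.1],
                [9.0, 9.0, 9.0]])
positive, negative = top_logit_effects(w_dec_row, w_u, k=1)
print(positive)  # [(0, 0.5)]
print(negative)  # [(1, -0.2)]
```

A token ID would then be mapped back to its string with the model's tokenizer to produce rows like `Orr -0.17`.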
Positive Logits
(not shown)           0.39
(not shown)           0.38
(not shown)           0.26
--------------------  0.25
(not shown)           0.24
104                   0.24
105                   0.24
↵                     0.23
č↵                    0.23
↵ ↵                   0.23
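The `č↵` entry is most likely a GPT-2-style byte-level BPE artifact rather than a real "č". These tokenizers map each raw byte to a printable character. Byte 13 (carriage return) becomes `č` (U+010D), and byte 10 (newline) becomes `Ċ` (U+010A), which the dashboard appears to render as `↵`. Read this way, the token is `\r\n`. The sketch below reimplements the standard byte-to-character table to show this. It assumes the model uses a GPT-2-style vocabulary, which the page does not state. `decode_token` and `BYTE_DECODER` are illustrative names.

```python
def bytes_to_unicode():
    """GPT-2-style reversible table mapping every byte to a printable character."""
    # Bytes that are already printable characters map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte (control codes, space, ...) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Inverse table: printable character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}

def decode_token(token):
    """Turn a byte-level BPE token string back into the text it stands for."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")

print(BYTE_DECODER["č"])          # 13 (carriage return)
print(repr(decode_token("čĊ")))   # '\r\n'
```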
Activation Density: 0.009%
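Activation density is best understood as the fraction of sampled tokens on which the feature fires. A value of 0.009% is 9 × 10⁻⁵, or roughly 9 tokens in every 100,000. The minimal sketch below assumes a token counts as "active" when its activation is strictly positive. The dashboard's exact threshold and sample are not given, and the sample data is made up.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: the feature fires on 9 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(9):
    acts[i * 10_000] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.009%
```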