Explanations
fragments of code or programming-related syntax
Negative Logits
()"↵      -0.16
}'↵       -0.16
%'↵       -0.15
/'↵       -0.15
.''       -0.15
.'↵       -0.14
!'↵       -0.14
ãĢı↵↵     -0.14
;'↵       -0.14
]"↵       -0.14
Positive Logits
),        0.63
},        0.58
",        0.58
”,        0.58
],        0.57
),        0.54
>,        0.54
»,        0.54
.),       0.52
',        0.52
Activations Density: 2.474%