INDEX
Explanations
elements of code structure and syntax in programming
Negative Logits
Token  Logit
416    -0.20
387    -0.20
412    -0.19
422    -0.19
438    -0.19
402    -0.19
384    -0.18
403    -0.18
373    -0.18
442    -0.18
Positive Logits
Token          Logit
               0.46
632            0.33
630            0.30
634            0.25
633            0.24
-------------  0.23
650            0.23
↵              0.23
640            0.22
635            0.22
Activations Density 0.008%