INDEX
Explanations
instances of programming syntax and structure
Negative Logits
*/      -1.30
")]     -1.23
");     -1.20
";      -1.17
'];     -1.17
');     -1.16
";      -1.16
");     -1.15
:");    -1.10
"));    -1.10
Positive Logits
↵              3.81
↵↵↵            1.14
↵↵             0.93
↵↵↵↵           0.91
↵↵↵↵↵          0.88
↵↵↵↵↵↵↵        0.82
↵↵↵↵↵↵         0.80
↵↵↵↵↵↵↵↵       0.73
↵↵↵↵↵↵↵↵↵      0.71
↵↵↵↵↵↵↵↵↵↵↵    0.69
Activations Density 17.586%