Explanations
coding-related structures and data
Negative Logits
];↵↵     -0.19
}")↵↵    -0.18
"");↵↵   -0.18
>");↵↵   -0.18
)]↵↵     -0.17
]);↵↵    -0.17
_;↵↵     -0.16
);↵↵     -0.16
{});↵↵   -0.16
};↵↵     -0.16
Positive Logits
↵↵↵     0.29
,↵↵↵    0.27
()↵↵↵   0.27
*↵↵↵    0.26
"↵↵↵    0.26
'↵↵↵    0.25
{}↵↵↵   0.25
[]↵↵↵   0.24
?↵↵↵    0.24
/↵↵↵    0.24
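The page lists token logits but does not say how they were obtained. As an illustration only, the sketch below assumes a common convention for feature dashboards: a feature's effect on a token's logit is the dot product of the feature's decoder direction with that token's unembedding vector (layer norm and other model components ignored), and the two lists are the most negative and most positive of those values. The helpers `logit_effects` and `top_and_bottom`, the vectors, and the toy vocabulary are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Hypothetical helper: dot product of a feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    positive = ranked[::-1][:k]  # largest effects first
    negative = ranked[:k]        # most negative effects first
    return positive, negative


# Toy two-dimensional example; these numbers are made up, not taken from the page.
decoder_dir = [1.0, -0.5]
unembed = {
    "A": [0.2, 0.0],
    "B": [0.0, 0.4],
    "C": [0.1, 0.1],
    "D": [-0.1, 0.3],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print([token for token, _ in positive])  # ['A', 'C']
print([token for token, _ in negative])  # ['D', 'B']
```

Under this assumption, a negative value means the feature pushes that token's logit down when it fires, and a positive value means it pushes the logit up.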
Activation Density: 0.010%
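A density of 0.010% corresponds to a fraction of 0.010 / 100 = 0.0001, i.e. the feature is active on roughly 1 in 10,000 tokens. The page does not define "active"; the sketch below assumes a token counts as active when the feature's activation is strictly above zero. `activation_density` is a hypothetical helper, not a function from any dashboard library, and the sample data are made up.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2 active positions out of 20,000 tokens.
sample = [0.0] * 20_000
sample[123] = 3.1
sample[9_876] = 0.7
density = activation_density(sample)
print(f"{density:.3%}")  # 0.010%
```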