Explanations
structural elements of programming code, particularly function definitions and calls
Negative Logits
);}↵↵      -0.21
)}↵↵       -0.18
')}↵       -0.18
)]↵↵       -0.17
'])↵↵      -0.17
);}↵       -0.17
")]↵↵      -0.17
)};↵       -0.16
")}↵       -0.16
'])↵↵↵     -0.16
Positive Logits
")))↵      0.37
)))↵       0.36
')))↵      0.36
}))        0.35
")))       0.34
())))↵     0.34
its        0.33
)))        0.33
')))       0.33
}))↵       0.32
Activations Density 0.044%