Explanations
elements representing coding structures or programming concepts
Negative Logits
})).    -0.23
"]').   -0.22
})",    -0.21
"]),    -0.21
']),    -0.20
')),    -0.20
])),    -0.20
]));    -0.19
')).    -0.19
))).    -0.19
Positive Logits
}}↵     0.42
))↵     0.41
)}↵     0.40
]]↵     0.38
]}↵     0.38
']}↵    0.37
()))↵   0.36
"))↵    0.36
']]↵    0.36
)]↵     0.36
Activations Density 0.124%