INDEX
Explanations
structural elements and organization in code-related text
Negative Logits
s    -0.55
i    -0.31
L    -0.27
n    -0.27
B    -0.26
S    -0.25
C    -0.25
_    -0.25
T    -0.25
[    -0.24
Positive Logits
'gc    0.16
anan    0.15
2    0.15
InstanceOf    0.15
1    0.15
er    0.15
3    0.14
ics    0.14
ty    0.14
unde    0.14
Activations Density 0.241%