INDEX
Explanations
occurrences of punctuation marks and specific formatting characters in a code context
Negative Logits
Bates       -0.15
remaining   -0.14
pron        -0.14
babys       -0.14
laughter    -0.14
PG          -0.14
Ãłi         -0.14
гл          -0.13
lico        -0.13
ли          -0.13
Positive Logits
break       0.39
break       0.35
break       0.34
-break      0.32
breaks      0.28
Break       0.27
_break      0.26
Break       0.26
brake       0.25
;break      0.25
Activations Density 0.030%
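
As a rough illustration of what a density figure like the one above could mean, the sketch below assumes that "activations density" is the fraction of token positions on which the feature's activation is nonzero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; none of them come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all (activation > 0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
acts = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.030%
```

Under this reading, a density of 0.030% would correspond to the feature firing on roughly 3 of every 10,000 tokens.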