Explanations
occurrences of specific characters or symbols
Negative Logits
ÂĿ      -0.17
..."↵   -0.14
PIX     -0.14
`↵↵     -0.14
LOUR    -0.13
kind    -0.13
raç     -0.13
"`↵     -0.13
sort    -0.13
;"↵     -0.13
Positive Logits
<         0.48
,<        0.39
<         0.38
<br       0.35
</        0.35
.<        0.35
<i        0.35
<strong   0.34
<span     0.34
(<        0.34
Activations Density 0.002%