Explanations
elements that suggest structure or organization, such as headers, bullet points, and function definitions
Mathematical or code notation
beginning of introductory phrases
Negative Logits
Token                  Logit
[token not recovered]  -1.15
msgTypes               -0.97
ligiloj                -0.94
queſta                 -0.93
surla                  -0.90
ujednoznacz            -0.89
帖最后由               -0.88
<unused41>             -0.86
ſicht                  -0.86
<unused8>              -0.86
Positive Logits
Token          Logit
s              0.45
↵↵             0.45
<eos>          0.40
="             0.34
1              0.33
[toxicity=0]   0.32
2              0.32
<strong>       0.31
is             0.31
’              0.31
Activations Density 0.025%
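
To give a sense of scale for the density figure, here is a minimal sketch that assumes "activations density" means the fraction of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all illustrative assumptions, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as active when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active token among 4000.
sample = [0.0] * 3999 + [1.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.025% corresponds to the feature firing on roughly one token in every 4,000.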