Explanations
the beginning of new sections or topics indicated by specific tokens
Negative Logits
فريبيس            -1.10
Tikang            -1.02
المعيارى          -1.01
AssemblyTitle     -0.99
MessageBoxIcon    -0.98
NUMX              -0.98
gawas             -0.95
lenker            -0.94
parsedMessage     -0.92
complexContent    -0.92
Positive Logits
*                 1.15
*                 1.02
*.                0.76
<blockquote>      0.76
*)                0.74
))*               0.72
[toxicity=0]      0.70
*,                0.70
*:                0.69
<eos>             0.68
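The two logit tables list the vocabulary tokens whose output logits this feature pushes down (negative) or up (positive) the most. A minimal sketch of how such a table can be produced, assuming the scores are the raw dot product of the feature's decoder direction with the model's unembedding matrix (final LayerNorm ignored). The names `decoder_dir`, `W_U`, `vocab` and `top_logits` are hypothetical, not taken from any dashboard code.

```python
import numpy as np


def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: logits are decoder_dir @ W_U, where decoder_dir has shape
    (d_model,) and W_U has shape (d_model, vocab_size).
    """
    logits = decoder_dir @ W_U            # one score per vocabulary token
    order = np.argsort(logits)            # indices sorted ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -1.0, 2.0, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logits(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('b', -1.0), ('d', 0.0)]
print(pos)  # [('c', 2.0), ('a', 0.5)]
```

Two entries that print as the same string (such as the two `*` rows above) can still be distinct tokens, for example one with a leading space.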
Activation density: 0.089%
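Assuming activation density means the percentage of evaluated tokens on which the feature's activation is above zero, 0.089% corresponds to roughly 89 firing tokens per 100,000. A minimal sketch of that calculation follows. The `activation_density` helper and its `threshold` parameter are hypothetical illustrations, not the dashboard's own code.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# 2 of 10 tokens fire, giving a density of 20%.
print(activation_density([0, 0.5, 0, 0, 1.2, 0, 0, 0, 0, 0]))  # 20.0

# 89 firing tokens out of 100,000 gives roughly 0.089%.
acts = np.zeros(100_000)
acts[:89] = 1.0
print(activation_density(acts))
```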