Explanations
technical jargon and specific formatting elements common in code or mathematical notation
Negative Logits
'\\;'        -0.95
IsContent    -0.82
myſelf       -0.81
itſelf       -0.79
"}";         -0.76
الحره        -0.76
Jefus        -0.75
(;;)         -0.73
Efq          -0.72
مشين         -0.70
Positive Logits
<b>          1.48
<strong>     1.29
//           1.12
**           0.83
mathbf       0.82
<!--         0.70
//           0.69
**           0.64
boldsymbol   0.61
{\           0.54
Activations Density 0.544%