Explanations
instances of code syntax and special characters related to programming
Negative Logits
[token not captured]  -0.79
•••                   -0.77
er                    -0.74
++++++++++++++++      -0.71
Ratna                 -0.69
судар                 -0.68
AsUp                  -0.68
••••                  -0.65
Beatty                -0.64
Kruse                 -0.62
Positive Logits
:`                    1.29
.`                    1.26
=`                    1.24
>`                    1.15
(`                    1.13
)`                    1.10
]`                    1.05
(`                    1.04
{`                    1.04
})`                   1.03
Activations Density 0.330%