INDEX
Explanations
the presence of indentation and formatting characters in code snippets
Negative Logits
=[]     -0.60
way     -0.58
fine    -0.54
ERTA    -0.53
”       -0.52
ho      -0.52
<>      -0.52
’       -0.52
qu      -0.51
παρά    -0.51
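Positive Logits
(token not shown)    1.24
(token not shown)    1.12
(token not shown)    0.97
(token not shown)    0.96
(token not shown)    0.92
(token not shown)    0.91
tvguidetime          0.86
الحره                0.82
myſelf               0.81
(token not shown)    0.81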
Activations Density 1.447%