Explanations
punctuation marks and syntactic elements in code
Negative Logits
hea         -0.17
trib        -0.16
verage      -0.15
enza        -0.15
attering    -0.14
ayan        -0.14
HE          -0.14
mons        -0.14
attern      -0.14
ypad        -0.13
Positive Logits
лаш         0.15
ulton       0.15
fi          0.14
ford        0.14
Coalition   0.14
alaxy       0.14
eful        0.14
rozen       0.14
ocker       0.14
дам         0.14
Activations Density 0.331%