INDEX
Explanations
tokens related to variable names and mathematical symbols
Negative Logits
purpoſe       -1.10
myſelf        -1.04
pleaſure      -1.01
faſt          -1.00
ſeveral       -1.00
Monfieur      -0.96
themſelves    -0.95
ſever         -0.92
Beſ           -0.91
ſame          -0.91
Positive Logits
Z             1.92
Z             1.67
z             1.62
z             1.17
getZ          1.04
X             0.94
l             0.91
K             0.91
S             0.88
              0.88
Activations Density 0.113%
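
The page does not state how these figures are derived. As a minimal sketch, assume that activation density is the fraction of token positions on which the feature fires (activation above zero), and that the logit lists are the lowest and highest entries of a per-token logit table. Under those assumptions the quantities could be computed as below. The function names and the toy activation list are hypothetical; only the four token/logit pairs are taken from the lists above.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which
    the feature is active; the page itself does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def top_logits(token_logits, k=10):
    """Split (token, logit) pairs into the k most negative and k most positive."""
    ranked = sorted(token_logits, key=lambda pair: pair[1])
    negative = ranked[:k]        # lowest logits first
    positive = ranked[::-1][:k]  # highest logits first
    return negative, positive


# Hypothetical toy activations: the feature fires on 1 of 1000 positions.
acts = [0.0] * 999 + [2.5]
density = activation_density(acts)

# Four pairs copied from the lists above, used only to show the ranking.
pairs = [("purpoſe", -1.10), ("Z", 1.92), ("getZ", 1.04), ("myſelf", -1.04)]
negative, positive = top_logits(pairs, k=2)
```

With the toy data, `density` is 0.001 (0.1%), `negative` starts with `purpoſe`, and `positive` starts with `Z`.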