INDEX
Explanations
syntactical structures and formatting in mathematical or technical expressions
Negative Logits
X      -0.60
Te     -0.57
M      -0.57
s      -0.57
之     -0.53
Man    -0.53
đ      -0.53
x      -0.52
\      -0.52
S      -0.52
Positive Logits
purpoſe       1.19
myſelf        1.16
raiſ          1.15
ſelves        1.14
whoſe         1.12
pleaſure      1.11
ſtill         1.10
ſelf          1.10
themſelves    1.09
juſt          1.07
Activations Density: 0.774%