Explanations
Sequences of characters resembling mathematical or programming syntax.
Negative Logits
Token   Logit
u       -0.36
l       -0.35
t       -0.29
p       -0.23
lint    -0.22
uw      -0.22
lil     -0.21
la      -0.21
uC      -0.20
lac     -0.20

Positive Logits
Token   Logit
ubits    0.32
ubit     0.32
eu       0.27
eer      0.26
uries    0.25
oq       0.25
rcode    0.24
e        0.23
eel      0.22
o        0.21
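The two lists above are the extremes of a single ranking over the vocabulary: the tokens whose logits this feature pushes down the most, and those it pushes up the most. The sketch below shows that ranking step only. It assumes the per-token logit effects are already available as a plain mapping from token string to value. How the dashboard actually computes those effects is not stated here, and the function name `top_and_bottom_logits` is hypothetical.

```python
def top_and_bottom_logits(logit_effects: dict[str, float], k: int = 10):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    # Sort ascending by effect value; the most negative effects come first.
    ranked = sorted(logit_effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # strongest suppression first
    positive = ranked[::-1][:k]  # strongest promotion first
    return negative, positive


# A few values taken from the tables above, used as toy input.
example = {"u": -0.36, "l": -0.35, "ubits": 0.32, "eu": 0.27, "o": 0.21}
neg, pos = top_and_bottom_logits(example, k=2)
print(neg)  # [('u', -0.36), ('l', -0.35)]
print(pos)  # [('ubits', 0.32), ('eu', 0.27)]
```

With the full vocabulary as input and `k=10`, this would produce two lists shaped like the tables above.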
Activation density: 0.034%
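Activation density is a percentage of token positions. The sketch below assumes it counts the positions where the feature's activation is strictly above zero. The threshold, the dataset the figure was measured on, and the helper name `activation_density` are all assumptions made for illustration. The toy input is constructed so that 34 of 100,000 positions are active, which gives the figure reported above.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 34 out of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(34):
    acts[i * 1000] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.034%
```

A density this low means the feature is silent on almost every token and fires only on a narrow class of inputs.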