Explanations
special characters and formatting
Negative Logits
Token   Logit
n       0.51
I       0.47
`       0.47
u       0.47
Like    0.46
'       0.46
\       0.45
U       0.45
Alex    0.45
i       0.45
Positive Logits
Token   Logit
<0x99>  0.55
adhipp  0.53
<0xAC>  0.52
<0x92>  0.52
<0x97>  0.52
<0x9B>  0.52
GALAD   0.51
<0x91>  0.50
<0x95>  0.50
<0xAB>  0.49
Activations Density 0.001%