    Explanations

    mathematical symbols and expressions related to formal logic or quantifiers

    mathematical notation and logic symbols

    Negative Logits
    "<unused68>"   -1.09
    "<pad>"        -1.09
    "[@BOS@]"      -1.08
    "<unused3>"    -1.08
    "<unused23>"   -1.08
    "<unused28>"   -1.08
    "<unused17>"   -1.08
    "<unused16>"   -1.08
    "<unused8>"    -1.08
    "<unused14>"   -1.08

    Positive Logits
    " ("            0.39
    " $"            0.33
    " false"        0.31
    "<eos>"         0.29
    " f"            0.29
    " $\"           0.29
    " !"            0.29
    " x"            0.28
    " h"            0.28
    " S"            0.28
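    A minimal sketch of how positive and negative logit lists like the ones above are often produced. It assumes, as is common for sparse-autoencoder feature dashboards (this page does not say so), that each value is the feature's decoder vector projected through the model's unembedding matrix, and that the lists are simply the highest and lowest entries. The names `decoder_vec`, `unembed`, `logit_effect`, and `top_and_bottom_tokens` are hypothetical, and the toy numbers are made up.

```python
def logit_effect(decoder_vec, unembed):
    """Project a feature direction onto every vocabulary token.

    decoder_vec: list of d_model floats (assumed: the feature's decoder row).
    unembed: d_model x vocab_size matrix as a list of rows (assumed: W_U).
    Returns one value per vocabulary entry.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][v] for i in range(len(decoder_vec)))
        for v in range(vocab_size)
    ]


def top_and_bottom_tokens(effects, vocab, k=10):
    """Return the k most positive and k most negative (token, value) pairs."""
    # Indices sorted by value, lowest first.
    order = sorted(range(len(effects)), key=lambda v: effects[v])
    negative = [(vocab[v], effects[v]) for v in order[:k]]
    # Highest values, listed largest first.
    positive = [(vocab[v], effects[v]) for v in reversed(order[-k:])]
    return positive, negative


# Toy usage: a 2-dimensional model with a 3-token vocabulary.
vocab = ["a", " (", "<pad>"]
effects = logit_effect([1.0, -0.5], [[0.2, 0.0, -1.0], [0.4, -0.2, 0.0]])
positive, negative = top_and_bottom_tokens(effects, vocab, k=1)
print(positive, negative)
```

    Sorting indices once and slicing both ends keeps the positive and negative lists consistent with each other, which matters when many tokens tie, as the unused tokens do in the negative list above.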
    Activation Density: 0.783%
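    On dashboards of this kind, activation density is typically the share of sampled token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the `activation_density` function and its `threshold` parameter are hypothetical. At 0.783%, this feature would be active on roughly one token in 128 (100 / 0.783 ≈ 127.7).

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    activations: one feature activation per sampled token position.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy usage: the feature fires on 1 of 4 token positions.
print(activation_density([0.0, 1.2, 0.0, 0.0]))
```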

    No Known Activations