INDEX
    Explanations

    mathematical expressions and symbols

    Negative Logits
    Token                Logit
    " queſta"            -1.24
    "featureID"          -1.18
    "ChildScrollView"    -1.13
    "transQ"             -1.09
    "<unused43>"         -1.08
    "<unused41>"         -1.07
    "<unused42>"         -1.07
    "<unused14>"         -1.07
    "<unused74>"         -1.07
    "<pad>"              -1.06
    Positive Logits
    Token                Logit
    " the"                0.72
    " an"                 0.53
    " a"                  0.48
    "the"                 0.44
    " its"                0.39
    " their"              0.38
    " his"                0.38
    "The"                 0.36
    " The"                0.36
    " THE"                0.35
    Act Density 0.620%
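
The activation density figure is commonly read as the fraction of sampled token positions on which the feature fires. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration, not the dashboard's actual computation.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 62 of which activate the feature.
sample = [0.0] * 9938 + [1.0] * 62
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

Under this reading, a density of 0.620% would mean the feature is active on roughly 62 of every 10,000 sampled tokens.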

    No Known Activations