    Explanations

    the presence of specific chemical symbols or abbreviations related to scientific contexts

    Negative Logits
    u     -0.77
    ا     -0.76
    y     -0.76
    in    -0.73
    is    -0.68
    as    -0.66
    ו     -0.62
    an    -0.61
    l     -0.61
    at    -0.60
    Positive Logits
    parsedMessage    1.13
     purpoſe         0.90
    featureID        0.88
    <bos>            0.85
     houſe           0.82
    <unused43>       0.81
    <pad>            0.80
    <unused41>       0.79
    <unused17>       0.79
    <unused23>       0.79
    Activation Density: 2.259%

    No Known Activations