INDEX
    Explanations

    proper nouns, specifically names

    Negative Logits
    /Internal      -0.15
    esub           -0.15
    nan            -0.14
    ipay           -0.14
    ungal          -0.14
    yx             -0.14
    issen          -0.14
    silver         -0.13
    MouseButton    -0.13
     anomal        -0.13
    Positive Logits
    IVEN           0.19
    aux            0.15
    ervation       0.14
    iven           0.14
    Anywhere       0.14
    ur             0.14
    urname         0.14
    trap           0.14
    thing          0.14
    able           0.14
    Activation Density 0.001%

    No Known Activations