INDEX
    Explanations

    phrases indicating possession and attributes

    Negative Logits
    Token            Logit
    "aba"            -0.15
    "oy"             -0.15
    " Lil"           -0.14
    " receipt"       -0.14
    "und"            -0.14
    "782"            -0.14
    " experience"    -0.14
    "ik"             -0.13
    "nings"          -0.13
    "TT"             -0.13
    Positive Logits
    Token            Logit
    "ataire"         0.16
    "htag"           0.16
    "untu"           0.16
    "eldorf"         0.15
    "oud"            0.15
    "coli"           0.15
    "essaging"       0.15
    "ouz"            0.15
    "orre"           0.15
    "ouble"          0.15
    Activation Density: 0.313%

    No Known Activations