INDEX
    Explanations

    substrings that match particular patterns or structures within words

    Negative Logits
    岳            -0.18
    owed          -0.17
    afari         -0.16
    enerator      -0.16
    classifier    -0.15
    esel          -0.15
    oodle         -0.15
    ipi           -0.15
    edl           -0.15
    ibar          -0.15
    Positive Logits
    uncated       0.19
    preneur       0.18
    ondheim       0.17
    ivial         0.17
    insic         0.17
    actions       0.16
    uss           0.16
    IBUTE         0.16
    ong           0.16
    actors        0.16
    Act Density 0.067%

    No Known Activations