INDEX
    Explanations

    nouns and their associated forms

    Negative Logits
    Token         Logit
    " Flag"       -0.17
    "ëĭ´"         -0.16
    "ohn"         -0.15
    "áno"         -0.15
    " Crest"      -0.14
    "odel"        -0.14
    " bpp"        -0.14
    "igar"        -0.14
    "arten"       -0.14
    "idel"        -0.13
    Positive Logits
    Token         Logit
    "_NATIVE"     0.15
    " coll"       0.15
    "enza"        0.14
    ".binary"     0.14
    "/big"        0.14
    " casting"    0.14
    "oll"         0.14
    "doll"        0.14
    "енз"         0.13
    "los"         0.13
    Activation Density: 0.034%

    No Known Activations