INDEX
    Explanations

    adjectives that denote suitability or perfection for specific purposes

    Negative Logits
    "AWN"       -0.15
    "ouri"      -0.15
    "ridor"     -0.15
    "OWER"      -0.15
    "_encoded"  -0.14
    "ẩu"        -0.14
    "imento"    -0.14
    "istar"     -0.13
    "elib"      -0.13
    "inh"       -0.13
    Positive Logits
    " for"      0.17
    "642"       0.13
    "675"       0.13
    "215"       0.13
    " dla"      0.13
    "adle"      0.13
    " ashamed"  0.13
    ".shell"    0.13
    " für"      0.13
    "ler"       0.13
    Act Density 0.078%

    No Known Activations