INDEX
    Explanations

    words that suggest high quality or suitability for a specific purpose

    Negative Logits
    "anzi"         -0.16
    "ombat"        -0.15
    " courtesy"    -0.15
    "iggins"       -0.14
    "embali"       -0.14
    "ắng"          -0.14
    "abei"         -0.14
    "inning"       -0.14
    "582"          -0.14
    "essaging"     -0.14
    Positive Logits
    " choice"      0.24
    " whether"     0.21
    " for"         0.19
    "choice"       0.19
    " addition"    0.18
    "whether"      0.17
    " WHETHER"     0.17
    " when"        0.17
    " choices"     0.17
    "ç͍äºİ"        0.17
    Act Density 0.075%

    No Known Activations