INDEX
    Explanations

    adjectives describing a specific type or quality

    phrases that describe types or categories of things

    Negative Logits
    Token     Logit
    "å§«"     -0.89
    "INA"     -0.76
    "æ©"      -0.75
    "heid"    -0.70
    "orsi"    -0.70
    "omer"    -0.69
    "æķ"      -0.67
    "æĸ"      -0.67
    "milo"    -0.63
    "åħī"     -0.63
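
Several of the tokens above (for example "å§«" and "åħī") look like byte-level BPE token strings rather than ordinary text. The sketch below shows one way to turn such strings back into readable characters. It assumes the tokens use the GPT-2 style byte-to-Unicode table, which this page does not confirm; `bytes_to_unicode` here is a local reimplementation of that table and `decode_token` is a hypothetical helper name, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte value to a printable character.

    Assumption: the dashboard tokens were rendered with this table.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map a displayed token string back to bytes, then decode as UTF-8.

    Tokens holding only part of a multi-byte character cannot be fully
    decoded; errors="replace" marks them with U+FFFD instead of raising.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["å§«", "åħī", "æ©", "heid"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, a token that decodes to a replacement character is a fragment of a longer multi-byte character, so it only becomes readable when joined with the tokens next to it in a real sequence.
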
    Positive Logits
    Token                  Logit
    "liest"                0.86
    "etting"               0.74
    "etter"                0.73
    " thing"               0.67
    "linger"               0.65
    "rouse"                0.64
    " appro"               0.64
    "natureconservancy"    0.63
    " antidote"            0.62
    "lihood"               0.58
    Activation Density: 0.036%

    No Known Activations