INDEX
    Explanations

    words and phrases related to adjectives and their grammatical roles

    Negative Logits
    "aro"      -0.20
    "aghan"    -0.18
    "izon"     -0.15
    "ussen"    -0.15
    " Olsen"   -0.15
    "fen"      -0.14
    "hardt"    -0.14
    "mazon"    -0.14
    " Amer"    -0.14
    " Mate"    -0.14
    Positive Logits
    "igham"    0.17
    "ADIUS"    0.17
    "eps"      0.16
    "akk"      0.15
    "cape"     0.15
    "Wiki"     0.15
    "andom"    0.15
    "ivec"     0.14
    "ÃĹ↵↵"     0.14
    ".sponge"  0.14
    Activation Density: 0.003%

    No Known Activations