INDEX
    Explanations

    instances of demonstrative pronouns and adjectives

    Negative Logits
    Token      Logit
    "s"        -0.19
    "er"       -0.16
    " det"     -0.16
    "orman"    -0.16
    "uel"      -0.16
    "ole"      -0.15
    "ous"      -0.15
    "r"        -0.15
    "ont"      -0.15
    "arta"     -0.15
    Positive Logits
    Token               Logit
    "maal"              0.17
    "że"                0.17
    "ParameterValue"    0.17
    "gre"               0.16
    "ATRIX"             0.16
    "ìłĢ"               0.16
    "rale"              0.16
    "avir"              0.15
    ".openg"            0.14
    "nect"              0.14
    Activation Density: 0.031%

    No Known Activations