INDEX
    Explanations

    "well-" descriptive adjectives

    Negative Logits
    Token                  Value
    "ен"                   1.63
    "et"                   1.60
    "eau"                  1.37
    "𝐢"                    1.37
    "eu"                   1.35
    "wikkel"               1.26
    "ecer"                 1.25
    "য়ের"                  1.25
    "იკ"                   1.24
    (unrendered token)     1.23
    Positive Logits
    Token                  Value
    " menengah"            1.30
    "ities"                1.29
    " beho"                1.26
    "ل"                    1.19
    " acclaim"             1.18
    "^{-}"                 1.18
    "l"                    1.18
    "fv"                   1.16
    "一群"                 1.16
    (unrendered token)     1.15
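The two logit lists rank the vocabulary tokens this feature most suppresses (negative) and most promotes (positive). A common way to build such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the extremes. The sketch below uses NumPy; `top_logit_tokens`, `decoder_dir`, `W_U` and `vocab` are hypothetical names, and the toy matrix stands in for a real model.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature direction.

    decoder_dir: (d_model,) decoder vector of the feature (hypothetical input).
    W_U:         (d_model, vocab_size) unembedding matrix (hypothetical input).
    vocab:       list of vocab_size token strings.
    Returns (positive, negative): lists of (token, effect) pairs, strongest first.
    """
    effects = decoder_dir @ W_U          # (vocab_size,) logit change per token
    order = np.argsort(effects)          # indices, most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy model: d_model = 2, four-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -1.0, 2.0, 0.0],
                [3.0,  3.0, 3.0, 3.0]])
decoder_dir = np.array([1.0, 0.0])       # only reads the first row of W_U
pos, neg = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print(pos)  # [('c', 2.0), ('a', 0.5)]
print(neg)  # [('b', -1.0), ('d', 0.0)]
```

This is only the bare projection; a real dashboard pipeline may add steps (for example, handling of normalization layers) that this sketch leaves out.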
    Activation Density: 0.167%
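Activation density is the fraction of token positions on which the feature fires. Assuming it is measured as the share of tokens with activation above zero over some sample of text (this page states neither the threshold nor the corpus), 0.167% corresponds to roughly 1 token in 600. A minimal sketch with a hypothetical `activation_density` helper:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    acts: 1-D array of one feature's activations over a token sample (hypothetical input).
    threshold: assumed firing cutoff; 0.0 means "any positive activation".
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy example: the feature fires on 2 of 8 token positions.
density = activation_density([0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0])
print(f"Activation density {density:.3%}")  # prints "Activation density 25.000%"
```

On real data the array would span many thousands of tokens; a density of 0.167% means the comparison holds at about one position in 600.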

    No Known Activations