INDEX
    Explanations

    characteristics and features of items or entities, especially in a descriptive or analytical context

    Negative Logits
    Token    Logit
    axe      -0.20
    ers      -0.19
    ott      -0.16
    iej      -0.16
    ensch    -0.16
    owl      -0.16
    agne     -0.16
    omb      -0.15
    avi      -0.15
    emb      -0.15
    Positive Logits
    Token    Logit
    s        0.38
    t        0.34
    tica     0.32
    nge      0.29
    ska      0.28
    ï¸ı      0.28
    tte      0.28
    tal      0.27
    sar      0.27
    e        0.27
    Activation Density: 0.052%

    No Known Activations