INDEX
    Explanations

    words related to positions, roles, and active participation in various contexts

    Negative Logits
     hol
    -0.19
    zew
    -0.17
    ittest
    -0.16
    hol
    -0.15
    APPER
    -0.15
    orris
    -0.15
    iga
    -0.15
    zp
    -0.14
    aris
    -0.14
    ouser
    -0.14
    Positive Logits
    urate
    0.16
    ervised
    0.15
    interop
    0.15
     flows
    0.14
    鼓
    0.14
    letics
    0.14
    лиш
    0.14
    flows
    0.14
    夫
    0.14
    Translate
    0.14
    Act Density 0.002%

    No Known Activations