INDEX
    Explanations

    terms related to superiority and authority

    Negative Logits
    toa        -0.16
    ÄĻk        -0.16
    ü          -0.15
    ÏĮ         -0.15
    øy         -0.15
    dehyde     -0.15
    zes        -0.15
    blas       -0.15
    tar        -0.14
    blem       -0.14
    Positive Logits
     sup        0.23
    erv         0.22
    posing      0.21
    posed       0.21
    erville     0.20
    erc         0.19
    à¹Ģà¸Ľà¸Ńร  0.19
    ervisor     0.19
     Sup        0.19
    erset       0.18
    Activation Density: 0.013%

    No Known Activations