INDEX
    Explanations

    words related to authority and supervision

    Negative Logits
    onec    -0.19
    ÄĻk    -0.17
    segue    -0.16
    üs    -0.16
     Hayes    -0.16
    zes    -0.15
    ypad    -0.15
    ture    -0.15
    bilt    -0.15
    ü    -0.15
    Positive Logits
     sup    0.25
    posed    0.20
    posing    0.19
     Sup    0.19
    stit    0.18
    à¹Ģà¸Ľà¸Ńร    0.18
    pling    0.17
    lub    0.16
    reme    0.16
     خاÙħ    0.16
    Activation Density: 0.011%

    No Known Activations