INDEX
    Explanations

    instances of authority figures and their interactions with subordinates

    Negative Logits
    主人
    -0.17
    мом
    -0.17
     хозя
    -0.17
    utters
    -0.16
    klä
    -0.15
    geber
    -0.14
    icros
    -0.14
    лава
    -0.14
    mdir
    -0.14
    ála
    -0.14
    Positive Logits
     subordinate
    0.28
     assistant
    0.27
     assistants
    0.26
     his
    0.26
     followers
    0.25
     associate
    0.25
     deputy
    0.25
    副
    0.24
     team
    0.22
     deputies
    0.22
    Act Density 0.252%
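The density figure above is a single percentage. As a rough sketch only, and assuming (this dump does not state it) that activation density means the fraction of token positions on which the feature's activation is above zero, it could be computed as follows. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature
    fires at all; the source dashboard does not define the term.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical data: 3 active positions out of 1000 tokens.
sample = [0.0] * 997 + [0.4, 1.2, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.252% would correspond to the feature firing on roughly 1 token in 400.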

    No Known Activations