INDEX
    Explanations

    high-ranking political and leadership positions

    Head Attr Weights
    0:0.06
    1:0.01
    2:0.08
    3:0.07
    4:0.20
    5:0.03
    6:0.21
    7:0.07
    8:0.05
    9:0.04
    10:0.06
    11:0.07
    Negative Logits
    " Dip":-1.49
    "aan":-1.43
    "amy":-1.41
    " fingert":-1.39
    "osal":-1.38
    " MPH":-1.36
    "ocent":-1.35
    " palm":-1.35
    "romy":-1.34
    "zers":-1.34
    Positive Logits
    " publishes":1.42
    " refres":1.41
    " unlocks":1.35
    " promptly":1.34
    "の魔":1.32
    "rified":1.30
    "]),":1.30
    "ibly":1.30
    "holiday":1.30
    " crashed":1.29
    Act Density 0.001%

    No Known Activations