    Explanations

    references to people in positions of authority or organizational leadership

    Negative Logits
    Token            Logit
    " FW"            -0.14
    "oload"          -0.14
    "eg"             -0.13
    "ůr"             -0.13
    "ãĥ³ãĤ°"         -0.13
    " Schneider"     -0.13
    " «"             -0.13
    "uthor"          -0.13
    "alet"           -0.13
    "elan"           -0.13
    Positive Logits
    Token            Logit
    "GINE"           0.13
    "ìļ°ìĬ¤"         0.13
    "kla"            0.13
    "hots"           0.13
    " dem"           0.13
    " console"       0.12
    ".less"          0.12
    " akıl"          0.12
    " ê²ĥìĿ´ëĭ¤"     0.12
    "EE"             0.12
    Activation Density: 0.089%

    No Known Activations