    Explanations

    words related to power dynamics or authority

    Negative Logits
    ume
    -0.16
    лы
    -0.15
    iren
    -0.15
    ForObject
    -0.14
    zc
    -0.14
    osc
    -0.14
    ayers
    -0.14
    ventory
    -0.14
    ALA
    -0.14
    mapper
    -0.14
    Positive Logits
     Sez
    0.16
    شماری
    0.16
    .libs
    0.14
    plib
    0.14
    стри
    0.14
     Orr
    0.14
    elight
    0.14
    heat
    0.13
    lesi
    0.13
    ービ
    0.13
    Activation Density: 0.007%

    No Known Activations