INDEX
    Explanations

    references to perspectives and observations on systemic issues

    Negative Logits
    "essen"    -0.19
    "ativ"     -0.16
    "ffd"      -0.15
    "umat"     -0.15
    "bage"     -0.15
    "lich"     -0.15
    "ureka"    -0.15
    "olio"     -0.14
    "inals"    -0.14
    "als"      -0.14
    Positive Logits
    " merely"      0.28
    " simply"      0.20
    "只是"          0.18
    "-selector"    0.17
    "แค"           0.17
    " instead"     0.17
    "nor"          0.16
    " juste"       0.16
    " just"        0.16
    " nor"         0.16
    Activation Density: 0.097%
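
    The density figure above can be read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading only. It assumes density is the fraction of positions with an activation above a threshold (zero here); the function name `activation_density` and the sample counts are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (active positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 97 active positions out of 100,000 tokens.
sample = [1.0] * 97 + [0.0] * 99_903
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.097%
```

    Under this assumption, a density of 0.097% corresponds to roughly one active position per thousand tokens, which fits a sparse feature.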

    No Known Activations