INDEX
    Explanations

    discussions about complex relationships and societal issues

    Negative Logits
    wich
    -0.15
    oleč
    -0.15
    ucs
    -0.14
    ะ
    -0.14
     concrete
    -0.14
    _KIND
    -0.14
    inh
    -0.14
    lit
    -0.13
     polished
    -0.13
     Lit
    -0.13
    Positive Logits
     merely
    0.34
     mere
    0.28
     simplement
    0.28
    mere
    0.28
     simply
    0.25
     harmless
    0.21
     просто
    0.20
     Simply
    0.20
    只是
    0.19
    Simply
    0.19
    Act Density 0.227%

    No Known Activations