INDEX
    Explanations

    phrases discussing moral or judgmental assessments

    Negative Logits
    cci
    -0.15
     Briggs
    -0.15
    aint
    -0.14
    del
    -0.14
    usra
    -0.14
    uf
    -0.14
    aget
    -0.14
    shi
    -0.14
     Keystone
    -0.13
     lou
    -0.13
    Positive Logits
    cis
    0.15
    erb
    0.15
    hoo
    0.15
    어진
    0.15
    plies
    0.15
    pedo
    0.15
    .elapsed
    0.14
    favor
    0.14
    rupa
    0.14
    routing
    0.14
    Act Density 0.250%

    No Known Activations