INDEX
    Explanations

    Positive adjectives

    NEGATIVE LOGITS
    "[z"          -0.07
    "ังไม"        -0.06
    "_est"        -0.06
    " Eis"        -0.06
    " Zero"       -0.06
    "]]↵↵"        -0.06
    "ForKey"      -0.06
    ";y"          -0.06
    "\tInteger"   -0.06
    " ei"         -0.06
    POSITIVE LOGITS
    "lucent"      0.09
    " lovely"     0.09
    " Modern"     0.07
    " wonderful"  0.07
    " Lovely"     0.07
    "later"       0.07
    "_model"      0.07
    ".sf"         0.07
    "VERY"        0.07
    "ABEL"        0.07
    Activation Density: 0.010%

    No Known Activations