    Explanations

    concepts related to moral character and integrity

    Negative Logits
    Token         Logit
    "argon"       -0.16
    "uye"         -0.15
    "ento"        -0.14
    "addField"    -0.14
    " dạng"       -0.14
    "ootball"     -0.14
    "elon"        -0.13
    "engo"        -0.13
    "ån"          -0.13
    "oyer"        -0.13

    Positive Logits
    Token         Logit
    "antha"       0.18
    "bal"         0.16
    " ethical"    0.15
    "orus"        0.15
    "rin"         0.15
    "mel"         0.15
    "è»"          0.15
    "lik"         0.14
    "rop"         0.14
    "ll"          0.14
    Activation Density: 0.169%

    No Known Activations