INDEX
    Explanations

    phrases and expressions related to critique and moral reasoning

    Negative Logits
    en
    -0.18
     Parl
    -0.15
    막
    -0.15
    äft
    -0.15
    zan
    -0.14
    isay
    -0.14
    vore
    -0.14
    iset
    -0.13
    onom
    -0.13
    iken
    -0.13
    Positive Logits
    าธ
    0.16
    .accounts
    0.15
    -cookie
    0.15
    trieve
    0.15
    umed
    0.15
    PUR
    0.14
    PLIED
    0.14
     orth
    0.14
    イト
    0.14
    .builders
    0.14
    Act Density 0.127%

    No Known Activations