INDEX
    Explanations

    expressions of pretense and claims of sincerity in discussions about morality and personal values

    Negative Logits
    æłª
    -0.16
    Ïĥια
    -0.15
    ÏĥοÏħ
    -0.15
    ForRow
    -0.14
    pras
    -0.14
    ilent
    -0.14
    Hierarchy
    -0.14
    irk
    -0.14
    üz
    -0.14
    adders
    -0.13
    Positive Logits
    IJ
    0.16
    nya
    0.15
    oux
    0.14
    icht
    0.14
    âm
    0.14
    UGH
    0.14
    .BorderFactory
    0.14
    gon
    0.14
     Neck
    0.14
    ibr
    0.14
    Activation Density: 0.188%

    No Known Activations