INDEX
    Explanations

    instances of betrayal and moral dilemmas

    Negative Logits
    862             -0.16
    kov             -0.15
    -lite           -0.15
    imiter          -0.15
    lder            -0.14
    ambre           -0.14
     entitlement    -0.14
    iali            -0.14
    rastructure     -0.14
    Wunused         -0.14
    Positive Logits
     passion        0.17
     pet            0.17
     passions       0.17
    il              0.17
     Formal         0.16
     pinch          0.16
     Juda           0.15
     cab            0.15
     unnatural      0.15
     import         0.15
    Activation Density 0.406%

    No Known Activations