    Explanations

    negative or challenging situations

    Negative Logits
    ased     -0.67
    tein     -0.67
    hai      -0.64
    yan      -0.61
    acht     -0.60
    lass     -0.60
    ase      -0.60
    TW       -0.59
    ocard    -0.58
     Mant    -0.57
    Positive Logits
     any         1.06
     slightest   1.04
     anything    1.03
     ever        1.02
     anyone      0.97
     anybody     0.96
     whatsoever  0.88
     anywhere    0.86
     mention     0.86
     remotely    0.84
    Act Density 0.673%

    No Known Activations