INDEX
    Explanations

    instances where actions or decisions are taken in place of other actions or decisions

    Negative Logits
    artisan       -0.56
    Nap           -0.54
    read          -0.54
    Vers          -0.54
     Palestin     -0.53
    marine        -0.51
    anded         -0.50
    essen         -0.50
    Guest         -0.49
    STAT          -0.49
    Positive Logits
     quitting     0.56
     rever        0.56
     being        0.55
     fixing       0.53
     dwelling     0.53
     wasting      0.53
     clock        0.53
     retiring     0.52
     anger        0.52
     anything     0.52
    Activation Density: 13.812%

    No Known Activations