    Explanations

    contrary statements about potential outcomes

    Negative Logits
    "iddle"        -0.63
    "raft"         -0.62
    "iling"        -0.59
    "ding"         -0.58
    "ature"        -0.58
    " Enhance"     -0.58
    "rift"         -0.57
    "ishing"       -0.55
    "anwhile"      -0.54
    " guiName"     -0.53

    Positive Logits
    " liked"        1.12
    " been"         1.06
    " gotten"       1.02
    " benefited"    0.99
    "been"          0.98
    " preferred"    0.96
    " fared"        0.94
    " gladly"       0.91
    " gone"         0.85
    " avoided"      0.85
    Activation Density: 0.054%

    No Known Activations