INDEX
    Explanations

    phrases indicating a choice or preference

    choices or decisions along with associated preferences

    Negative Logits
    "ankind"          -0.76
    "ipples"          -0.63
    " latent"         -0.58
    "ONY"             -0.57
    " disturbances"   -0.56
    " appre"          -0.56
    "vind"            -0.55
    "pires"           -0.55
    " glimps"         -0.55
    " incidents"      -0.55
    Positive Logits
    " instead"        1.46
    "instead"         1.22
    " Instead"        1.12
    "Instead"         1.10
    " because"        1.05
    " option"         1.04
    " rather"         1.03
    " alternatives"   1.00
    " lest"           0.94
    " anyways"        0.92
    Activation Density 0.565%
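    Activation density is usually reported as the percentage of sampled tokens on which the feature fires, i.e. has an activation above zero. At 0.565%, that would be roughly 5 or 6 of every 1,000 tokens. The minimal sketch below assumes that definition. The zero threshold is an assumption, and the toy counts (113 active tokens out of 20,000) are hypothetical, chosen only so that the output matches the figure shown. The actual sample size is not given.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 113 active tokens out of 20,000 sampled.
acts = [1.0] * 113 + [0.0] * (20000 - 113)
print(f"Activation density: {activation_density(acts):.3f}%")
```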

    No Known Activations