    Explanations

    the word "instead" and its variations, indicating a preference for alternatives or shifts in perspective

    Negative Logits
    Token         Logit
    ' "));'       -0.78
    'StatusOK'    -0.74
    ' Cahill'     -0.73
    ' Hollis'     -0.73
    ' Kass'       -0.69
    '"));\n'      -0.67
    'Bil'         -0.67
    'er'          -0.67
    ' Rik'        -0.66
    '`{.'         -0.66
    Positive Logits
    Token         Logit
    ' Instead'    1.84
    'Instead'     1.82
    ' instead'    1.78
    'instead'     1.70
    ' Rather'     1.23
    'uttosto'     1.22
    'Rather'      1.17
    ' rather'     1.08
    'tdessen'     1.05
    ' Statt'      1.04
    Activation Density: 0.180%

    No Known Activations