    Explanations

    words expressing a preference for a particular option over another

    instances of the word "Rather" indicating contrast or comparison

    Negative Logits
    Token             Logit
    "mberg"           -0.65
    "ammy"            -0.63
    "amba"            -0.62
    "DD"              -0.62
    " MIL"            -0.62
    " [+"             -0.61
    "ORN"             -0.60
    " saf"            -0.60
    " championship"   -0.59
    " Quake"          -0.59
    Positive Logits
    Token        Logit
    " Rather"    0.86
    "tons"       0.84
    "Rather"     0.83
    "rather"     0.77
    "tif"        0.77
    " ado"       0.75
    "swer"       0.75
    "itably"     0.75
    " Than"      0.71
    "Instead"    0.70
    Activation Density: 0.004%

    No Known Activations