    Explanations

    phrases with advice or instructions

    Negative Logits
        Token           Logit
        " constit"      -0.63
        "agra"          -0.60
        "quin"          -0.59
        " swick"        -0.59
        "arial"         -0.57
        "aced"          -0.56
        " center"       -0.56
        " prompted"     -0.56
        " altogether"   -0.56
        " intrig"       -0.55
    Positive Logits
        Token           Logit
        " beware"        1.11
        " Beware"        1.03
        "Always"         1.00
        " Always"        0.99
        " Avoid"         0.90
        "Eat"            0.88
        "Avoid"          0.86
        " Keep"          0.85
        " Stay"          0.84
        "Keep"           0.83
    Activation Density: 0.385%

    No Known Activations
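    The two summary statistics above can be illustrated with a small sketch. It assumes that activation density is the share of tokens on which the feature's activation is positive, and that each logit value is the feature's effect on that output token, ranked from most positive to most negative. The functions `activation_density` and `top_logits` are hypothetical helpers for illustration, not the dashboard's own code.

```python
from typing import Dict, List, Sequence, Tuple


def activation_density(activations: Sequence[float]) -> float:
    """Hypothetical helper: fraction of tokens where the feature fires.

    Assumption: "fires" means the activation is strictly positive.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: the k most positive and k most negative token effects.

    Assumption: `effects` maps each token string to the feature's logit effect.
    """
    # Sort ascending by effect, so the most negative tokens come first.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    # Reverse the ascending order to put the most positive tokens first.
    positive = ranked[::-1][:k]
    return positive, negative


if __name__ == "__main__":
    # 385 firing tokens out of 100,000 corresponds to a density of 0.385%.
    acts = [1.0] * 385 + [0.0] * (100_000 - 385)
    print(f"{activation_density(acts):.3%}")

    # A subset of the values listed above; quotes mark leading spaces.
    sample = {" beware": 1.11, "Always": 1.00, " constit": -0.63, "agra": -0.60}
    pos, neg = top_logits(sample, k=2)
    print(pos)
    print(neg)
```

    A token that keeps its leading space (" Always") is a different entry from the same word without one ("Always"). This is why both spellings can appear in the same list with different values.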