INDEX
    Explanations

    verbs expressing obligation, advice, or expectation

    Negative Logits
        Token          Logit
        " Fra"         -0.77
        " Wid"         -0.71
        " Hilbert"     -0.67
        " Wolfgang"    -0.61
        " maze"        -0.59
        " Afgh"        -0.59
        " Kah"         -0.59
        " FW"          -0.58
        " Cir"         -0.58
        " WI"          -0.58
    Positive Logits
        Token            Logit
        " beware"        1.03
        "ered"           1.01
        " be"            0.90
        " ideally"       0.88
        "nt"             0.87
        "n"              0.85
        " strive"        0.84
        " aspire"        0.83
        " reconsider"    0.83
        "ering"          0.82
    Activation Density: 3.187%

    No Known Activations