INDEX
    Explanations

    the word "for" in various contexts

    Negative Logits
    "orthy"          -0.81
    "rete"           -0.79
    "oS"             -0.79
    "dar"            -0.78
    "zu"             -0.74
    "rets"           -0.71
    "yn"             -0.69
    "irl"            -0.68
    "shi"            -0.67
    "osterone"       -0.66
    Positive Logits
    "etheless"        1.01
    " nonetheless"    0.99
    " reality"        0.86
    " nevertheless"   0.78
    " sheer"          0.77
    " hindsight"      0.73
    " actual"         0.71
    " Garg"           0.70
    " retrospect"     0.67
    " Builder"        0.63
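    A minimal sketch of how per-token logit lists like the two above are commonly produced, assuming the scores come from projecting the feature's decoder direction through the model's unembedding matrix. The page itself does not state its method. The names `decoder_vec`, `W_U`, `logit_effects`, and `top_logits`, and the toy weights, are hypothetical and are not the real model's values.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_vec: Sequence[float],
                  W_U: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Project a (hypothetical) feature decoder direction of length d_model
    through an unembedding matrix W_U of shape d_model x vocab_size,
    giving one score per vocabulary token."""
    d_model = len(decoder_vec)
    return {
        tok: sum(decoder_vec[i] * W_U[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_logits(scores: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]           # lowest scores first
    positive = ranked[::-1][:k]     # highest scores first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; tokens keep their leading spaces.
    vocab = [" nonetheless", " reality", "orthy", "rete"]
    decoder_vec = [1.0, 0.5]
    W_U = [[0.8, 0.6, -0.5, -0.4],
           [0.4, 0.2, -0.6, -0.7]]
    pos, neg = top_logits(logit_effects(decoder_vec, W_U, vocab), k=2)
    print("Positive:", pos)
    print("Negative:", neg)
```

    Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other. For a real vocabulary, a vectorized top-k would be the practical choice.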
    Activation Density: 0.153%
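    A minimal sketch of one common way to compute activation density, assuming it means the percentage of sampled tokens on which the feature's activation is above zero. The function name `activation_density` and its `threshold` parameter are hypothetical. Under this reading, 0.153% corresponds to the feature firing on roughly one token in 650.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`
    (assumed definition; a threshold of 0.0 counts any positive activation)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy data: the feature fires on 153 of 100,000 tokens.
    acts = [0.0] * (100_000 - 153) + [1.0] * 153
    print(f"Activation density: {activation_density(acts):.3f}%")
```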

    No Known Activations