    Explanations

    the word "sat" with varying activation values

    instances of the word "sat."

    Negative Logits
        "ALLY"          -0.75
        "escal"         -0.74
        "credit"        -0.73
        "enforcement"   -0.70
        "ISO"           -0.69
        "achev"         -0.68
        "obs"           -0.65
        "negative"      -0.64
        "raid"          -0.64
        "ever"          -0.64

    Positive Logits
        " seiz"          0.86
        "ivas"           0.86
        "anic"           0.82
        "chers"          0.78
        " nav"           0.74
        "lie"            0.73
        "anism"          0.73
        "chel"           0.73
        " toget"         0.71
        "urn"            0.71
    Activation density: 0.009%

    No Known Activations