INDEX
    Explanations

    words related to functionality or operation

    instances of the word "work" in various contexts

    Negative Logits
    "ilings"        -0.77
    "rition"        -0.74
    "sbm"           -0.72
    " Flavoring"    -0.69
    "ildo"          -0.68
    "gart"          -0.67
    "anamo"         -0.64
    "aez"           -0.64
    " Gamble"       -0.63
    "ewitness"      -0.62
    Positive Logits
    "flows"         1.11
    " flaw"         1.06
    "heet"          1.02
    " correctly"    1.01
    " seamlessly"   1.00
    " reliably"     1.00
    " properly"     0.97
    "paces"         0.94
    " smoothly"     0.93
    " offline"      0.93
    Activation Density 0.074%

    No Known Activations