    Explanations

    instances of the word "working" followed by different contexts

    Negative Logits
    Token         Logit
    antha         -0.79
     Ukrain       -0.77
     Bubble       -0.72
     Flavoring    -0.69
     constitu     -0.68
     Augustus     -0.67
    Ń·            -0.67
    anamo         -0.67
    ylon          -0.66
    emonic        -0.66
    Positive Logits
    Token         Logit
    bench         1.21
     ethic        1.20
    aday          1.09
    station       1.09
    flows         1.08
    hops          1.07
    horse         1.01
    forces        1.00
     overtime     0.98
    heet          0.97
    Activation Density: 3.156%

    No Known Activations