INDEX
    Explanations

    words related to updates, news, and progress

    Negative Logits
    Token        Logit
    "onz"        -0.66
    "bilt"       -0.63
    "sted"       -0.62
    "oso"        -0.61
    "hedon"      -0.60
    " chains"    -0.60
    "iard"       -0.59
    "sha"        -0.58
    "toe"        -0.58
    " corps"     -0.58
    Positive Logits
    Token            Logit
    " happening"     1.28
    " transpired"    1.04
    " happ"          0.99
    " happened"      0.95
    " occurring"     0.84
    " happen"        0.81
    " happens"       0.79
    " done"          0.78
    " Happ"          0.76
    " bothering"     0.74
    Activation Density: 0.586%

    No Known Activations