INDEX
    Explanations
    Negative Logits
    Token         Logit
    "eers"        -0.87
    "states"      -0.72
    "ered"        -0.71
    " quo"        -0.70
    "eer"         -0.68
    " Hots"       -0.66
    "manship"     -0.65
    "SHIP"        -0.64
    "rers"        -0.63
    " trumpet"    -0.60
    Positive Logits
    Token         Logit
    "udge"        1.22
    "acker"       1.18
    "atton"       1.16
    "anded"       1.14
    "umpy"        1.12
    "acket"       1.12
    "acking"      1.10
    "abbit"       1.10
    "aternity"    1.08
    "agnar"       1.07
    Activation Density: 1.712%

    No Known Activations