    Explanations

    Slack or ack

    Negative Logits
    ".record"       -0.07
    " Dome"         -0.07
    " birds"        -0.07
    " Philosophy"   -0.07
    " Photos"       -0.06
    "izard"         -0.06
    " diagn"        -0.06
    "あった"         -0.06
    "에서도"         -0.06
    ".special"      -0.06
    Positive Logits
    " Slack"        0.12
    " slack"        0.09
    "slack"         0.08
    " misleading"   0.07
    " Lack"         0.06
    " завер"        0.06
    " Sloan"        0.06
    " shuttle"      0.06
    " зави"         0.06
    "Wake"          0.06
    Act Density 0.001%

    No Known Activations