INDEX
    Explanations

    references to the middle class and associated concepts

    Negative Logits
    ":✨"             -0.80
    " françaises"    -0.77
    " Warhol"        -0.73
    "enoord"         -0.71
    " torchvision"   -0.70
    "bewah"          -0.69
    "HNO"            -0.69
    " Harrell"       -0.67
    " Nantucket"     -0.65
    " tph"           -0.65
    Positive Logits
    " Middel"        1.45
    " Middle"        1.43
    " MIDDLE"        1.37
    "Middle"         1.33
    " middle"        1.29
    " Middles"       1.20
    " MID"           1.20
    "MIDDLE"         1.20
    "middle"         1.19
    " Mid"           1.10
    Act Density 0.080%

    No Known Activations