    Explanations

    references to socioeconomic classes, particularly the middle class

    Negative Logits
     Warhol           -0.72
     françaises       -0.72
     torchvision      -0.71
     Réponses         -0.71
     pitié            -0.71
    enoord            -0.71
    :✨                -0.70
    NamedQueries      -0.70
    uresti            -0.70
     Renault          -0.68
    Positive Logits
     Middle           1.66
     MIDDLE           1.56
    Middle            1.55
     Middel           1.49
     middle           1.41
    MIDDLE            1.38
    middle            1.38
     Middles          1.29
    middlewares       1.10
     Middleware       1.08
    Act Density 0.061%

    No Known Activations