INDEX
    Explanations

    mentions of brand names, particularly the luxury car brand Mercedes, and also Adidas

    Negative Logits
    resa
    -0.16
    _handling
    -0.15
    egal
    -0.14
    eb
    -0.14
    RESH
    -0.14
    owied
    -0.14
    robat
    -0.13
    очка
    -0.13
    aks
    -0.13
    eg
    -0.13
    Positive Logits
    -Benz
    0.18
     Bieber
    0.16
    inde
    0.15
    hausen
    0.15
    parer
    0.15
    γη
    0.15
    roller
    0.14
    warts
    0.14
    ский
    0.14
    -dir
    0.13
    Act Density 0.001%
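
For orientation, an activation density of 0.001% corresponds to roughly one active token in every 100,000. The sketch below assumes that density is the fraction of token positions where the feature's activation is above zero; the function name, the threshold, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 of 100,000 tokens.
sample = [0.0] * 99_999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.001%
```

A feature this sparse can easily have no recorded examples in a small sample of text, which is consistent with an empty activations list.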

    No Known Activations