    Explanations

    mentions of prominent individuals or celebrities, particularly in contexts related to social issues or challenges they face

    Negative Logits
    upert
    -0.07
    ernet
    -0.07
    achten
    -0.07
    irk
    -0.06
    uning
    -0.06
    Wake
    -0.06
    á»Ń
    -0.06
    vid
    -0.06
    aucoup
    -0.06
     Hav
    -0.06
    Positive Logits
    -ves
    0.07
    elter
    0.06
    gerald
    0.06
    LK
    0.06
    icrous
    0.06
    ë¡ł
    0.06
     çģ
    0.06
    arian
    0.06
     Feinstein
    0.06
    ycle
    0.05
    Activation Density: 0.039%

    No Known Activations