    Explanations

    Phrases related to political and social commentary, including terms such as "cultural critique," "religious and racial categories," and "immigration sentiments."

    NEGATIVE LOGITS
     bordeaux     -1.46
     Juf          -1.43
     fluo         -1.41
     franz        -1.35
     casio        -1.34
     lyon         -1.32
     dises        -1.30
     Châ          -1.30
     canel        -1.29
     levis        -1.29
    POSITIVE LOGITS
     doesn        0.85
     happens      0.80
     seems        0.80
     helps        0.79
    It            0.79
     enables      0.79
     is           0.79
     allows       0.79
     does         0.78
     represents   0.78
    Activation density: 0.232%

    No Known Activations