INDEX
    Explanations

    words related to advocating for or supporting a particular issue or belief

    references to the concept of "cause."

    Negative Logits
    PDATE      -0.79
    Ku         -0.72
     Seym      -0.68
    illet      -0.67
    egu        -0.66
     Seasons   -0.65
    aeper      -0.65
    awatts     -0.64
    raph       -0.63
     Leopard   -0.63
    Positive Logits
     cele      1.37
    cause      0.86
    way        0.79
    ality      0.79
     celeb     0.76
    vier       0.72
    facts      0.71
    fare       0.70
    forge      0.70
    wagon      0.69
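    The two logit lists can be read as the feature's direct effect on the output vocabulary. The sketch below shows one common way such lists are produced. It assumes the feature has a decoder vector that is projected through the model's unembedding matrix, and that the resulting per-token scores are then sorted. The function name `top_logit_effects`, its arguments, and the toy numbers are hypothetical and do not come from this dashboard.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return (most negative, most positive) (token, logit) pairs.

    Assumptions (hypothetical inputs, not from the dashboard):
      decoder_dir: the feature's decoder vector, length d_model
      unembed:     d_model rows x vocab_size columns (unembedding matrix)
      vocab:       list of token strings, length vocab_size
    """
    d_model = len(decoder_dir)
    # Direct logit effect of token t = dot(decoder_dir, column t of unembed)
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect: negative end first, positive end last
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example: 2-dimensional model, 3-token vocabulary
decoder_dir = [1.0, 0.0]
unembed = [[0.5, -0.2, 1.0],
           [9.0, 9.0, 9.0]]
neg, pos = top_logit_effects(decoder_dir, unembed, ["a", "b", "c"], k=1)
print(neg, pos)
```

    Token strings keep any leading space (as in " cele" or " Leopard"), because that space is part of the token itself.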
    Activation Density 0.028%
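    Activation density is reported as a percentage of tokens. The sketch below assumes density means the share of sampled tokens on which the feature's activation is strictly positive. The dashboard does not state its exact threshold or sample size, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 28 active tokens out of 100,000
sample = [0.0] * 99_972 + [1.0] * 28
print(activation_density(sample))
```

    Under this reading, a density of 0.028% means the feature fires on roughly 28 of every 100,000 tokens.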

    No Known Activations