INDEX
    Explanations

    phrases related to collective identity or shared experiences

    Negative Logits
     Kush            -0.67
     gratification   -0.65
     firsthand       -0.64
     Cliff           -0.63
     conflicts       -0.62
     citation        -0.59
     contradictions  -0.58
     Authority       -0.57
     stressing       -0.57
     sectarian       -0.56
    Positive Logits
    athered          1.33
    bsite            1.28
    lder             1.25
    arers            1.20
    aving            1.19
    asel             1.19
    eding            1.19
    avers            1.17
    eps              1.16
    eping            1.15
    Act Density 0.080%

    No Known Activations