INDEX
    Explanations

    phrases emphasizing collective actions or perspectives over individual ones

    Negative Logits
    Token           Logit
    " Irwin"        -0.79
    "imura"         -0.71
    "itor"          -0.71
    "hillary"       -0.69
    " Saving"       -0.69
    " Karin"        -0.67
    " Fitzgerald"   -0.67
    "irlf"          -0.64
    " GOODMAN"      -0.64
    " Irvine"       -0.63
    Positive Logits
    Token           Logit
    " discrete"     1.16
    " gradual"      1.00
    " staggered"    0.99
    " epis"         0.98
    " abstract"     0.93
    " continuous"   0.93
    " fragmented"   0.93
    " linear"       0.93
    " binary"       0.92
    " sequential"   0.91
    Activation Density: 1.076%
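
    The density figure above is not defined on this page. A minimal sketch of one common reading, assuming it means the fraction of token positions on which the feature's activation is above zero. The function name, the threshold parameter, and the sample values are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    This page does not state its exact definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical activations for 8 token positions (not real data from this feature).
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.4]
print(f"{activation_density(sample):.3%}")
```

    Under that reading, a density of 1.076% would mean the feature fires on roughly one token in 93.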

    No Known Activations