INDEX
    Explanations

    references to the word "ups" with varying activations

    references to group activities or gatherings

    Negative Logits
    Token               Logit
    " Boxing"           -0.69
    " Caribbean"        -0.69
    " Reconstruction"   -0.69
    "SPONSORED"         -0.66
    " SAM"              -0.65
    " FORM"             -0.65
    " Ruler"            -0.64
    " Revolutionary"    -0.64
    " Bah"              -0.62
    " bis"              -0.62
    Positive Logits
    Token               Logit
    "dates"             1.16
    "oons"              1.12
    "etting"            1.11
    "etts"              1.10
    "ups"               1.07
    "etter"             1.06
    "poons"             1.02
    "olicy"             0.99
    "uits"              0.97
    "icult"             0.95
    Activation Density: 0.007%

    No Known Activations