    Explanations

    mentions of specific individuals and entities, particularly in entertainment and sports contexts

    Head Attr Weights
    0:0.02
    1:0.02
    2:0.08
    3:0.16
    4:0.39
    5:0.03
    6:0.05
    7:0.04
    8:0.05
    9:0.04
    10:0.05
    11:0.03
    Negative Logits
    Token             Logit
    "iliated"         -1.87
    "berus"           -1.65
    " contrasting"    -1.65
    "ioch"            -1.65
    "orgetown"        -1.62
    "untarily"        -1.62
    "emale"           -1.60
    "ellery"          -1.58
    "ecycle"          -1.57
    "elled"           -1.56
    Positive Logits
    Token             Logit
    " folks"          2.28
    " dudes"          2.24
    " nerds"          2.19
    "oooo"            2.15
    " dear"           2.06
    " Stupid"         2.00
    "fuck"            1.91
    "HAHAHAHA"        1.90
    " sucks"          1.87
    " dude"           1.83
    Act Density: 0.120%

    No Known Activations