INDEX
    Explanations

    mentions of legal or criminal activities

    Negative Logits
    .","
    -0.57
     ."
    -0.56
    +.
    -0.56
    .</
    -0.55
    $.
    -0.53
    +(
    -0.52
    *.
    -0.51
    milo
    -0.51
     ..."
    -0.50
     [(
    -0.50
    Positive Logits
     meanwhile
    0.58
    odore
    0.55
    resa
    0.54
     Canaver
    0.52
     GOODMAN
    0.50
     Chomsky
    0.45
     irony
    0.45
     transcript
    0.45
     HuffPost
    0.44
     Hopkins
    0.44
    Activation Density 1.732%

    No Known Activations