INDEX
    Negative Logits
    Token           Logit
    "rium"          -0.79
    "tools"         -0.68
    "meric"         -0.68
    "atever"        -0.66
    "ilitarian"     -0.65
    "avorite"       -0.63
    "pering"        -0.62
    "iland"         -0.61
    "abouts"        -0.59
    "cot"           -0.59
    Positive Logits
    Token           Logit
    "Published"     0.75
    " Tue"          0.69
    " POST"         0.69
    " Publication"  0.68
    "Tue"           0.65
    " October"      0.63
    "INGTON"        0.63
    " July"         0.63
    " Thu"          0.63
    " VIDEOS"       0.61
    Activation Density: 0.029%

    No Known Activations