INDEX
    Explanations

    evaluative and promotional language about products or experiences

    Negative Logits
     <<<<<<<<<<<<<<     -0.61
     fucking            -0.57
    fucking             -0.54
    UnusedPrivate       -0.52
     FUCKING            -0.52
    fuck                -0.51
     raped              -0.50
    Fucking             -0.49
     retarded           -0.49
     stupid             -0.48
    Positive Logits
     summertime         0.57
     festive            0.54
     sizzling           0.52
    letoe               0.51
     holiday            0.50
     frosty             0.50
     ruff               0.49
    BibitemShut         0.49
     roars              0.48
     garantiert         0.47
    Activation Density: 0.459%

    No Known Activations