INDEX
    Explanations

    references to spoilers in discussions about TV shows or movies

    NEGATIVE LOGITS
    ³           -0.14
    apult       -0.14
    ffa         -0.14
    .espresso   -0.13
    egas        -0.13
    shoot       -0.13
    enberg      -0.13
    çĭ          -0.13
    rians       -0.13
     flakes     -0.12
    POSITIVE LOGITS
     Spo        0.67
     spoil      0.63
     spoiler    0.60
    spo         0.60
     spoilers   0.60
    Spo         0.59
     spo        0.59
     spoiled    0.52
    Spoiler     0.45
     spol       0.37
    Act Density 0.048%
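    Activation density is read here as the percentage of token positions on which the feature fires. On that reading, 0.048% works out to roughly one token in every 2,100. The sketch below assumes that definition (activation strictly above zero). The function name activation_density and the toy data are hypothetical and are not taken from the dashboard's own code.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: density = fraction of tokens with activation > 0, as a percent.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical toy data: 10,000 token positions, the feature fires on 5 of them.
acts = [0.0] * 10_000
for i in (12, 345, 6789, 8000, 9999):
    acts[i] = 1.5

print(f"{activation_density(acts):.3f}%")  # prints "0.050%"
```

    A feature this sparse rarely fires, so a handful of sampled texts can easily contain no activations at all.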

    No Known Activations