INDEX
    Explanations

    spoilers in a text

    phrases that indicate the presence of spoilers in a text

    Negative Logits
    kt        -0.73
    ndra      -0.73
    urat      -0.69
    rique     -0.69
    leanor    -0.68
    trak      -0.67
    ŃĶ        -0.67
    riad      -0.67
    ton       -0.66
    ¬¼        -0.66
    Positive Logits
     spoilers   1.17
     spoiler    1.16
    oiler       1.01
     Spoiler    0.95
    OIL         0.94
     spo        0.93
     spoil      0.92
    ervative    0.83
     Collider   0.75
    ":""},{"    0.75
    Activation Density: 0.031%

    No Known Activations