INDEX
    Explanations

    spoiler warnings in texts

    references to spoilers in the text

    Negative Logits
    llan           -0.73
    trak           -0.72
    orney          -0.71
    undai          -0.70
    kes            -0.69
     Architects    -0.68
     Palest        -0.67
    kos            -0.67
    Motion         -0.67
    urat           -0.66
    Positive Logits
     spoiler       0.95
     spoilers      0.94
    OIL            0.92
    OUS            0.84
     spoil         0.82
    ervative       0.75
     spo           0.74
    ific           0.73
     alert         0.73
    pedia          0.72
    Activation Density: 0.077%

    No Known Activations