INDEX
    Explanations

    references to spoilers in various contexts

    Negative Logits
    Token       Logit
    "agi"       -0.16
    "ument"     -0.16
    "mae"       -0.15
    "inine"     -0.15
    "mente"     -0.14
    "elic"      -0.14
    "patial"    -0.14
    " Koch"     -0.14
    "/INFO"     -0.14
    "ELY"       -0.14
    Positive Logits
    Token       Logit
    " spo"      0.30
    " Spo"      0.26
    "Spo"       0.25
    "spo"       0.22
    "ilers"     0.21
    " spoil"    0.21
    "ilt"       0.21
    "orth"      0.19
    " spoiler"  0.18
    "elman"     0.18
    Act Density 0.007%

    No Known Activations