INDEX
    Explanations

    references to content warnings and spoilers in texts

    Negative Logits
    Token            Logit
    '};*/'           -0.71
    '.";\n'          -0.68
    'LikeLike'       -0.67
    ' дописавши'     -0.67
    '%");'           -0.67
    '=*/'            -0.65
    ' }}$}'          -0.65
    ']));\n'         -0.64
    '()")'           -0.64
    '$")'            -0.63
    Positive Logits
    Token            Logit
    ' spoiler'       1.19
    'Spoiler'        1.13
    ' spoilers'      1.12
    ' Spoilers'      1.09
    ' Spoiler'       1.08
    'SPOILER'        1.05
    ' SPOILERS'      0.99
    ' SPOILER'       0.97
    'spoiler'        0.93
    'ネタバレ'        0.86
    Activation Density: 0.195%

    No Known Activations