INDEX
    Explanations

    phrases that express universality or make absolute statements

    NEGATIVE LOGITS
    Token        Logit
    "er"         -0.65
    "mino"       -0.57
    "[]);"       -0.54
    "ylde"       -0.54
    "tanque"     -0.53
    "βο"         -0.53
    "LAR"        -0.53
    "stalker"    -0.52
    "feri"       -0.52
    "’)."        -0.50
    POSITIVE LOGITS
    Token            Logit
    " else"          1.28
    " Everything"    1.27
    "Everything"     1.27
    " everything"    1.25
    "everything"     1.24
    " EVERYTHING"    1.06
    "THING"          1.01
    " Tudo"          0.99
    " Tutto"         0.91
    "Tudo"           0.89
    Activation Density 0.045%

    No Known Activations