INDEX
    Explanations

    phrases expressing requests or demands

    Negative Logits
    Token              Logit
    "Lma"              -0.92
    "FTFY"             -0.91
    " invin"           -0.90
    " guarante"        -0.90
    " encomp"          -0.89
    "YMMV"             -0.87
    " alre"            -0.87
    " Lmao"            -0.86
    " scrat"           -0.86
    " affor"           -0.85

    Positive Logits
    Token              Logit
    " that"            0.60
    "that"             0.56
    " dass"            0.55
    "Aholisi"          0.52
    "UnusedPrivate"    0.52
    " THAT"            0.51
    " Roskov"          0.51
    " ویکی‌پدی"        0.50
    " rằng"            0.50
    " everyone"        0.49

    Activation Density: 0.186%

    No Known Activations