INDEX
    Explanations

    expressing negative quality

    Negative Logits
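    (token text missing)    0.79
    (token text missing)    0.78
    е    0.77
    (token text missing)    0.73
    та    0.73
    (token text missing)    0.71
    مر    0.69
    (token text missing)    0.69
    (token text missing)    0.67
    秋冬    0.67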
    Positive Logits
    </h2>    0.79
    ",    0.78
     ciri    0.77
     are    0.70
    ש    0.68
    지만    0.68
    (token text missing)    0.67
    <0x0D>    0.66
    (token text missing)    0.66
     vaso    0.64
    Act Density 0.001%

    No Known Activations