INDEX
    Explanations

    AI limitations and refusals

    Negative Logits
    "乐观"  0.79
    " nice"  0.79
    "ranking"  0.75
    " зробити"  0.75
    " guesses"  0.74
    " മികച്ച"  0.74
    " मजा"  0.73
    " अच्छा"  0.71
    " easy"  0.70
    "чать"  0.70
    POSITIVE LOGITS
    "Again"  1.01
    " পরিবর্তিত"  0.96
    "如果您"  0.96
    " novamente"  0.94
    " Again"  0.92
    " refusal"  0.89
    "Notwithstanding"  0.89
    " erneut"  0.89
    " reaff"  0.88
    "Despite"  0.88
    Act Density 0.294%

    No Known Activations