    Explanations

    Refusals and warnings

    Negative Logits
     tevreden        -0.08   (Dutch, "satisfied")
     некоторое       -0.08   (Russian, "some")
                     -0.08
     modest          -0.07
     ocas            -0.07
     tweaking        -0.07
     some            -0.07
     немного         -0.07   (Russian, "a little")
     qualche         -0.07   (Italian, "some")
    али              -0.07
    Positive Logits
     গুরু            0.10
    严重              0.09   (Chinese, "serious, severe")
    unless           0.09
     unethical       0.09
     తీవ             0.09
     conseils        0.09   (French, "advice")
     قانونی          0.09   ("legal")
     ernstige        0.09   (Dutch, "serious")
     discouraged     0.09
     advisable       0.09
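    The dashboard does not state how the logit lists are produced. A common convention for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and read off the most positive and most negative entries. The sketch below assumes that convention; `logit_effects`, `top_logits`, and the toy weights are hypothetical and not taken from this dashboard.

```python
# Hypothetical sketch: assumes a feature's logit effect on token j is
# dot(decoder_dir, unembed[:, j]), a common SAE dashboard convention.

def logit_effects(decoder_dir, unembed, vocab):
    """Map each vocab token to the dot product of decoder_dir with its unembedding column.

    decoder_dir: list of d_model floats (the feature's decoder direction).
    unembed: d_model rows, each holding one float per vocab token.
    """
    effects = {}
    for j, tok in enumerate(vocab):
        effects[tok] = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
    return effects


def top_logits(effects, k):
    """Return (positive, negative): the k largest and k smallest effects, most extreme first."""
    if k <= 0:
        return [], []
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    vocab = [" unethical", " some", " the"]
    unembed = [[0.10, -0.08, 0.05],
               [0.30, 0.20, 0.10]]
    decoder_dir = [1.0, 0.0]
    pos, neg = top_logits(logit_effects(decoder_dir, unembed, vocab), k=1)
    print(pos, neg)  # prints [(' unethical', 0.1)] [(' some', -0.08)]
```

    Under this convention a positive value means the feature raises that token's logit when it fires, and a negative value means it suppresses it.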
    Activation Density: 0.060%
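    Activation density is most likely the fraction of sampled tokens on which the feature fires; at 0.060%, that is about 6 tokens in every 10,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (both the threshold and the function name `activation_density` are assumptions):

```python
# Hypothetical sketch: assumes density = (# tokens with activation > threshold) / (# tokens).

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # 6 firing tokens out of 10,000 sampled tokens (made-up data).
    acts = [0.0] * 9994 + [1.5] * 6
    print(f"{activation_density(acts):.3%}")  # prints 0.060%
```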

    No Known Activations