INDEX
    Explanations

    refusal to generate sexually explicit content

    Negative Logits
     etiam            0.87
     Also             0.86
     also             0.85
    히려              0.80
     també            0.78
     szint            0.77
     आल्सो            0.77
    సరం              0.77
     cũng             0.77
     juga             0.76

    Positive Logits
     அந்தக்           0.76
     […]              0.73
     quei             0.72
     détermination    0.69
     determinato      0.67
    ALLE              0.66
    DESIGN            0.66
     determination    0.66
     अत्यंत           0.66
    <unused2221>      0.66

    Activation Density: 0.483%

    No Known Activations