INDEX
Explanations
refusal to generate sexually explicit content
Negative Logits
etiam    0.87
Also     0.86
also     0.85
히려     0.80
també    0.78
szint    0.77
आल्सो    0.77
సరం     0.77
cũng     0.77
juga     0.76
Positive Logits
அந்தக்    0.76
[…]    0.73
quei    0.72
détermination    0.69
determinato    0.67
ALLE    0.66
DESIGN    0.66
determination    0.66
अत्यंत    0.66
<unused2221>    0.66
Activations Density 0.483%