Explanations
AI assistant explicit content refusal
Negative Logits
“[    0.52
/[    0.51
[    0.49
:][    0.48
]=[    0.48
[][]    0.47
,[    0.47
:]    0.46
][    0.46
:[    0.46
Positive Logits
"(    0.56
"(    0.55
“(    0.53
。(    0.49
...(    0.49
“(    0.49
ıkl    0.47
).(    0.47
'(    0.46
)(    0.44
Activations Density 0.006%