INDEX
Explanations
abstract opinions and qualities
Negative Logits
↵
1.51
↵↵
0.98
<start_of_image>
0.94
↵↵↵
0.89
。”
0.82
!">
0.81
.”)
0.80
!”
0.77
?”
0.76
.:
0.76
Positive Logits
[];
1.20
;,
1.19
;,
1.15
$;
1.11
`;
1.09
>;</
1.09
;",
1.08
{};
1.07
}$;
1.06
*;
1.05
Activations Density 0.169%