INDEX
Explanations
phrases related to controversies and legal matters
phrases related to political scandals and their implications
Negative Logits
.''.      -0.66
>.        -0.58
anwhile   -0.57
'.        -0.56
`.        -0.56
.�        -0.54
.}        -0.51
"!        -0.51
$.        -0.50
.''       -0.48
Positive Logits
,[        1.03
?,        1.02
(),       0.91
!,        0.89
,         0.86
*,        0.85
,         0.85
+,        0.83
®,        0.82
%,        0.79
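Dashboards of this kind typically build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The sketch below is a minimal plain-Python illustration that assumes this procedure. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical, and the dashboard may scale its values differently.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score every vocabulary token as direction @ unembed.

    direction: hypothetical decoder vector for one feature (length d_model).
    unembed:   d_model rows, each of length len(vocab) (assumed W_U layout).
    """
    scores = [0.0] * len(vocab)
    for d_i, row in zip(direction, unembed):
        for j, w in enumerate(row):
            scores[j] += d_i * w
    return list(zip(vocab, scores))


def top_and_bottom(effects: List[Tuple[str, float]], k: int):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", ",", ".", "!"]
direction = [1.0, -0.5]
unembed = [
    [0.2, 1.0, -0.4, 0.6],
    [0.4, -0.2, 0.4, 0.0],
]
positive, negative = top_and_bottom(logit_effects(direction, unembed, vocab), k=2)
print(positive)
print(negative)
```

In the toy data, "," gets the largest score (1.0 + 0.1) and "." the most negative (-0.4 - 0.2). The same ranking step, run over a full vocabulary, would produce lists shaped like the ones above.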
Activation Density 1.652%
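Activation density is usually reported as the percentage of sampled token positions on which the feature fires, meaning its activation is above zero. The sketch below assumes that definition. `activation_density` and its `threshold` parameter are hypothetical names.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# A feature that fires on 1,652 of 100,000 sampled tokens.
acts = [0.0] * (100_000 - 1_652) + [0.3] * 1_652
print(f"{activation_density(acts):.3f}%")
```

Under this definition, a density of 1.652% means the feature is active on roughly one token in sixty.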