INDEX
Explanations
elements related to political discourse and issues
Negative Logits
.",     -0.94
/−      -0.92
.";     -0.87
‟       -0.87
."),    -0.84
"),     -0.82
)•      -0.82
)−      -0.81
?—      -0.79
(−      -0.78
Positive Logits
!!      1.52
''      1.27
**      1.24
!!!     1.19
[[      1.19
??      1.18
''      1.17
***     1.17
!!      1.14
'''     1.12
Activations Density 0.822%