INDEX
Explanations
names of political figures or entities
references to prominent political figures and their statements
Negative Logits
!.          -0.74
.�          -0.70
}.          -0.69
.''         -0.67
.$          -0.63
.」         -0.62
''.         -0.61
.--         -0.60
sqor        -0.58
utterstock  -0.58
Positive Logits
hadn           0.89
should         0.85
had            0.83
shouldn        0.78
lacked         0.77
discriminated  0.74
lacks          0.74
could          0.72
violated       0.71
behaved        0.69
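Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that definition; this page does not say how its values were computed (for example, whether any layer-norm scaling is folded in), and the names `w_dec`, `W_U`, and `logit_effects` are hypothetical.

```python
import numpy as np

def logit_effects(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs
    for the direct logit effect of one feature's decoder direction.

    w_dec: shape (d_model,), the feature's decoder vector (hypothetical name)
    W_U:   shape (d_model, vocab_size), the unembedding matrix (hypothetical name)
    vocab: list of token strings, length vocab_size
    """
    effects = w_dec @ W_U              # one logit contribution per vocabulary token
    order = np.argsort(effects)        # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with made-up numbers (not the values on this page).
w_dec = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.9, -0.7],
                [0.1, 0.3, -0.4, 0.2]])
vocab = ["had", "!.", "should", "sqor"]
neg, pos = logit_effects(w_dec, W_U, vocab, k=2)
print("negative:", neg)  # "sqor" ranks most negative in this toy setup
print("positive:", pos)  # "should" ranks most positive in this toy setup
```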
Activations Density 0.893%
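Activation density on dashboards of this kind is usually the fraction of sampled tokens on which the feature's activation is above zero. A minimal sketch under that assumption; the token sample and threshold behind the 0.893% figure are not given here, and `activation_density` and `format_density` are hypothetical helpers.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

def format_density(density):
    """Render a fraction as a percentage with three decimals, matching the style above."""
    return f"{100 * density:.3f}%"

# Toy usage: the feature fires on 2 of 8 tokens.
acts = [0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
print(format_density(activation_density(acts)))
```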