Explanations
phrases addressing racial and ethnic minority groups
Negative Logits
RectangleBorder    -1.05
الحره              -1.02
BibitemShut        -1.00
AssemblyTitle      -0.97
InjectAttribute    -0.97
Hauptartikel       -0.96
AnchorStyles       -0.95
Италијани          -0.94
MemoryWarning      -0.93
ConstraintMaker    -0.93
Positive Logits
<eos>    0.70
↵↵       0.60
and      0.59
h        0.55
e        0.54
E        0.52
‘        0.52
I        0.51
p        0.51
P        0.51
Activations Density 0.339%