INDEX
Explanations
mentions of racism and controversy surrounding public figures
Negative Logits
Token        Logit
iek          -0.14
increment    -0.14
AGMENT       -0.14
958          -0.14
increments   -0.14
ä»ĭ          -0.14
enberg       -0.13
croft        -0.13
adesh        -0.13
veillance    -0.13
Positive Logits
Token        Logit
insensitive  0.38
Insensitive  0.29
offensive    0.28
Offensive    0.25
remarks      0.24
ensitive     0.23
comments     0.23
racially     0.22
sensitive    0.22
remarks      0.21
Activations Density: 0.060%