INDEX
Explanations
references to human rights abuses and injustices
Negative Logits
Token     Logit
anzi      -0.09
ampus     -0.08
enen      -0.08
porr      -0.07
acock     -0.07
fos       -0.07
ilece     -0.07
lant      -0.07
chwitz    -0.07
umbn      -0.07
Positive Logits
Token      Logit
often      0.11
sometimes  0.09
often      0.09
souvent    0.08
Often      0.07
usually    0.07
Sometimes  0.07
sometimes  0.07
oft        0.07
Often      0.07
Activations Density 0.006%
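
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number is commonly computed, assuming it means the fraction of sampled tokens on which the feature fires (activation above zero). The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen for illustration; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active, which is the usual convention for sparse features.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 6 of 100,000 tokens.
sample = [0.0] * 99_994 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.006%
```

Under this reading, a density of 0.006% corresponds to roughly 6 active tokens per 100,000 sampled.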