Explanations
themes related to justice and societal issues
Negative Logits
Token        Logit
çīĪ          -0.74
confir       -0.61
»Ĵ           -0.60
adel         -0.58
liga         -0.55
idav         -0.55
ij士          -0.55
iple         -0.54
mentioned    -0.54
ilde         -0.54
Positive Logits
Token              Logit
unfairly           0.82
rapists            0.74
biologically       0.71
environmentally    0.68
"â̦                0.67
pedoph             0.65
terrorists         0.65
"                  0.64
"'                 0.64
psychologically    0.64
Activations Density 0.680%