INDEX
Explanations
phrases related to moral values and ethical principles
Negative Logits
backer     -0.76
stal       -0.70
anie       -0.70
ETA        -0.68
biz        -0.66
Guard      -0.65
bg         -0.63
inar       -0.61
aukee      -0.61
gallery    -0.60
Positive Logits
there            0.79
homosexuality    0.76
"[               0.75
although         0.73
"...             0.70
preserving       0.69
'[               0.69
legalizing       0.68
"…               0.67
they             0.66
Activations Density 0.181%