INDEX
Explanations
words related to morality
discussions related to concepts of morality and ethics
Negative Logits
eding     -0.85
ept       -0.82
location  -0.81
mining    -0.77
upon      -0.74
Roses     -0.72
aways     -0.71
eworld    -0.70
WER       -0.70
berry     -0.70
Positive Logits
ocracy
0.85
¿½
0.77
contag
0.75
ocratic
0.71
Petr
0.70
ocrats
0.69
acus
0.67
compass
0.65
prev
0.64
righteousness
0.63
Activations Density 0.020%