INDEX
Explanations
political and philosophical ideologies and principles
concepts related to values and principles
Negative Logits
soon          -0.87
gren          -0.73
rm            -0.73
̶             -0.71
ptoms         -0.71
fold          -0.70
BE            -0.69
NetMessage    -0.69
è£ıè¦ļéĨĴ     -0.69
Ples          -0.68
Positive Logits
egalitarian      1.36
individuality    1.33
liberty          1.32
equality         1.30
patriotism       1.30
humility         1.30
purity           1.28
volunt           1.27
fairness         1.26
tolerance        1.26
Activations Density 0.288%
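
For readers unfamiliar with the density figure, here is a minimal sketch of one common reading of it: the fraction of sampled token positions on which the feature's activation is above zero. This definition is an assumption, not something the page states, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumed definition: the page does not say how its density is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, feature fires on 3 of them.
acts = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.288% would mean the feature is active on roughly 3 of every 1000 sampled tokens.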