INDEX
Explanations
political and societal topics of discussion
concepts related to societal beliefs and moral perceptions
Negative Logits
adel                              -0.73
çīĪ                               -0.72
ierrez                            -0.63
mentioned                         -0.63
arthed                            -0.61
rawdownloadcloneembedreportprint  -0.59
ourney                            -0.57
sidx                              -0.57
idav                              -0.57
aback                             -0.57
Positive Logits
somehow       0.95
morally       0.93
infall        0.84
inherently    0.83
superior      0.82
immutable     0.81
oppressed     0.79
innate        0.79
evils         0.78
sufficiently  0.78
Activations Density: 0.840%