INDEX
Explanations
content that critiques societal issues and injustices
Negative Logits
nors       -0.15
irit       -0.13
Beled      -0.13
Uint       -0.13
kariy      -0.12
aktivit    -0.12
Aura       -0.12
icorn      -0.11
nebu       -0.11
Famil      -0.11
Positive Logits
history       0.26
America       0.25
capitalism    0.24
Hitler        0.24
fascism       0.24
Adolf         0.22
communism     0.22
Roosevelt     0.22
Eisenhower    0.21
Stalin        0.21
Activations Density 1.368%