Explanations
critique of progressive identity politics
Negative Logits
hemi 0.47
greedy 0.40
കിട 0.40
Konrad 0.39
驛 0.39
रॉयल 0.39
geom 0.39
铤 0.39
粨 0.38
typhoid 0.37
Positive Logits
woke 1.02
woken 0.95
activism 0.88
activist 0.86
progressive 0.85
activists 0.80
PC 0.72
identity 0.71
progressive 0.71
SJ 0.71
Activation density: 0.031%
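Activation density is usually reported as the percentage of token positions on which a feature fires. The sketch below is illustrative only. It assumes "fires" means an activation strictly above zero. The function name `activation_density`, the threshold, and the sample data are all hypothetical and are not taken from this page. A density of 0.031% corresponds to roughly 31 firing positions per 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly above the threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: 31 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(31):
    acts[i * 3000] = 1.5  # indices 0, 3000, ..., 90000 are all distinct

print(f"{activation_density(acts):.3f}%")  # 0.031%
```

Raising `threshold` above zero would report a stricter density that ignores weak activations.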