INDEX
Explanations
words associated with social critique and moral accountability
Negative Logits
ittel       -0.15
Enumerator  -0.15
veis        -0.14
éľ          -0.14
-Headers    -0.14
è¾°         -0.14
rou         -0.14
orrect      -0.13
ollider     -0.13
persona     -0.13
Positive Logits
Perc    0.15
nackte  0.14
:::     0.14
447     0.14
Pare    0.14
uve     0.14
Kling   0.14
gew     0.14
Rudd    0.13
Middle  0.13
Activations Density 0.333%