INDEX
Explanations
terms related to criticism of societal norms and behaviors, particularly focusing on perceived stupidity and hypocrisy
Negative Logits
Token     Logit
elper     -0.16
째         -0.16
_Reset    -0.14
648       -0.14
orus      -0.14
Kind      -0.14
408       -0.14
884       -0.14
ナー      -0.14
856       -0.14
Positive Logits
Token         Logit
ость          0.16
.Path         0.15
GED           0.15
Ders          0.14
nat           0.14
reira         0.14
EGA           0.14
CONTRIBUTORS  0.14
ости          0.14
ul            0.14
Activations Density 0.416%