Explanations
references to social class and political critique
NEGATIVE LOGITS
.fx        -0.16
Truy       -0.14
cigaret    -0.14
Crazy      -0.14
つぶ       -0.14
ichtig     -0.14
meli       -0.14
Ghost      -0.13
Ghost      -0.13
ppo        -0.13
POSITIVE LOGITS
uten        0.16
rotten      0.15
bour        0.15
ap          0.14
arkin       0.14
subjective  0.13
bourgeois   0.13
ё           0.13
etc         0.13
ancy        0.13
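The lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). One common way to produce such lists, assumed here rather than confirmed for this dashboard, is to project the feature's output (decoder) direction through the model's unembedding matrix and keep the k most negative and k most positive entries. The sketch below is plain Python; `logit_effects`, `top_and_bottom`, and the toy matrices are hypothetical, and a real pipeline may apply extra normalization before ranking.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
) -> List[float]:
    """Dot the feature's decoder direction (length d_model) with every
    vocabulary column of an unembedding matrix shaped d_model x vocab_size.
    Assumption: this is how the dashboard's logit values are derived."""
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per model dimension")
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending: most negative first
    positive = ranked[::-1][:k]  # descending: most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (hypothetical values).
    effects = logit_effects([1.0, -0.5], [[0.5, -0.25, 0.0], [0.25, 0.5, -0.5]])
    neg, pos = top_and_bottom(effects, ["bourgeois", "Ghost", "etc"], k=1)
    print(neg, pos)
```

Computing the effect for every token and sorting once keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) avoids a full sort.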
Activation density: 0.011%
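The density figure reports how often the feature fires. Assuming it is the percentage of sampled tokens on which the feature's activation exceeds zero (an assumption, not stated on the page), 0.011% corresponds to roughly 11 active tokens per 100,000. A minimal sketch of that calculation follows; `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.
    Assumption: the dashboard counts a token as active when activation > 0."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 11 active tokens out of 100,000 sampled tokens (hypothetical sample).
    sample = [0.0] * 99_989 + [1.0] * 11
    print(f"{activation_density(sample):.3f}%")
```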