INDEX
Explanations
thematic elements related to cultural or racial identity
Negative Logits
[            -0.20
—            -0.19
...\         -0.19
             -0.18
...\         -0.17
...");↵↵     -0.17
,...         -0.16
ãĢĮâ̦â̦     -0.16
“â̦         -0.16
...</        -0.15
Positive Logits
(.           0.27
.            0.26
.↵           0.25
".↵          0.25
(.           0.24
".↵          0.22
".           0.22
".           0.21
-.           0.20
().          0.19
Activations Density 0.001%