Explanations
mentions of protest songs and their cultural significance
Negative Logits
Ars         -0.17
Chore       -0.17
.toolbox    -0.16
dance       -0.15
Robotics    -0.15
Ukraj       -0.15
رق          -0.14
瓜          -0.14
chore       -0.14
舞          -0.14
Positive Logits
Dylan       0.61
Bob         0.48
Bob         0.42
bob         0.38
ylan        0.37
bob         0.33
DY          0.32
DY          0.30
Dy          0.27
Blonde      0.26
Activations Density 0.014%