INDEX
Explanations
references to the term "Black" in various contexts
Negative Logits
Token    Logit
uls      -0.16
å®Ĺ      -0.16
obus     -0.15
oot      -0.14
elta     -0.14
336      -0.14
ussy     -0.14
oh       -0.14
lue      -0.14
icles    -0.14
Positive Logits
Token    Logit
bl       0.27
Bl       0.27
/bl      0.25
.Bl      0.22
.bl      0.20
anche    0.19
-bl      0.19
Bl       0.19
anks     0.18
BL       0.18
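The negative and positive logit lists above read as the tokens whose output logits the feature lowers or raises the most. A minimal sketch of how such lists could be ranked, assuming the per-token logit effects are already available as (token, value) pairs; the function name `top_logit_effects` and the sample pairs (a few values copied from the lists above) are illustrative, not the dashboard's own code:

```python
import heapq

def top_logit_effects(effects, k=10):
    """Return (most_negative, most_positive) lists of (token, value) pairs.

    `effects` is a list of (token, value) tuples rather than a dict,
    because the same token string can appear more than once.
    """
    most_negative = heapq.nsmallest(k, effects, key=lambda pair: pair[1])
    most_positive = heapq.nlargest(k, effects, key=lambda pair: pair[1])
    return most_negative, most_positive

# Hypothetical input: a handful of the values listed above.
sample = [
    ("bl", 0.27), ("/bl", 0.25), ("anche", 0.19),
    ("uls", -0.16), ("obus", -0.15), ("oot", -0.14),
]

negative, positive = top_logit_effects(sample, k=2)
print("negative:", negative)
print("positive:", positive)
```

How the per-token values themselves are computed is not stated in the source, so the sketch starts from values that are already given.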
Activations Density: 0.019%
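An activation density this small is usually read as the fraction of tokens on which the feature fires at all. A minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample counts are hypothetical and chosen only so the result matches the figure above:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (here: above-threshold) activation, not an average magnitude.
    """
    if not activations:
        return 0.0
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)

# Hypothetical data: 19 active tokens out of 100,000.
activations = [1.5] * 19 + [0.0] * (100_000 - 19)
density = activation_density(activations)
print(f"{100 * density:.3f}%")
```

On this reading, a density of 0.019% corresponds to roughly 19 active tokens per 100,000.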