INDEX
Explanations
words related to racial or ethnic identity
references to the word "ir."
Negative Logits
ĸļ          -0.76
İĭ          -0.72
CLSID       -0.72
Vinyl       -0.69
Avalon      -0.68
erker       -0.66
Canary      -0.66
plain       -0.64
YC          -0.63
Patreon     -0.63
Positive Logits
rha         1.06
vana        1.00
andom       0.90
cles        0.88
ROR         0.83
onda        0.82
ilateral    0.82
ror         0.82
ashi        0.80
respective  0.80
Activations Density 0.013%