INDEX
Explanations
words related to specific ethnic or cultural identities
Negative Logits
sar      -0.20
ted      -0.19
ses      -0.19
go       -0.18
shan     -0.17
tt       -0.17
ger      -0.16
gable    -0.16
gie      -0.16
scape    -0.16
Positive Logits
apolis     0.31
ism        0.28
alysis     0.28
isme       0.24
thus       0.23
ische      0.22
-American  0.21
omics      0.21
stvo       0.20
ismo       0.20
Activations Density: 0.093%