INDEX
Explanations
words related to glacial and icy environments
terms related to ethnicity and racial categories
Negative Logits
href        -0.69
debian      -0.68
uden        -0.65
UG          -0.65
zig         -0.65
rb          -0.64
Recomm      -0.63
rug         -0.63
Lexington   -0.61
Later       -0.61
Positive Logits
acial          1.26
anguage        0.87
vertisements   0.79
gestures       0.78
gemony         0.76
gments         0.75
feats          0.75
citiz          0.75
slurs          0.73
opian          0.73
Activations Density: 0.009%