INDEX
Explanations
various forms of the word "race" and related terms
Negative Logits
shire       -0.19
aires       -0.16
anguages    -0.16
589         -0.15
-HT         -0.15
ately       -0.15
ency        -0.15
self        -0.15
sh          -0.15
ness        -0.15
Positive Logits
horse       0.19
erp         0.16
TokenType   0.16
/umd        0.16
presso      0.16
LAN         0.15
course      0.14
ovní        0.14
dirty       0.14
ourcem      0.14
Activations Density 0.028%