Explanations
References to Chinese culture or the Chinese community.
Negative Logits
hea      -0.18
uros     -0.18
finger   -0.16
aos      -0.16
Suc      -0.16
gaard    -0.15
iros     -0.14
geber    -0.14
ingen    -0.14
ingham   -0.14
Positive Logits
ook      0.30
atown    0.29
OOK      0.23
ooks     0.23
agraph   0.20
Chin     0.20
strap    0.19
mayan    0.18
Swe      0.18
apon     0.17
Activations Density 0.009%
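As a rough guide to reading the Activations Density figure, the sketch below assumes density means the percentage of tokens on which the feature fires (activation above zero). Under that assumption, 0.009% corresponds to about 9 tokens in every 100,000. The function name `activation_density`, the `threshold` parameter and the synthetic activations are hypothetical illustrations, not the dashboard's own code.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" = share of tokens where the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: the feature fires on 9 tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(9):
    acts[i * 1000] = 1.5

print(f"{activation_density(acts):.3f}%")  # 0.009%
```

A feature this sparse is silent on almost every token, which fits the narrow topic in its explanation.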