INDEX
Explanations
references to nationality or citizenship, particularly American and English identities
Negative Logits
inspace     -0.16
zburg       -0.16
bate        -0.15
oyer        -0.14
vinc        -0.14
WithType    -0.14
Ade         -0.14
cord        -0.14
atk         -0.14
cdecl       -0.14
Positive Logits
avern       0.18
alth        0.15
ADDE        0.15
ukan        0.14
bench       0.14
κÏĮ         0.13
ran         0.13
ká          0.13
aven        0.13
appen       0.13
Activations Density 0.015%