Explanations
references to languages and cultural identities
Negative Logits
anford    -0.17
utton     -0.17
Cunning   -0.16
458       -0.16
ạm        -0.15
ully      -0.15
ignum     -0.15
unate     -0.15
obre      -0.14
cona      -0.14
Positive Logits
dialect                 0.28
Standard                0.25
Standard                0.24
spoken                  0.20
标准 ("standard")        0.19
standard                0.19
-standard               0.19
STANDARD                0.19
spoken                  0.19
標準 ("standard")        0.18
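Dashboards of this kind commonly produce the two lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive tokens. That convention is an assumption here, not something this page states. The minimal sketch below uses a toy vocabulary; `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical names and values for illustration only.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a (hypothetical) feature decoder direction with each
    token's unembedding vector: the feature's direct effect on that logit."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative k, most positive k) tokens by logit effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return negative, positive

if __name__ == "__main__":
    # Toy 2-d residual stream and a three-token vocabulary (made-up numbers).
    unembed = {"dialect": [1.0, 0.0], "anford": [-1.0, 0.0], "the": [0.0, 1.0]}
    neg, pos = top_and_bottom(logit_effects([0.3, 0.0], unembed), k=1)
    print("negative:", neg)
    print("positive:", pos)
```

A real dashboard would do the same projection over the full vocabulary (tens of thousands of tokens) with a matrix library, which is also why near-identical surface forms such as `Standard` can appear twice: byte-level tokenizers keep separate tokens for variants that differ only in leading whitespace.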
Activation density: 0.037%
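Activation density is usually the fraction of token positions on which the feature fires. Assuming "fires" means an activation strictly above zero (the `threshold` parameter below is a hypothetical knob, not something this page specifies), 0.037% corresponds to roughly 37 active tokens per 100,000. A minimal sketch of that calculation:

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes one activation value per token position; the zero threshold
    is an assumption, not a documented dashboard setting."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

if __name__ == "__main__":
    # 37 active positions out of 100,000 tokens (synthetic data).
    acts = [0.0] * 99_963 + [1.0] * 37
    print(f"{activation_density(acts):.3%}")
```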