INDEX
Explanations
proper nouns, specifically names of people
Negative Logits
ویکیپدی    -0.48
دانشنامهٔ    -0.48
restle    -0.44
ighorn    -0.44
Goy    -0.42
tfrac    -0.42
Grot    -0.41
FromArgb    -0.41
Hadd    -0.40
neth    -0.40
Positive Logits
himself    0.64
himself    0.59
himſelf    0.59
但他    0.55
anyahu    0.53
Constitucional    0.53
Obrador    0.52
Jährige    0.47
kardeş    0.46
linawan    0.46
Activations Density 0.089%