Explanations
proper nouns
special characters or symbols primarily used in non-Latin scripts

Negative Logits
Token          Logit
ters           -0.88
rette          -0.83
bluff          -0.80
verts          -0.78
sett           -0.78
fet            -0.77
raviolet       -0.76
zees           -0.75
iannopoulos    -0.72
anooga         -0.72
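
A logit list like this ranks vocabulary tokens by how strongly the feature pushes their logits up or down. The page does not say how the values are computed. A common approach, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector, ignoring the final layer norm. The minimal pure-Python sketch below follows that assumption; `logit_effects`, `top_and_bottom`, and the 4-token toy vocabulary are hypothetical.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot product of the feature's decoder direction with each token's
    unembedding vector (unembed[t] has length d_model). Assumed method."""
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]

def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    bottom = [(vocab[t], effects[t]) for t in order[:k]]
    top = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return top, bottom

# Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, 0.0]
unembed = [[0.5, 1.0], [-1.0, 0.0], [2.0, 3.0], [0.0, 0.0]]
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, k=2)
print("positive:", top)     # [('c', 2.0), ('a', 0.5)]
print("negative:", bottom)  # [('b', -1.0), ('d', 0.0)]
```

On a real model the vocabulary has tens of thousands of entries, so an array library would replace the Python loops. The ranking logic stays the same.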

Positive Logits
Token          Logit
ĩ              1.25
ī              1.20
Į              1.13
ĥ              1.10
į              1.06
ĭ              1.04
ا              1.03
Ĥ              1.02
à¤             0.96
IJ             0.95
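
Some of these strings, notably `à¤`, look like GPT-2-style byte-level BPE tokens, where each raw byte is displayed as a printable character. The page does not confirm this. If that is the display convention, the sketch below inverts GPT-2's byte-to-character table. Under it, `à¤` becomes bytes E0 A4, the leading bytes of Devanagari characters U+0900 to U+093F, and `ĩ` becomes the lone UTF-8 continuation byte 0x87. Under the alternative reading, `ĩ`, `Į`, and `ĥ` are ordinary Latin Extended-A letters. A string such as `ا` falls outside the byte alphabet, which suggests it was displayed already decoded. The function names are hypothetical.

```python
from typing import Dict

def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2's reversible table mapping each of the 256 bytes to a printable character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:          # non-printable byte: shift it to 256 + n
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}

def token_to_bytes(token: str) -> bytes:
    """Map a displayed byte-level token back to raw bytes.
    Raises KeyError if a character is outside the byte alphabet."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)

print(token_to_bytes("à¤"))  # b'\xe0\xa4'
print(token_to_bytes("ĩ"))   # b'\x87'
```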

Activation Density: 0.005% (about 1 in every 20,000 tokens)