INDEX
Explanations
characters from a non-Latin script
Negative Logits
Cly      -0.79
iko      -0.69
abase    -0.69
estern   -0.69
eeper    -0.68
lyak     -0.67
Swim     -0.67
oké      -0.66
Probe    -0.65
yth      -0.65
Positive Logits
ا      1.97
Ù      1.95
Ùĩ     1.92
ÙĨ     1.86
اØ     1.86
د      1.86
Ø      1.84
ÙĪ     1.82
ت      1.81
ر      1.76
Activations Density 0.013%