Explanations
names with suffixes
names of people
Negative Logits
ي  0.88
as  0.73
و  0.71
o  0.67
an  0.65
us  0.65
i  0.64
ח  0.63
ни  0.63
ح  0.63
Positive Logits
ı  0.51
ഇത്  0.49
लाइक  0.49
🛰  0.48
保湿  0.47
哽  0.47
🈺  0.47
wirelessly  0.46
decentral  0.46
മാത്രമേ  0.46
Activation Density: 0.059%