INDEX
NEGATIVE LOGITS
ri      0.75
n       0.70
ling    0.68
p       0.65
k       0.63
lim     0.61
ro      0.61
he      0.60
roo     0.60
s       0.59
POSITIVE LOGITS
ס      0.89
ו      0.81
ج      0.73
ּ      0.70
ג      0.68
ل      0.68
불구하고      0.68
ד      0.67
ر      0.63
𝙤      0.63
Activations Density 0.000%