Negative Logits
Token       Logit
ﺒ           1.39
ﺼ           1.30
ﻴ           1.24
ﻨ           1.23
ﺸ           1.22
VERTISING   1.19
ﺪ           1.16
ﺴ           1.16
ל           1.15
ﺎ           1.15
Positive Logits
Token       Logit
(           1.03
[blank]     0.99
prompt      0.91
wood        0.89
|           0.89
cool        0.87
e           0.85
↵↵          0.85
bers        0.85
ei          0.85
Activations Density 0.035%