INDEX
NEGATIVE LOGITS
'      0.97
ot     0.80
-      0.78
)      0.73
),     0.69
ator   0.68
ott    0.67
ists   0.66
els    0.65
ong    0.65
POSITIVE LOGITS
ين     0.85
W      0.82
A      0.80
H      0.78
K      0.76
V      0.73
ر      0.72
お金   0.70
د      0.69
棻     0.68
Activation density: 0.008%