Negative Logits
Token       Logit
Uhm         -0.97
$_"         -0.94
"..\..\     -0.91
للمعارف     -0.91
Bambi       -0.91
Lom         -0.90
Radu        -0.88
Pelt        -0.88
__)         -0.88
Schrö       -0.87
Positive Logits
Token       Logit
ing         1.25
King        1.04
KING        1.02
King        0.99
ING         0.96
i           0.92
Kings       0.91
o           0.81
king        0.79
king        0.76
Activations Density 0.010%