Negative Logits
Magdal         -0.09
god            -0.09
domin          -0.09
grad           -0.08
же             -0.08
Golf           -0.08
Kā             -0.08
represented    -0.08
excluded       -0.08
flip           -0.07
Positive Logits
&&             0.10
inducing       0.09
induce         0.09
induced        0.08
]);            0.08
obtaining      0.08
ਤੇ             0.08
ਮਹ             0.08
获取           0.08
assumes        0.07
Activations Density 0.041%