INDEX
Negative Logits
while        -1.29
While        -1.10
(            -1.05
before       -1.04
when         -1.01
exuberant    -0.92
особли       -0.90
Киє          -0.90
for          -0.88
at           -0.88
Positive Logits
鵙           1.42
茈           1.32
౧            1.18
impon        1.18
艄           1.16
绒           1.14
ఽ            1.13
奩           1.13
蓐           1.10
             1.09
Activations Density 0.001%