INDEX
NEGATIVE LOGITS
.beh      -0.28
rys       -0.27
FP        -0.27
Behavior  -0.27
behavior  -0.27
leh       -0.26
Occup     -0.25
rc        -0.25
under     -0.24
å¼Ģåıij    -0.24
POSITIVE LOGITS
æĪij羣çļĦ   0.30
ä¹ĭä½ľ     0.30
ÑĪа       0.29
errat     0.28
retro     0.28
ffa       0.27
ä¸įåIJĥ    0.27
çļĦåĨ³å¿ĥ  0.26
agna      0.26
.bat      0.26
ACTIVATIONS DENSITY
0.016%