Negative Logits
PLA          -0.07
구글           -0.06
politics     -0.06
flaw         -0.06
248          -0.06
analý        -0.06
Moderator    -0.06
exagger      -0.06
_datasets    -0.06
workspace    -0.06
Positive Logits
atypes       0.07
(Bit         0.06
kingdoms     0.06
demise       0.06
оне          0.06
řád          0.06
slu          0.06
cling        0.06
Apartments   0.06
(Il          0.05
Activations Density 0.015%