Negative logits
щ            -0.06
prejudices   -0.06
-double      -0.06
iggs         -0.06
soothing     -0.06
através      -0.06
уяв          -0.05
rağmen       -0.05
politic      -0.05
않았다       -0.05

Positive logits
tarihli       0.07
USE           0.07
locations     0.07
happens       0.07
回            0.06
stopping      0.06
.Man          0.06
originally    0.06
сдел          0.06
ительным      0.06

Activation density: 0.023%