Negative Logits
presumably     -0.10
incompetent    -0.08
irrelevant     -0.08
murderous      -0.08
cov            -0.08
trivial        -0.08
!!!↵           -0.08
violation      -0.07
complied       -0.07
Lucky          -0.07
Positive Logits
takeaway         0.10
%左右             0.10
среднего         0.09
süd              0.09
рекомендуется    0.09
راوح             0.09
moderately       0.09
вари             0.09
примерно         0.09
정도              0.09
Activations Density: 0.064%