Negative Logits
Token       Logit
srf         -0.86
ãĥĦ         -0.86
ãĥŃ         -0.82
nown        -0.77
Ń·          -0.76
ãĥ³ãĤ¸      -0.72
uyomi       -0.72
ãĥ¼ãĥĨ      -0.71
pione       -0.69
swick       -0.68

Positive Logits
Token              Logit
answers            0.98
clarification      0.93
accountability     0.86
explanations       0.86
payment            0.83
obedience          0.83
attention          0.82
acknowledgement    0.82
authenticity       0.82
eer                0.80

Activations Density 0.064%