INDEX
Negative Logits
�       -0.09
ayan    -0.08
([      -0.08
staw    -0.07
LICK    -0.07
łącz    -0.07
iedy    -0.07
�       -0.07
나타    -0.07
Ош      -0.07
Positive Logits
genuine       0.11
legitimate    0.10
respectful    0.10
responsibly   0.09
reputable     0.09
безопас       0.09
sincerely     0.09
safer         0.09
正规的        0.09
safely        0.09
Activations Density: 0.056%