INDEX
Negative Logits
achieves    0.46
queues      0.46
weaknesses  0.43
해야        0.43
해야        0.41
inhal       0.41
высокую     0.41
flam        0.39
violators   0.39
zvyš        0.39
Positive Logits
know        0.87
biết        0.79
know        0.77
знать       0.69
知道        0.67
understand  0.64
Know        0.63
Know        0.61
知          0.61
જાણ         0.59
Activations Density 0.000%