NEGATIVE LOGITS
а          0.39
на         0.38
这是       0.37
वायरिंग    0.37
This       0.36
다         0.36
это        0.34
这是       0.34
What       0.33
larını     0.33
POSITIVE LOGITS
malignant     0.39
rebellious    0.37
charismatic   0.36
sufferings    0.36
fanatic       0.36
ideology      0.36
sentimento    0.36
talento       0.35
corrupted     0.35
hedon         0.34
ACTIVATION DENSITY 0.001%