INDEX
NEGATIVE LOGITS
труд  0.41
claras  0.38
">  0.38
しやすい  0.37
委会  0.36
/  0.35
کھیلنا  0.35
saddhim  0.35
infidelity  0.35
क्र  0.35
POSITIVE LOGITS
note  0.61
beware  0.57
beachten  0.54
PLEASE  0.54
please  0.51
forgive  0.51
don  0.50
excuse  0.50
見  0.50
цкі  0.49
Activations Density 0.019%