Negative Logits
othed         0.70
↵↵            0.56
речь          0.55  (Russian: "speech")
ones          0.54
آموز          0.54  (Persian: "learn/teach")
пре           0.52
具有          0.52  (Chinese: "to have, to possess")
_|            0.51
UPD           0.51
rotated       0.51

Positive Logits
này           0.93  (Vietnamese: "this")
embarrassing  0.92
นี้            0.90  (Thai: "this")
glad          0.89
alot          0.89
questo        0.89  (Italian: "this")
annoying      0.88
नं            0.85
intéressant   0.85  (French: "interesting")
grateful      0.85

Activation density: 0.002%