INDEX
NEGATIVE LOGITS
阿拉伯  0.46
Тере  0.45
وعه  0.44
announced  0.43
合理  0.43
创造  0.42
izzo  0.42
الماء  0.42
🧟  0.42
ರವಾಗಿ  0.42
POSITIVE LOGITS
feel  0.44
ết  0.44
씬  0.42
n  0.41
ên  0.41
postdoctoral  0.41
spoof  0.41
ਨੂੰ  0.40
nics  0.40
mu  0.40
Activations Density 0.001%