Negative Logits
condivid    0.50
kanske      0.46
之事        0.46
invests     0.42
khart       0.41
chose       0.41
ق           0.41
ativi       0.41
برخی        0.41
verwij      0.41

Positive Logits
That        0.41
髏          0.38
욱          0.36
✿           0.35
Ско         0.35
それ        0.35
λ           0.35
Che         0.34
).          0.34
).^         0.33
Activations Density 0.000%