INDEX
NEGATIVE LOGITS
själv   0.60
selv    0.60
Andre   0.55
selbst  0.55
اند     0.54
Sel     0.53
自      0.52
Ander   0.51
само    0.50
Andre   0.50
POSITIVE LOGITS
lad        0.52
Lad        0.50
Lad        0.42
icin       0.39
ơm         0.36
סף         0.36
felt       0.36
purchases  0.36
fork       0.36
oor        0.35
Activations Density 0.000%