INDEX
NEGATIVE LOGITS
has    0.40
control    0.39
Span    0.39
공    0.38
favorit    0.38
خوف    0.38
공    0.37
controlling    0.37
Control    0.36
방식으로    0.35
POSITIVE LOGITS
المؤس    0.46
Bxb    0.45
ဋ    0.41
插    0.40
учре    0.40
㓩    0.40
dade    0.39
Elemente    0.39
Produkte    0.39
ক্যারি    0.39
Activations Density 0.001%