Negative Logits
restrooms     0.41
lako          0.40
ventilators   0.40
motivator     0.40
楨            0.40
opters        0.39
axles         0.39
locomotives   0.39
motivated     0.39
motivating    0.38
Positive Logits
έναν          0.45
στι           0.42
שני           0.40
answ          0.39
γεν           0.39
Brug          0.39
βαθ           0.38
watch         0.37
ński          0.36
Ring          0.36
Activations Density: 0.003%