INDEX
Negative Logits
usehen    0.42
akey      0.41
tf        0.38
동        0.38
ocity     0.38
呈        0.38
槸        0.37
انا       0.36
phony     0.36
లేదు      0.36
Positive Logits
ridiculed     0.46
criticised    0.42
अज्ञात         0.39
praised       0.39
criticized    0.39
voluntarily   0.39
Oud           0.38
хозяй         0.38
scratching    0.38
входя         0.37
Activations Density 0.009%