Negative Logits
erroneously       0.44
correspondingly   0.40
provocative       0.39
optimizations     0.39
osar              0.38
Agree             0.38
analysis          0.38
publicized        0.38
ವರೆ               0.37
somebody          0.37
Positive Logits
μά                0.44
ιδ                0.43
τσ                0.40
male              0.40
Fleet             0.39
恹                0.39
MOT               0.39
)                 0.38
ίν                0.38
ת                 0.38
Activations Density: 0.001%