Negative logits
Token          Value
pomoći         0.42
નિ             0.41
authorize      0.41
应该是         0.41
忝             0.40
неоп           0.40
mouseenter     0.39
autor          0.39
asigur         0.38
authorizes     0.38

Positive logits
Token          Value
猖             0.61
threatens      0.59
threatening    0.57
threaten       0.56
prow           0.53
attacking      0.51
menacing       0.51
embold         0.50
menace         0.50
attacks        0.49

Activation density: 0.018%