INDEX
NEGATIVE LOGITS
Token            Logit
ografija         -0.58
Contactez        -0.57
Identyfik        -0.54
esercito         -0.53
Personendaten    -0.53
ilang            -0.52
Géographie       -0.52
seldom           -0.50
sleeps           -0.50
]').             -0.50
POSITIVE LOGITS
Token            Logit
المعيارى         0.56
                 0.49
using            0.48
发表于           0.46
developing       0.46
using            0.45
تضيفلها          0.45
,                0.45
producing        0.45
use              0.45
Activations Density 0.002%