Explanations
the presence of phrases indicating origins or sources of actions
Negative Logits
serializers       -0.35
engines           -0.33
Biografía         -0.33
iformes           -0.31
ris               -0.31
présidentielle    -0.30
académica         -0.30
darauf            -0.30
adaran            -0.30
uttgart           -0.30
Positive Logits
fromnode          0.79
from              0.77
vanuit            0.71
from              0.70
FROM              0.69
From              0.69
จาก               0.68
From              0.68
từ                0.64
Administrativna   0.64
Activations Density 0.447%