Explanations
words associated with actions and social responsibilities
Negative Logits
Token     Logit
viar      -0.16
isting    -0.16
ÙĨدگÛĮ    -0.14
dep       -0.14
udoku     -0.14
Pret      -0.14
Fell      -0.14
rello     -0.13
ivas      -0.13
itals     -0.13
Positive Logits
Token     Logit
ée        0.47
és        0.40
ées       0.39
é         0.39
né        0.26
auté      0.21
lé        0.21
owany     0.21
ables     0.20
ant       0.20
Activations Density 0.027%