Explanations
expressions of desire or reluctance to engage in activities or confrontations
Negative Logits
Token       Logit
IGO         -0.17
atch        -0.15
avo         -0.14
tein        -0.14
scribed     -0.14
verg        -0.14
roman       -0.14
ÙħاÙĨÛĮ     -0.14
urbation    -0.14
igo         -0.14
Positive Logits
Token       Logit
anymore     0.22
γκο         0.17
ÑĦи         0.16
Aceptar     0.16
Lose        0.16
loses       0.15
orte        0.15
nor         0.15
ëŀ          0.15
ä»»ä½ķ      0.15
Activations Density: 0.066%