Explanations
phrases indicating confirmation or negation in a context of prior actions or states
Negative Logits
Token           Logit
InitVars        -0.75
schi            -0.68
gillar          -0.68
NamedQueries    -0.66
skär            -0.65
scolaires       -0.65
zeczytaj        -0.64
nemlig          -0.64
rød             -0.63
convaincre      -0.61
Positive Logits
Token           Logit
sudah           1.13
Sudah           1.04
đã              0.98
уже             0.98
Sudah           0.96
已              0.94
telah           0.92
já              0.92
Уже             0.91
Уже             0.86
Activations Density: 0.137%