INDEX
Explanations
countries and their actions
Negative Logits
u               0.49
ed              0.46
lardan          0.44
larda           0.42
ın              0.41
ين              0.40
iş              0.40
dır             0.39
as              0.39
?               0.39
Positive Logits
to              0.38
professores     0.38
鸮              0.37
۰               0.33
("              0.33
दोन             0.32
ACCOUNT         0.32
(               0.32
investigadores  0.32
it              0.31
Activations Density 0.264%