INDEX
Explanations
samurai, loyalty, independence
Negative Logits
నిజ  0.55
绁  0.53
નક્કી  0.53
甕  0.52
िक्र  0.51
πολλ  0.51
loopholes  0.50
Unifier  0.49
ነገ  0.49
sasane  0.49
Positive Logits
was  0.57
at  0.54
di  0.48
0.47
of  0.46
IS  0.43
IS  0.43
since  0.42
April  0.41
May  0.41
Activations Density 0.004%