Explanations
language articles or determiners
Negative Logits
هذا 0.97
一個 0.97
一个 0.93
남 0.91
ein 0.90
một 0.90
で 0.89
ると 0.89
the 0.88
と 0.86
Positive Logits
شارة 0.77
situation 0.75
guidance 0.74
situazione 0.74
Vielzahl 0.73
considered 0.73
ości 0.72
situação 0.69
ინის 0.69
measured 0.68
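One common way such logit lists are produced is to project the feature's decoder direction through the model's unembedding matrix and keep the highest- and lowest-scoring vocabulary tokens. The sketch below assumes exactly that (score = decoder direction · unembedding column per token); the function names, toy vectors, and toy vocabulary are hypothetical and are not taken from this page or from any particular tool.

```python
def logit_effects(decoder_direction, unembed, vocab):
    """Score each token by the dot product of the (hypothetical) feature
    decoder direction with that token's unembedding column.

    unembed[i][t] is the weight from residual dimension i to token t.
    """
    scores = []
    for t, token in enumerate(vocab):
        score = sum(d * unembed[i][t] for i, d in enumerate(decoder_direction))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[::-1][:k]
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
scores = logit_effects([1.0, -0.5], [[0.2, -0.4, 0.9], [0.1, 0.6, -0.2]], ["a", "b", "c"])
positive, negative = top_and_bottom(scores, k=1)
print([t for t, _ in positive], [t for t, _ in negative])  # prints ['c'] ['b']
```

With real model weights, the same ranking over the full vocabulary would yield ten-entry lists like the two above.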
Activation density: 0.143%
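Assuming activation density means the percentage of sampled token positions at which the feature's activation is strictly positive, a value of 0.143% corresponds to roughly 143 firing positions per 100,000 tokens. A minimal sketch of that calculation (the function name is hypothetical):

```python
def activation_density(activations):
    """Percentage of token positions where the feature activation is > 0 (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


print(activation_density([0.0, 0.0, 1.2, 0.0]))  # prints 25.0
```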