INDEX
Explanations
Spanish/French definite articles
Negative Logits
1      2.06
2      1.98
9      1.95
들이    1.91
7      1.85
5      1.80
8      1.79
3      1.74
4      1.73
offic  1.72
Positive Logits
아          1.99
มี          1.92
┆           1.73
Desarrollo  1.66
ู           1.66
ิน          1.61
و           1.60
ڙ           1.55
urile       1.50
𝘔           1.49
Activations Density 0.081%