INDEX
Explanations
relative clauses and specific entities
Negative Logits
Token  Value
c  0.91
1  0.84
на  0.84
2  0.75
ت  0.75
b  0.74
g  0.73
्री  0.73
ة  0.71
م  0.68
Positive Logits
Token  Value
Bedingungen  0.93
🕜  0.88
ľud  0.88
obat  0.87
nedenle  0.84
📪  0.84
檚  0.83
provinsi  0.82
njia  0.82
giants  0.82
Activations Density: 1.573%