Explanations
concept followed by specification
Negative Logits
ع        1.90
ح        1.82
न        1.58
ומ       1.55
ق        1.43
的国家   1.39
ছে       1.38
ج        1.33
мо       1.31
ﻡ        1.30
Positive Logits
el        1.69
ional     1.66
们的      1.55
sonucu    1.52
জনক       1.43
යෙන්      1.43
okhlov    1.42
wie       1.39
्हे       1.39
ibility   1.38
Activation Density: 0.007%