Explanations
social concepts and interactions
Negative Logits
وم    0.88
on    0.72
يا    0.68
ك    0.68
كيد    0.65
ين    0.61
рили    0.59
ح    0.58
سا    0.57
新    0.56
Positive Logits
t    1.09
socially    1.06
सामाजिक    1.05
sociais    1.03
sociali    1.02
social    1.01
اجتماعی    1.01
sosial    1.00
SOCIAL    1.00
사회    0.98
Activations Density 0.033%